I'm still figuring out "inference time", but what left me puzzled at first was that there is, to humans at least, an infinite number of tokens that might come next: technical jargon, synonyms, different lexical levels in general. So in my mind there was an RNG built into the function that, after "filtering" the weights based on the user request (and a lot of different tokens, even ones meaning the same or almost the same thing, have the same weights), simply rolled the dice to produce the return string.
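From what I've pieced together, the "rolling the dice" part is roughly real: the model assigns a score to every token in its vocabulary, the scores get turned into probabilities, and one token is sampled at random from that distribution. Here's a minimal sketch of that idea (the function name and the toy scores are made up for illustration, not any real library's API):

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick the next token by 'rolling the dice' over the model's scores.

    `logits` is a hypothetical dict mapping each candidate token to the raw
    score the model assigned it at this position. Near-synonyms often end up
    with very similar scores, so which one gets picked really is partly luck.
    """
    # Softmax: turn raw scores into probabilities that sum to 1.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Roll the dice, weighted by those probabilities.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Two near-synonyms with almost identical scores: either one can come out.
print(sample_next_token({"happy": 2.1, "glad": 2.0, "banana": -3.0}))
```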
I thought the LLM was "getting to know the user" but that it had a short memory span (the context) and thus "forgot" already-calculated weights that it would otherwise use to (re)generate new weights.
Further down I learned it freaking forgets all the previous weights in general (I think that's what I learned, I'm getting there).
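If I've got it right now, the only "memory" between turns is the text itself: the weights are frozen, and every request just re-feeds the whole conversation so far. A tiny sketch of that loop, with `generate_reply` as a stand-in for whatever the real model call is:

```python
def generate_reply(context: str) -> str:
    """Stand-in for the real model call. The weights it uses are frozen;
    nothing computed inside survives past the return statement."""
    return "(model output for: " + context[-40:] + ")"

conversation = ""
for user_msg in ["hi there", "what did I just say?"]:
    conversation += "User: " + user_msg + "\n"
    # The ONLY state carried forward is this growing string (the context).
    reply = generate_reply(conversation)
    conversation += "Assistant: " + reply + "\n"
    print(reply)
```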