
There are all sorts of changes one could imagine being made to how LLMs are trained and run, but if you are asking about what actually exists today, then:

1) At runtime, when you feed a "request" (prompt) into the model, the model uses a fixed amount of compute/time to generate each token (roughly, each word) of output. There is no looping going on internally - just a fixed number of steps to generate each token. Giving it more or less processing power at runtime will not change the output, just how fast that output is generated.
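A toy sketch of what "fixed compute per token" means - the layer count and the arithmetic here are made up, not a real transformer, but the shape of the loop is the point: every output token passes through the same fixed stack of layers, with no data-dependent inner looping.

```python
N_LAYERS = 12  # fixed depth: every token goes through the same stack

def generate_token(context):
    """One forward pass: exactly N_LAYERS steps, regardless of input."""
    steps = 0
    state = len(context)                  # stand-in for a hidden state
    for _ in range(N_LAYERS):             # fixed number of "layers"
        state = (state * 31 + 7) % 50257  # toy layer computation
        steps += 1
    return state, steps                   # next-token id, work done

def generate(prompt_tokens, n_new):
    out = list(prompt_tokens)
    work = []
    for _ in range(n_new):
        tok, steps = generate_token(out)
        out.append(tok)
        work.append(steps)
    return out[len(prompt_tokens):], work

new_tokens, work = generate([1, 2, 3], 5)
# every token costs exactly the same number of steps
assert all(s == N_LAYERS for s in work)
```

More hardware makes each of those fixed passes finish sooner; it never adds passes.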

If you, as a user, are willing to take more time (and spend more money) to get a better answer, then a trick that often works is to take the LLM's output and feed it back in as a request, just asking the LLM to refine/reword it. You can do this multiple times.
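The refinement trick above is just a loop around whatever API you call. A minimal sketch, where `call_llm` is a placeholder (here a toy function, not a real client) and the prompt template is my own wording:

```python
def call_llm(prompt):
    # Placeholder for a real chat/completions API call.
    # This toy "model" just uppercases its input.
    return prompt.upper()

def refine(question, rounds=3):
    """Feed the model's answer back in, asking it to improve it."""
    answer = call_llm(question)
    for _ in range(rounds):
        answer = call_llm("Please refine and improve this answer:\n" + answer)
    return answer

result = refine("why is the sky blue?", rounds=2)
```

Each round costs another full API call, which is exactly the time/money trade-off described.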

2) At training time, for a given size of model and given set of training data, there is essentially an optimal amount of time to train for (= amount of computing power and time taken to train). Train for too short a time and the model won't have learnt all that it could. Train for too long (repeating the training data) and the model will start to memorize the training set rather than generalize from it, so it actually gets worse on new inputs.
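The usual way to find that sweet spot is to watch loss on held-out data: it falls, bottoms out, then rises as memorization sets in. A sketch with synthetic numbers (these loss curves are invented for illustration, not from any real model):

```python
# Synthetic loss curves over 8 "epochs" of training.
train_loss = [3.0, 2.0, 1.5, 1.2, 1.0, 0.8, 0.6, 0.5]   # keeps falling
val_loss   = [3.1, 2.2, 1.7, 1.4, 1.3, 1.35, 1.5, 1.7]  # bottoms out, then rises

# The best stopping point is where validation loss is lowest;
# training loss keeps improving past it, but that's memorization.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
assert best_epoch == 4
```

Past that epoch, training loss still improves while validation loss worsens - the signature of overfitting.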




> there is no looping going on internally

This sentence filled a huge gap I was wondering about, thanks.





