Really interesting findings around fine-tuning. Goes to show it doesn't really affect the deeper "functionality" of the LLM (if you think of the LLM as running a set of small functions on very high-dimensional vectors to produce a token).
Using regurgitation to get around the assistant/user token separation is another fun tool for the toolbox, relevant whenever you want a model that doesn't support continuation to actually perform continuation (at the cost of a lot of latency).
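For anyone who hasn't seen the trick: here's a minimal sketch of how it might look with the OpenAI Python SDK. You ask the chat model to echo the prefix verbatim and keep going, then strip the echoed prefix to recover a pure continuation. The system prompt wording and the strip logic are my own illustrative choices, not anything from the article:

```python
# Regurgitation sketch: coax a chat-only model into plain-text continuation.
from openai import OpenAI

client = OpenAI()

def continue_text(prefix: str, model: str = "gpt-3.5-turbo") -> str:
    """Return a continuation of `prefix` from a chat model."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Repeat the user's text exactly as given, then "
                        "continue it. Output nothing else."},
            {"role": "user", "content": prefix},
        ],
        temperature=0,
    )
    out = resp.choices[0].message.content
    # Strip the regurgitated prefix; if the echo drifted even slightly,
    # fall back to the raw output. This fragility (plus re-generating the
    # whole prefix every call) is where the latency cost comes from.
    return out[len(prefix):] if out.startswith(prefix) else out

# e.g. getting the next move in a PGN transcript:
# continue_text("1. e4 e5 2. Nf3 ")
```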
I wonder if any kind of reflection or chain-of-thought prompting would help it play better. I wouldn't be surprised if getting the LLM to write an analysis of the game in English is more likely to move it out of distribution than to make it pick better chess moves.