Surprising LLM reasoning failures


I wrote a post some time back on reasoning failures that current LLMs exhibit, and on why I think we still need some qualitative breakthroughs to get from them to something that can respond to genuinely novel situations. (Some of the commenters on my post disagree. Are they right? Maybe!)

There seem to be multiple different issues:

* Applying stereotyped patterns that don’t make sense in context [...]
* Not knowing which pattern to apply [...]
* Entirely lacking an appropriate pattern in the first place
* Some kind of failure of spatial reasoning [...]

There’s a thing that sometimes happens to me, where I obviously know something, but then an old way of reacting is so strongly cached that it temporarily overrides my knowledge. For example, one time the electricity was out, and I thought to myself something along the lines of “well no matter, if the electricity is out then I’ll just play some video game until- oh.”

Some of the [...] examples feel to me a lot like that. [...] It’s like [the LLM] has some established pattern that it automatically applies in a given situation, and then it doesn’t stop to evaluate whether that makes sense. [...]

The funny thing is that it would be trivial to fine-tune the relevant LLMs to catch [the specific examples I discuss in the post]. [...] There seems to be some quality of actually looking at the problem that current LLMs fail to exhibit, and that would be necessary for good performance on novel tasks. [...]
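
To make the "trivial to fine-tune" point concrete, here is a minimal sketch of what such a patch might look like: gather the specific failure cases as supervised examples and write them out in the chat-style JSONL format that fine-tuning pipelines typically accept. The example prompt, ideal answer, and file name below are illustrative placeholders, not taken from the post, and the exact schema varies by provider.

```python
import json

# Illustrative failure case of the kind discussed above: a prompt where a model
# pattern-matches to a familiar puzzle instead of reading what was actually
# asked, paired with the answer an attentive reader would give.
patch_examples = [
    {
        "prompt": (
            "A man and a goat are on one side of a river. They have a boat "
            "that can carry both of them. How do they get across?"
        ),
        "ideal_answer": (
            "They simply get in the boat together and row across; nothing in "
            "the problem prevents them from crossing in one trip."
        ),
    },
    # ...the other specific cases would be added here...
]

# Write the cases as a chat-style JSONL file, one training example per line.
with open("patch_dataset.jsonl", "w") as f:
    for ex in patch_examples:
        record = {
            "messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["ideal_answer"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Patching the known cases this way is a small amount of work, which is exactly why it would not address the underlying issue: it teaches the model those particular answers, not the habit of actually looking at the problem.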

If a system runs into a situation that vaguely resembles one it has encountered before and just applies a stereotyped behavior without noticing how the situation is different, it will inevitably fail.