This is the problem with LLM researchers all but giving up on the problem of ins...

codeulike · 2024-11-22T12:24:53 1732278293

Here's an article where they teach an LLM Othello and then probe its internal state to assess whether it is 'modelling' the Othello board internally:

https://thegradient.pub/othello/

Associated paper: https://arxiv.org/abs/2210.13382
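For anyone unfamiliar with what "probing" means here, a minimal sketch of the general idea follows. It is not the paper's actual setup: the activations are synthetic NumPy arrays standing in for a model's hidden states, the label is a made-up "is this square occupied" bit, and the names `train_linear_probe` and `probe_accuracy` are hypothetical. The point is only that a small classifier is trained on hidden states, and high held-out accuracy (compared with a control) suggests the information is encoded there.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.5, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels              # gradient of the log loss w.r.t. the logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    """Fraction of examples where the probe predicts the label correctly."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

rng = np.random.default_rng(0)
n, d = 2000, 32

# Assumption: synthetic "hidden states" that encode a binary board feature
# along one direction. A real experiment would record these from the model.
labels = rng.integers(0, 2, size=n).astype(float)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
acts = rng.normal(size=(n, d)) + 3.0 * np.outer(2.0 * labels - 1.0, direction)

# Train on the first half, evaluate on the second half.
split = n // 2
w, b = train_linear_probe(acts[:split], labels[:split])
encoded_acc = probe_accuracy(acts[split:], labels[split:], w, b)

# Control: activations that carry no information about the label.
noise = rng.normal(size=(n, d))
w0, b0 = train_linear_probe(noise[:split], labels[:split])
control_acc = probe_accuracy(noise[split:], labels[split:], w0, b0)

print(f"probe on encoded activations: {encoded_acc:.2f}")
print(f"probe on control activations: {control_acc:.2f}")
```

The control matters: a probe that also scores well on uninformative inputs would be telling you about the probe, not about the model.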

mattmcknight · 2024-11-22T12:58:08 1732280288

It's weird because it is not a black box at the lowest level: we can see exactly what all of the weights are doing. It's just too complex for us to understand.

What is difficult is finding some intermediate pattern in between there which we can label with an abstraction that is compatible with human understanding. It may not exist. For example, it may be more like how our brain works to produce language than it is like a logical rule-based system. We occasionally say the wrong word, skip a word, spell things wrong... violate the rules of grammar.

The inputs and outputs of the model are human language, so at least there we know the system as a black box can be characterized, if not understood.

_heimdall · 2024-11-22T14:09:29 1732284569

> The inputs and outputs of the model are human language, so at least there we know the system as a black box can be characterized, if not understood.

This is actually where the AI safety debates tend to lose. From where I sit we can't characterize the black box itself, we can only characterize the outputs themselves.

More specifically, we can decide what we think the quality of the output is for the given input, and we can attempt to infer what might have happened in between. We really have no idea what happened in between, and though many of the "doomers" raise concerns that seem far-fetched, we have absolutely no way of understanding whether they are completely off base or raising concerns about a system that just hasn't shown problems in the input/output pairs yet.

raincole · 2024-11-22T16:32:04 1732293124

> we have absolutely no way to know

To me, this means that it absolutely doesn't matter whether an LLM reasons or not.

_heimdall · 2024-11-22T16:40:41 1732293641

It might if AI/LLM safety is a concern. We can't begin to really judge safety without understanding how they work internally.

lukeschlather · 2024-11-22T13:35:06 1732282506

> (a) the LLM does reason through the rules and understands what moves are legal or (b) was trained on a large set of legal moves and therefore only learned to make legal moves.

How can you learn to make legal moves without understanding what moves are legal?

_heimdall · 2024-11-22T14:06:20 1732284380

I'm spitballing here, so definitely take this with a grain of salt.

If I only see legal moves, I may not think outside the box and come up with moves other than what I already saw. Humans run into this all the time: we see things done a certain way and effectively learn that that's just how to do it, and we don't innovate.

Said differently, if the generative AI isn't actually being generative at all, meaning it's just predicting based on the training set, it could be providing only legal moves without ever learning or understanding the rules of the game.
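One rough way to tell those two cases apart from the outside is to check legality on positions that never appeared in training. The sketch below is a toy, not Othello: the "game" is a 6-cell strip where a move is legal if the chosen cell is empty, and `Memorizer`, `rule_player` and `legal_rate` are hypothetical names made up for illustration. A pure lookup table looks perfect on positions it has seen and falls apart on held-out ones, while something that applies the rule does not.

```python
from collections import Counter
from itertools import product

BOARD_SIZE = 6  # toy board: a strip of cells, 0 = empty, 1 = occupied

def legal_moves(board):
    """A move is legal if it targets an empty cell."""
    return [i for i, cell in enumerate(board) if cell == 0]

def rule_player(board):
    """Applies the rule directly: play the first empty cell."""
    return legal_moves(board)[0]

class Memorizer:
    """Replays moves seen in training; guesses the most common move otherwise."""

    def __init__(self, examples):
        self.table = dict(examples)
        self.fallback = Counter(move for _, move in examples).most_common(1)[0][0]

    def __call__(self, board):
        return self.table.get(board, self.fallback)

def legal_rate(player, boards):
    """Fraction of boards on which the player's move is legal."""
    return sum(player(b) in legal_moves(b) for b in boards) / len(boards)

# Every board with at least one empty cell, split into seen and held-out halves.
boards = [b for b in product((0, 1), repeat=BOARD_SIZE) if 0 in b]
train, held_out = boards[::2], boards[1::2]

# The memorizer only ever sees legal moves, as in the scenario above.
memorizer = Memorizer([(b, rule_player(b)) for b in train])

seen_rate = legal_rate(memorizer, train)
unseen_rate = legal_rate(memorizer, held_out)
rule_rate = legal_rate(rule_player, held_out)

print(f"memorizer, seen positions:     {seen_rate:.2f}")
print(f"memorizer, held-out positions: {unseen_rate:.2f}")
print(f"rule player, held-out:         {rule_rate:.2f}")
```

Passing a held-out test like this only shows generalization beyond the training positions; whether that counts as "understanding" the rules is the part people are arguing about.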

ramraj07 · 2024-11-22T14:13:32 1732284812

I think they’ll acknowledge these models are truly intelligent only when the LLMs also irrationally go circles around logic to insist LLMs are statistical parrots.

_heimdall · 2024-11-22T14:59:12 1732287552

Acknowledging an LLM is intelligent requires a general agreement on what intelligence is and how to measure it. I'd also argue that it requires a way of understanding how an LLM comes to its answer rather than just inputs and outputs.

To me that doesn't seem unreasonable and has nothing to do with irrationally going in circles. Curious if you disagree, though.

Retric · 2024-11-22T15:43:37 1732290217

Humans judge if other humans are intelligent without going into philosophical circles.

How well they learn completely novel tasks (fail in conversation, pass with training). How well they do complex tasks (debated; just look at this thread). How generally knowledgeable they are (pass). How often they do nonsensical things (fail).

So IMO it really comes down to whether you’re judging by peak performance or minimum standards. If I had an employee that performed as well as an LLM I’d call them an idiot because they needed constant supervision for even trivial tasks, but that’s not the standard everyone is using.

_heimdall · 2024-11-22T16:49:24 1732294164

> Humans judge if other humans are intelligent without going into philosophical circles

That's totally fair. I expect that to continue to work well when kept in the context of something/someone else that is roughly as intelligent as you are. Bonus points for the fact that one human understands what it means to be human and we all have roughly similar experiences of reality.

I'm not so sure that kind of judging intelligence by feel works when you are judging something that is (a) totally different from you or (b) massively more (or less) intelligent than you are.

For example, I could see something much smarter than me as acting irrationally when in reality they may be working with a much larger or more complex set of facts and context that doesn't make sense to me.