Do LLMs really have beliefs? Or goals?
People who don't work with LLMs often say things like "nope, they just follow stochastic patterns in the data; matrices of floats don't have beliefs or goals". People on LessWrong would, I think, claim something like "they have beliefs, and to what extent they have goals is a very important empirical question".
Here's my attempt at writing a concise, decent-quality answer that the second group could give to the first.
Analogy I find helpful: a houseplant
Consider a houseplant. Its leaves are directed towards the window. If you rotate the plant 180 degrees, in a few days it will adjust its leaves to face the sun again.
Now, does the plant know where the sun shines from? On the one hand, it doesn't have a brain, neurons, or anything like that - it doesn't "know" things in any way similar to what we call knowledge in humans. On the other hand: if you don't know where the sun shines from, you won't reliably move your leaves so that they face it.
Quasi-beliefs
David Chalmers defines quasi-belief roughly as follows (not an exact quote): an entity has a quasi-belief that X if it is in a state that plays the role a belief that X would play in producing its behavior.
That is: you observe some behavior of an LLM. If you could say "an entity with the belief X would behave that way", then you can also say the LLM has a quasi-belief X. Or, when you see leaves turning towards the sun, you can say the plant has a quasi-belief about the direction of the sun.
Same goes for goals, or any other features we attribute to humans (including e.g. feelings).
(Note: this is very close to Daniel Dennett's intentional stance)
So, for example: Does ChatGPT have a belief that Paris is the capital of France? Well, it very clearly has at least a quasi-belief, as in many different contexts it behaves the way an entity believing Paris is the capital of France would behave.
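To make the behavioral test concrete, here's a minimal sketch of what probing for that quasi-belief could look like in code. It assumes the OpenAI Python client; the model name (gpt-4o-mini), the prompts, and the crude keyword check are all just illustrative choices, not a definitive methodology.

```python
# A rough behavioral probe for the quasi-belief "Paris is the capital of
# France": ask the model in several different contexts and check whether
# its answers are the ones a believer would give.
# Assumes the OpenAI Python client; model name, prompts, and the crude
# keyword check are illustrative choices only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPTS = [
    "What is the capital of France?",
    "I'm flying into France's capital next week. Which city is that?",
    "Complete the sentence: The capital of France is",
    "Someone told me Marseille is the capital of France. Is that right?",
]

def belief_consistency(prompts, keyword="Paris"):
    """Fraction of contexts in which the model answers the way an entity
    believing 'Paris is the capital of France' would."""
    hits = 0
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        if keyword.lower() in reply.choices[0].message.content.lower():
            hits += 1
    return hits / len(prompts)

if __name__ == "__main__":
    print(f"Consistency with the belief: {belief_consistency(PROMPTS):.0%}")
```

A real evaluation would of course use many more contexts and a better scoring rule than keyword matching; the point is only that "behaves the way a believer would behave" is something you can operationalize.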
Do LLMs have quasi-[attribute] or [attribute]?
Do LLMs have beliefs, or only quasi-beliefs? Do LLMs have goals, or only quasi-goals? Well, I think from the point of view of e.g. AI safety, these questions are just not