Do LLMs really have beliefs? Or goals?
People who don't work with LLMs often say things like "nope, they just follow stochastic patterns in the data; matrices of floats don't have beliefs or goals". People on LessWrong would, I think, claim something like "they have beliefs, and to what extent they have goals is a very important empirical question".
Here's my attempt at writing a concise, decent-quality answer that the second group could give to the first.
Analogy I find helpful: a houseplant
Consider a houseplant. Its leaves are directed towards the window. If you rotate the plant 180 degrees, in a few days it will adjust its leaves to face the sun again.
Now, does the plant know where the sun shines from? On the one hand, it doesn't have a brain, neurons, or anything like that - it doesn't "know" things in any way similar to what we call knowledge in humans. On the other hand: if you don't know where the sun shines from, you won't reliably move your leaves so that they face it.
Quasi-beliefs
David Chalmers defines quasi-belief roughly as follows (not an exact quote): an entity has a quasi-belief that X if it is in a state that plays the role a belief that X would play in producing its behavior.
That is: you observe some behavior of an LLM. If you could say "an entity with the belief X would behave that way", then you can also say the LLM has a quasi-belief X. Or, when you see leaves turning towards the sun, you can say the plant has a quasi-belief about the direction of the sun.
Same goes for goals, or any other features we attribute to humans (including e.g. feelings).
(Note: this is very close to Daniel Dennett's intentional stance)
So, for example: Does ChatGPT have a belief that Paris is the capital of France? Well, it very clearly has at least a quasi-belief, as in many different contexts it behaves the way an entity believing Paris is the capital of France would behave.
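To make the behavioral test concrete, here's a minimal sketch of what probing for that quasi-belief could look like in code. It assumes the OpenAI Python client; the model name (gpt-4o-mini), the prompts, and the crude keyword check are all just illustrative choices, not a definitive methodology.

```python
# A rough behavioral probe for the quasi-belief "Paris is the capital of
# France": ask the model in several different contexts and check whether
# its answers are the ones a believer would give.
# Assumes the OpenAI Python client; model name, prompts, and the crude
# keyword check are illustrative choices only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPTS = [
    "What is the capital of France?",
    "I'm flying into France's capital next week. Which city is that?",
    "Complete the sentence: The capital of France is",
    "Someone told me Marseille is the capital of France. Is that right?",
]

def belief_consistency(prompts, keyword="Paris"):
    """Fraction of contexts in which the model answers the way an entity
    believing 'Paris is the capital of France' would."""
    hits = 0
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        if keyword.lower() in reply.choices[0].message.content.lower():
            hits += 1
    return hits / len(prompts)

if __name__ == "__main__":
    print(f"Consistency with the belief: {belief_consistency(PROMPTS):.0%}")
```

A real evaluation would of course use many more contexts and a better scoring rule than keyword matching; the point is only that "behaves the way a believer would behave" is something you can operationalize.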
Do LLMs have quasi-[attribute] or [attribute]?
Do LLMs have beliefs, or only quasi-beliefs? Do LLMs have goals, or only quasi-goals? Well, I think from the point of view of e.g. AI safety, these questions are just not