Customize

Quick Takes

I very roughly polled METR staff (using Fatebook) what the 50% time horizon will be by EOY 2026, conditional on METR reporting something analogous to today's time horizon metric. I got the following results: 29% average probability that it will surpass 32 hours. 68% average probability that it will surpass 16 hours. The first question got 10 respondents and the second question got 12. Around half of the respondents were technical researchers. I expect the sample to be close to representative, but maybe a bit more short-timelines than the rest of METR staff. The average probability that the question doesn't resolve AMBIGUOUS is somewhere around 60%.

faul_sname1d45-7

Caleb Biddulph, anaguma

RL capability gains might mostly come from better self-elicitation. Ran across a paper NUDGING: Inference-time Alignment of LLMs via Guided Decoding. The authors took a base model and a post-trained model. They had the base model try to answer benchmark questions, found the positions where the base model was least certain, and replaced specifically those tokens with tokens from the post-trained model. The base model, so steered, performed surprisingly well on benchmarks. Surprisingly (to me at least), the tokens changed tended to be transitional phrases rather than the meat of the specific problems. Example from the paper: This worked even when the post-trained model was significantly smaller than the base model: on gsm8k, llama-2-7b-chat "nudging" llama-2-70b (base) scored 46.2 on gsm8k, while 7b-chat alone scored 25.5. 70b-chat barely scored better, at 48.5. Surprisingly, I haven't seen much discussion of this paper on here. It seems very relevant to the question of whether RL bakes new behaviors into models or makes them better at eliciting behaviors they already know how to execute in appropriate situations. I am tempted to do a longer writeup and attempt to reproduce/extend the paper, if there's interest.

leogao3d*9116

DirectedEvolution, shawnghu

running the agi survey really reminded me just how brutal statistical significance is, and how unreliable anecdotes are. even setting aside sampling bias of anecdotes, the sheer sample size you need to answer a question like "do more people this year know what agi is than last year" is kind of depressing - you need like 400 samples for each year just to be 80% sure you'd notice a 10 percentage point increase even if it did exist, and even if there was no real effect you'd still think there was one 5% of the time. this makes me a lot more bearish on vibes in general.

Cole Wyeth1d222

Amalthea, interstice

OpenAI claims 5.2 solved an open COLT problem with no assistance: https://openai.com/index/gpt-5-2-for-science-and-math/ This might be the first thing that meets my bar of autonomously having an original insight??

Parker Conley1h10

An AI content X/Twitter account with nearly 100k followers blocked me, and I got a couple of disapproving replies for pointing out that the account was AI-generated. I quote-tweeted the account mostly to share a useful Chrome Extension that I've been using the detect AI content, but I was surprised that there was a negative reaction in the form of a few replies pointing out the account was AI-generated. I am neither pro- nor anti-AI accounts, but being aware of the nature of the content seems to be useful. Would be curious to hear others' thoughts on the phenomenon. Original Tweet: https://x.com/parconley/status/2000064102543376413

Viliam4d7024

Gunnar_Zarncke, Steven Byrnes, and 3 more

Do we have some page containing resources for rationalist parents, or generally for parents of smart children? Such as recommended books, toys, learning apps, etc. I found tag https://www.lesswrong.com/w/parenting but I was hoping for some kind of best textbooks / recommendations / reference works but for parents/children.

Annabelle2d3216

DirectedEvolution, Dagon, and 1 more

One theme I've been thinking about recently is how bids for connection and understanding are often read as criticism. For example: Person A shares a new idea, feeling excited and hoping to connect with Person B over something they've worked hard on and hold dear. Person B asks a question about a perceived inconsistency in the idea, feeling excited and hoping for an answer which helps them better understand the idea (and Person B). Person A feels hurt and unfairly rejected by Person B. Specifically, Person A feels like Person B isn't willing to give their sincere idea (and effort to connect) a chance, so shuts down and labels Person B as an idea-hater. Person B feels hurt and unfairly rejected by Person A. Specifically, Person B feels like Person A isn't willing to give their sincere question (and effort to connect) a chance, so shuts down and labels Person A as a question-hater. This seems like a huge source of human suffering, and I have been Person A and Person B in different interactions. Does anyone else resonate with this? Do you see things differently?

Your Feed