fi-le.net

Measuring Is Not Enough Anymore

24th of June, 2026

In the following, we'll examine whether measuring general capabilities is a good strategy when trying to reduce existential AI risk. We'll use a few organizations as examples to be more concrete, but the argument applies across the whole field, and the organizations mentioned do a great deal of valuable work overall, a point we'll return to at the end.

METR is often cited as one of the most successful AI safety organizations. Its stated mission is to "develop scientific methods to assess catastrophic risks stemming from AI systems' autonomous capabilities and enable good decision-making about their development". It produced perhaps the most famous result to come out of AI safety, the METR time horizon graph, which shows the increase of AI capabilities over time, measured by how long the tasks an AI can complete would take a human.

Let's try to figure out how METR's mission fits into the broader picture, which is keeping AI from spelling the end of humanity (or causing other bad, if less catastrophic, outcomes). One useful case study might be: whose decision-making does the time horizon inform? From personal experience, I can say it certainly informs the intuitions of AI safety researchers. When designing a safety case that hinges on an AI having or not having the capability to do some bad thing, a time estimate of what general tasks it can do is useful. It also helps with planning what research to do in the coming months and even years. But the impact has of course been much wider. The most influential pieces that cite the time horizon experiment might be:

  1. The New York Times piece by Kevin Roose, on the time horizon plot and METR's work more generally. It explains that some have used it to justify accelerating AI development further: "Techno-optimists have seized on METR's time-horizon chart to claim that artificial general intelligence -- machines capable of doing most of what a skilled human can do -- is close at hand. A.I. safety worriers have used it as evidence that the apocalypse is nigh." The last section touches on risks again, framed in an academic way, for example "An even spookier possibility is that some of today's A.I. models are powerful enough to recognize when they are being tested, and may be altering their behavior accordingly."
  2. An Atlantic piece by Rogé Karma on whether there is an AI bubble economy. There, the time horizon graph is offered as evidence that this is not the case.
  3. Pat Grady and Sonya Huang of Sequoia, the most famous venture capital firm on the planet, write in January 2026: "If there's one exponential curve to bet on, it's the performance of long-horizon agents. METR has been meticulously tracking AI's ability to complete long-horizon tasks. The rate of progress is exponential, doubling every ~7 months. If we trace out the exponential, agents should be able to work reliably to complete tasks that take human experts a full day by 2028, a full year by 2034, and a full century by 2037." Given that venture capitalists are herd animals, it seems likely that the time horizon graph has led to more venture funding for AI.
  4. (As an honorable mention, influential tweeter and OpenAI employee @roon writes in November 2025: "the METR graph has become a load bearing institution on which our global stock markets depend", to ~800 likes.)
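
As a rough illustration of how the extrapolation in the Sequoia quote works, here is a minimal Python sketch of what a steady 7-month doubling implies for the time between milestones. Only the doubling time comes from the quote. Reading a "full day" as 8 working hours and a "full year" as 2,000 working hours is an assumption made here, as are all function and variable names, and a different conversion changes the result.

```python
import math

# Taken from the quoted claim: "doubling every ~7 months".
DOUBLING_MONTHS = 7

# Assumptions for illustration only (not stated in the quote):
WORK_DAY_HOURS = 8          # one "full day" of expert work
WORK_YEAR_HOURS = 2000      # one "full year" of expert work
WORK_CENTURY_HOURS = 100 * WORK_YEAR_HOURS


def months_to_grow(start_hours, target_hours, doubling_months=DOUBLING_MONTHS):
    """Months for the time horizon to grow from start to target,
    assuming it doubles at a constant rate (hypothetical helper)."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months


day_to_year = months_to_grow(WORK_DAY_HOURS, WORK_YEAR_HOURS)
year_to_century = months_to_grow(WORK_YEAR_HOURS, WORK_CENTURY_HOURS)

print(f"day -> year:     {day_to_year / 12:.1f} years")
print(f"year -> century: {year_to_century / 12:.1f} years")
```

Under these assumed conversions the two gaps come out to roughly 4.6 and 3.9 years. The point is only that milestone dates of this kind follow mechanically from a doubling time plus a choice of units, and carry no evidence beyond the fitted trend itself.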

While this is far from an unbiased, comprehensive summary of the coverage of the time horizon graph, and while we cannot take into account non-public information, superficially it seems that the most widely read writing does not emphasize safety more than other writing on frontier AI. If the most famous venture firm is writing a blog post about how your evidence is the most solid basis for increased investment in AI capabilities, that is hard to offset.

AI risk sometimes feels like a classic tragedy-of-the-commons-style market failure. Automating labor yields a financial gain for AI companies, investors, and suppliers, but imposes a negative externality on everyone else, in the form of existential risks and job loss. Information like the time horizon graph is currently more useful to AI companies, investors, and suppliers than to the general public. Insofar as AI risk can be described as a power struggle between the general public and the AI industry, the time horizon graph is net-negative. Since many people interact with frontier AI every day, at least until this month, normal people with any interest in engaging with the time horizon plot already have a pretty good sense of what AI can and cannot do. Granted, most will be anchored to the performance of six months ago.

The best counter-argument I can come up with is that governments do seem to be usefully informed by evidence like the time horizon graph. (See for instance this piece by the Center for AI Policy advocating for "proactive governance and robust safeguards to prevent misuse".) I am not sure whether governments are currently drawing good conclusions from this evidence, but it sounds plausible that governments are at least better aligned with their own citizens than private companies are.

I think the impact of work like the time horizon graph can be modeled as shifting the balance of power toward governments, investors, and AI companies, and away from the general public, which might or might not be net-positive.

The time horizon experiment is rightly described as an analogue of Moore's law: by the aforementioned NYT article, by the safety-coded AI Digest, and even by METR itself. This is confusing. Moore's law superficially sounds like it just describes a phenomenon (like Newton's laws or Amdahl's law), but it is better summarized as a prescription for how semiconductors ought to progress. Gordon Moore wasn't an analyst or forecaster. He was the co-founder of a semiconductor company. Take this 1992 talk by Carver Mead, the man who coined the term "Moore's Law":

"After it's happened long enough, people begin to talk about it in retrospect, and in retrospect it's really a curve that goes through some points and so it looks like a physical law and people talk about it that way. But actually if you're living it, which I am, then it doesn't feel like a physical law. It's really a thing about human activity, it's about vision, it's about what you're allowed to believe."

Or Moore in 2000:

"[I]t's become kind of a self-fulfilling prophecy. The industry has generally accepted that this is the rate at which we make progress. Companies recognize if they don't move that fast, they fall behind. So everybody invests to move at least as fast as that curve suggests. It's been amazing we've been able to stay on it."

The analogy between the time horizon experiment and Moore's law is accurate. Work like the time horizon graph both predicts and causes the steady increase of AI capabilities, and overall I think it increases existential AI risk, though I'm of course unsure.

I've spent a good amount of time working on several general capability evaluations, including making one task in the METR task suite itself. While doing that work, I was convinced that I was doing something helpful. However, with the hindsight of the last few years of evaluations work, we have much more information on how general capabilities research gets read, which I found hard to predict. Now that the data is there, we can be more explicit about who an experiment is supposed to inform, and what the target audience will do with the information.

We've used the time horizon graph as the most prominent example of predicting general capabilities, but the problem spans many different organizations: the evaluations teams at the large AI companies, other non-profit evaluation work like the Center for AI Safety's, and Epoch AI come to mind. Conversely, METR also does much other work that is really great and doesn't have the problems discussed here, for example the recent monitorability evaluations. In fact, almost all of their work seems very helpful; it's basically just this most famous experiment that our argument applies to. What separates the better work from the time horizon experiment is that it is opinionated about how AI development should continue. For example, a monitorability evaluation implicitly says: your AI should be more monitorable.

The people in charge of measuring general capabilities at organizations like the above are highly competent and thoughtful, so it would be especially unfortunate if their talent and ambition to improve the world were wasted. We should be more ambitious and mold the trajectory of AI development into a saner one, instead of getting a precise measurement of exactly how insane the current trajectory is. Few people can do that, and as the time horizon plot shows, not for long.