There is a quiet rule about measurement systems that almost no one names out loud: a metric becomes load-bearing the moment someone’s compensation, status, or budget depends on it. Once that happens, the metric stops being a measurement and starts being a contract. People will optimize the contract. They are not cheating. They are doing exactly what the system asked them to do.
The current AI-rollout era is producing the cleanest example of this dynamic the technology industry has seen in a decade.
In April 2026, Business Insider reported that Disney and ESPN’s product and technology staff had collectively used 3.1 billion Claude tokens and 13.3 billion Cursor tokens over nine working days. One employee invoked Claude roughly 460,600 times in that span: more than 51,000 invocations per workday; or, if you assume a fourteen-hour day with no breaks, more than one invocation per second. This is not a person using a tool. This is a script.
The pattern is not isolated. The Information reported in the same window that a Meta employee had independently built a leaderboard called “Claudeonomics” that ranked the company’s eighty-five thousand employees by token usage, awarding badges like “Token Legend” and “Cache Wizard”. The top-ranked individual averaged 281 billion tokens over thirty days. Meta took the unofficial leaderboard down two days after the story broke, but kept its separate, official token-usage dashboard for engineers. OpenAI runs its own employee leaderboard; the top user there logged 210 billion tokens in a single week in March. Google made AI usage part of its formal performance review process and issued weekly usage quotas to sales staff. In June 2025, Microsoft’s developer-division VP Julia Liuson sent an internal memo stating that AI use was “no longer optional” and directing managers to factor it into performance evaluations. Across the sector, are you using AI was elevated from a question to a metric, and a metric is a contract.
The result was tokenmaxxing, a coinage that did not exist eighteen months ago and is now the name of a global phenomenon. Token volume, measured at the dashboard, decoupled almost entirely from work output, measured at the deliverable. Jellyfish, analyzing roughly seven and a half thousand developers for whom they could join token usage to pull-request activity, found that the median developer used about seven million tokens per PR while top-decile developers used roughly sixty-nine million; nearly ten times more tokens for about twice the throughput. The leaderboard rewarded the volume. The leaderboard did not measure the multiplier.
This is not a story about lazy employees. It is a story about three measurement regimes that mutually reinforced each other in ways no individual party had to design.
The first regime was the employer’s. AI adoption became a board-level commitment in 2025, and boards committed to commitments measure them. Are people using AI? is a hard question. How many tokens did people consume? is an easy one. The first requires judgment about value. The second requires only a counter. Organizations that needed to demonstrate adoption to their boards reached for the counter, exactly as they were going to.
The second regime was the vendor’s. Through most of 2024 and 2025, AI coding tools were sold below cost: Cursor, GitHub Copilot, and Claude Code all ran on flat-rate subscriptions that effectively subsidized usage. The Register later noted that Anthropic’s $200 Max plan had been giving subscribers “several thousand dollars worth of API-rate credits if you actually push it”. A vendor losing money on every token has no commercial interest in surfacing the metric that would slow consumption: namely, the cost per actual unit of shipped work. The vendors built the token counters and shipped the dashboards. They did not build the value counters, because the value counters would have argued against the subscription. Then the economics broke. Cursor moved from request-based to usage-based billing in June 2025. Claude Code introduced weekly caps on August 28 2025. Anthropic began transitioning enterprise customers to usage-based plans in November 2025. The instrument changed because the incentive changed.
The third regime was the employee’s. In a weak knowledge-work labor market with frequent layoffs, a measurable signal of productivity is a survival instrument. Token volume is measurable. Code quality is not, or at least not on a weekly cadence. Faced with a dashboard their employer was watching, employees did the rational thing: they fed the dashboard. The most extreme cases (fifty-one thousand invocations per workday) required automation. The median case required only a slight shift in working style toward token-generating behaviors and away from token-light ones.
None of these three parties needed to coordinate. The system selected for tokenmaxxing all on its own. The dashboard paid for itself, in the sense that every party to the dashboard had something to gain from its continued existence and almost nothing to gain from questioning it.
Field Note № 06 made the point that what an organization refuses to measure is itself a measurement, a statement of which questions are dangerous. This is the inverse: what an organization chooses to measure with conviction, especially when the metric is suspiciously easy and the alternatives are suspiciously hard, is also a statement. Often, the statement is we agreed not to ask the harder question.
There is a Goodhart formulation of this (when a measure becomes a target, it ceases to be a good measure), and it is correct but underspecified. The full version is that a measure becomes a target whenever someone’s economic interest is served by treating it as one, and the parties most likely to push for it are the ones the measure flatters. Tokenmaxxing flatters three parties simultaneously: the board, the vendor, and the employee. The party it does not flatter (the shareholder, the customer, the eventual user of the unreviewed code) is not in the room when the dashboard is designed.
For organizations in the DACH Bio sector, the relevance is not the specific case. It is the pattern. Wherever you have a metric that is easy to count, prominently displayed, and tied to budget or recognition, ask three questions. Who benefits if this number goes up regardless of whether it represents work. Who is paying for the measurement instrument, and what would happen to their economics if a better instrument existed. And which party at the table is the one the metric does not flatter; because that party’s silence, more than anyone’s enthusiasm, is the diagnostic.
The instruments at this Observatory are built on the inverse premise. A measurement that flatters no one is harder to build, slower to operate, and less popular. It is also the only kind that survives contact with reality.
A dashboard that pays for itself is paying for something. It is worth asking what.
Apparatus: The Disney/ESPN token-usage figures are from Hugh Langley, Business Insider, April 2026; the Meta “Claudeonomics” leaderboard from Sylvia Varnham O’Regan, The Information, April 2026, with the OpenAI and Meta-shutdown detail in Beatrice Nolan, Fortune, April 2026. Google’s performance-review tie-in is Langley, Business Insider, February 2026; the Microsoft “no longer optional” memo (Julia Liuson) is Ashley Stewart, Business Insider, June 2025. The tokens-per-PR multiplier is Jellyfish Research, April 2026 (roughly 7,500 developers joined to pull-request activity; about seven million tokens per PR at the median against sixty-nine million at the top decile). The pricing-model shifts (Cursor to usage-based, June 2025; Claude Code weekly caps, August 2025; Anthropic enterprise usage-based, November 2025) are reported by The Register, April 2026, and Vantage. The measurement axiom is Goodhart (1975), in Marilyn Strathern’s 1997 phrasing. Field Note № 06 is its inverse partner: № 06 is what an organization refuses to measure; № 07 is what it measures with conviction to avoid the harder question.