Tokenmaxxing: focusing on the trivial rather than the important

Every so often, the tech sector comes up with a new way of confusing the issue. Tokenmaxxing is just the latest chapter in a story we’ve heard before.
In recent months, a term has emerged in Silicon Valley that you may have heard of: tokenmaxxing. The practice involves maximising the consumption of AI tokens (the smallest unit processed by models such as Claude, GPT or Llama) as a sign of productivity. The more tokens you burn, the more cutting-edge and productive you are assumed to be.
Jensen Huang, CEO of Nvidia, was one of the first to say so publicly: he proposed that engineers receive tokens as part of their compensation (a sort of ‘fourth pillar’ alongside salary, bonus and equity) and that spending little would be directly suspect. Meta created internal leaderboards where employees compete for the title of Token Legend. Amazon requires over 80% of its engineers to use AI tools every week and tracks compliance via consumption KPIs.
Common sense alone suggests this is not a good idea: it treats generating more code or spending more tokens as if it were inherently positive.
And what happened? The classic story of perverse incentives: several Amazon employees created an internal AI agent tool to automate unnecessary tasks, with the sole aim of inflating their token usage. As one of them told the Financial Times: “When they track usage, it creates perverse incentives and there are people who are very competitive about this.”
This is nothing new. We’ve been here before
At first, a developer’s productivity was measured, among other things, by hours worked. More time in the chair, more work done. The result was a generation of managers obsessed with presenteeism and a generation of workers who were experts at looking busy.
Then came lines of code. A seemingly objective, quantifiable, comparable metric. Bill Gates had already warned of how absurd this was: “Measuring software progress by lines of code is like measuring the progress of building an aeroplane by its weight.” And once again we screwed it up: engineers churned out code by the bucketful to look better in the metrics, rather than producing good, lean code (or, where that was the right call, no code at all).
Then came commits. The logic, once again, seemed reasonable: whoever commits most to the repository is contributing the most. The result was trivial commits, artificial fragmentation of changes and a race to appear active on the GitHub graph. Once again, something that measured NOTHING and produced the same effect all over again.
Now it’s all about tokens. A technical metric that measures how much text an AI model processes has come to serve as an indicator of productivity, internal prestige and, in some cases, pay (thankfully, we still have time to change this).
Goodhart’s Law has been explaining this for fifty years: “When a measure becomes a target, it ceases to be a good measure.” People are not lazy or dishonest by nature; they simply respond to the incentives the system creates.
The constant obsession with measuring developers’ individual productivity has always led us to an assessment system that is both dishonest and highly unreliable. Perhaps, just perhaps, we have spent years obsessed with measuring something impossible.
Could this be interesting?
Of course there is something interesting about tokenmaxxing, beyond the recurring pattern.
The emergence of AI in technical work is proving transformative. There are engineers who, using the right tools, are increasing their delivery capacity in ways that would have seemed impossible two years ago. Actual adoption does matter, and it makes sense for companies to want to understand how it is happening so they can roll it out across the entire organisation.
The problem isn’t measuring AI usage. It’s fine to know how much we’re spending on tokens, of course. The problem is believing that actual adoption is equivalent to token consumption. It’s like counting the number of times you click the mouse.
Using more tokens doesn’t mean adding more value. And every CEO should know this.
And there’s certainly no point in using tokens as a measure of individual performance.
If this is a long-standing issue, why do we find it so hard to change?
At Manfred, we’ve been talking to companies for years about how to assess technical talent. And one of the most frequent conversations is precisely this: the temptation to measure what is easy.
Activity metrics (hours, lines, commits, tokens) are appealing because they are objective, quantifiable and easy to visualise. They create a sense of control, making it possible to compare people and justify decisions. It seems that the data doesn’t lie.
The problem is that what really matters in a technical profile is difficult to measure: the quality of judgement, the ability to solve complex problems, and the real impact on the business. Then there are things that are even harder to measure, such as the ability to decide when not to build something, when not to use a tool, or when the correct answer is not to add more code.
None of this is easy to quantify, so it is difficult to compare. It has no place on a dashboard, which is why we tend to ignore it and replace it with metrics such as lines of code or tokens.
Are we looking in the right place?
It’s tempting to create the umpteenth framework that tracks our developers’ activity, with metrics based on tokens, hours, commits, PRs or anything else that can be measured. We always bear in mind the saying “What gets measured gets managed”. And it’s true that we need to measure what gets delivered, but perhaps we’re focusing our attention in the wrong place.
This constant digital Taylorism, which attempts to reduce intellectual work to something measurable, as if it were a matter of making screws, clashes with the fact that building software is never an individual endeavour.
Great products are created by teams made up of diverse people working in different areas. Teams complement and feed off one another.
Within a team, there will be people who generate more code and others who generate less; some who do not produce anything directly, but who are taking on other equally critical roles: coordination, technical decisions, problem prevention, system simplification. The person who convinces the team not to build an unnecessary feature may be the one who has contributed the most value that quarter. They have not a single token to show for it, nor thousands of lines of code to quantify.
An engineer who uses 200,000 tokens a week might be solving business-critical problems, or they might be carrying out unnecessary tasks just to look productive. One who uses 5,000 might be writing the cleanest and most valuable code on the team, or they might be resisting the adoption of tools that would make them more effective.
The number tells you nothing at an individual level. And that is why we remain stuck in this endless loop of individualistic metrics.
Activity vs. impact
There is a distinction that seems obvious, but is very difficult to apply in practice: the difference between measuring what someone does and measuring what someone achieves.
Activity metrics measure movement: tokens consumed, lines written, hours spent, commits made. They are quantifiable and comparable.
Impact metrics measure results: how long does it take us to deliver value to the user? Are we solving the problems that matter to the user? Is the system more robust, more maintainable, cheaper to operate than it was six months ago? These questions are harder to answer. They require discussion, judgement and someone willing to take a step back.
It seems we always come back to the same conversation: does the team deliver value on time and as required? Have we improved the time it takes us to deliver? Is what we deliver being used, is it useful and/or does it generate revenue for the company? Are there fewer software crashes? Are there fewer errors and bugs?
These metrics are not individual. Because software development isn’t either.
That’s why, if we truly want to apply metrics that make sense, the best approach is to step back and look at the bigger picture: to focus more on team performance and less on individual performance. To view teams as units whose collective output is indeed more measurable than the sum of their members’ individual metrics.
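As a rough illustration of the team-level questions above (how long it takes to deliver, how often a release breaks something), here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Delivery` record, its field names and the sample data are hypothetical, and a real team would pull this from its own release and incident tooling.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Delivery:
    """One change shipped by the team (hypothetical record shape)."""
    started: date          # day the team began work on the change
    released: date         # day it reached users
    caused_incident: bool  # did the release lead to a crash or rollback?


def median_lead_time_days(deliveries):
    """Median number of days from start of work to release."""
    return median((d.released - d.started).days for d in deliveries)


def change_failure_rate(deliveries):
    """Share of releases that caused an incident (0.0 to 1.0)."""
    return sum(d.caused_incident for d in deliveries) / len(deliveries)


# Made-up sample data for one team, purely for illustration.
team_deliveries = [
    Delivery(date(2025, 1, 6), date(2025, 1, 10), False),
    Delivery(date(2025, 1, 13), date(2025, 1, 15), True),
    Delivery(date(2025, 1, 20), date(2025, 1, 28), False),
    Delivery(date(2025, 2, 3), date(2025, 2, 6), False),
]

print(median_lead_time_days(team_deliveries))  # 3.5
print(change_failure_rate(team_deliveries))    # 0.25
```

Note what the record deliberately leaves out: there is no author field, no token count and no line count. The unit being measured is the team’s delivery, not any one person’s activity.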
From x100 Dev to x100 Team
Measuring activity rather than impact leads us to the latest trend, the promise of the x100 Dev: a single engineer, armed with the right AI tools, capable of doing the work of a hundred. Some founders are even using it as an argument for reducing headcount. It’s not that there is no truth in it (we’ve discussed this before), but we’re falling into the same trap again.
The promise of the x100 Dev makes exactly the same mistake as tokenmaxxing: it assumes that the value of software lies in the amount of code produced. And if we have learnt anything from decades of building technical teams, it is that this is not the case.
Software is built by teams. And teams are not simply the sum of their most productive individual members. They are the result of how they communicate, how they make decisions under uncertainty, how they manage conflict, how they align technical criteria with business needs, how they integrate new members, and how they maintain momentum when things get complicated. None of that is solved by a more productive engineer. None of that is solved by AI.
Perhaps we should shift the conversation towards the concept of the x100 Team: using AI to multiply the team’s capacity as a unit, reducing operational friction, automating tasks that do not require human judgement, and freeing up time for those that do.
Teams in the Age of AI
‘Tokenmaxxing’, beyond the Silicon Valley joke and curious anecdote, is a symptom: a tendency to measure people’s value by their direct, quantifiable output, and to eliminate everything that doesn’t appear in any KPI, everything that seems not to add direct value.
We are already seeing this idea taken to the extreme in many corporate messages and strategies: eliminating layers of coordination and alignment because they ‘do not produce directly’. Middle managers are the first to bear the brunt. There are voices advocating for teams made up solely of ICs, with AI taking on coordination functions. The argument sounds efficient. In practice, it is tokenmaxxing applied to a company’s organisational chart. An obsession with measuring individual productivity.
It is undeniable that AI has changed the way a software team works. It has also changed the way a company recruits. And it is changing the way a company is structured.
But we shouldn’t revert to the endless obsession with measuring individual productivity. Instead, we should continue to emphasise these ‘x100 teams’. AI shouldn’t be the lever that eliminates the need to build good teams, but rather the one that multiplies what a good team can achieve.
Once the hype dies down and the conversation returns to the realm of common sense, we’ll hear more talk of x100 Teams: teams where trust, coordination and shared judgement make what they build together impossible to replicate individually. With or without AI.