As AI adoption grows, organizations have unprecedented visibility into how AI is being used. They can see token consumption, model usage, prompt volumes and adoption rates in real time. These metrics are increasingly being mistaken for indicators of success. In some cases, organizations have even introduced token leaderboards to encourage AI adoption.
However, consumption reveals little about outcomes. An engineer might consume thousands of tokens, but without attribution to a feature, fix or business objective, it’s impossible to know whether that spend created value. Finance teams see costs, engineering teams see usage, but neither can easily connect the two to what was ultimately achieved.
The problem is compounded by the fact that many engineering measurement frameworks predate AI. Traditional delivery metrics remain useful, but they were never designed to capture the difference that coding assistants, autonomous agents or AI-powered workflows make. In fact, research found that 94% of engineering leaders believe the metrics that matter most are missing from their current measurement frameworks.
The result is a growing attribution gap. Organizations can see what AI costs, but not what it creates.
The Hidden Cost of AI-Driven Software Delivery
To address this gap, organizations must move beyond usage-based metrics toward a more complete view of AI-driven delivery cost and value creation. AI use and provider spend capture only a narrow part of the picture. A coding assistant might generate code in seconds, but the real cost of delivery also includes the infrastructure required to build, test, secure and deploy it, as well as the time spent validating outputs and resolving issues downstream. Looking at tokens alone risks obscuring where value is created, or lost.
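To make the layering concrete, here is a minimal Python sketch of a per-task cost record that adds infrastructure and human time to provider spend. The `TaskCost` class, its field names, the hourly rate and every number in it are hypothetical, chosen only to illustrate the arithmetic. They are not drawn from any real dataset or tool.

```python
from dataclasses import dataclass


@dataclass
class TaskCost:
    """Hypothetical cost record for one AI-assisted task (all fields are assumptions)."""
    provider_spend: float  # model/API charges, in dollars
    infra_spend: float     # build, test, security and deploy compute, in dollars
    review_hours: float    # human time spent validating the output
    rework_hours: float    # human time spent resolving downstream issues

    def total(self, hourly_rate: float) -> float:
        """Full delivery cost: provider spend + infrastructure + human time."""
        human = (self.review_hours + self.rework_hours) * hourly_rate
        return self.provider_spend + self.infra_spend + human

    def provider_share(self, hourly_rate: float) -> float:
        """Fraction of the full cost that a token-only view would capture."""
        total = self.total(hourly_rate)
        return self.provider_spend / total if total else 0.0


# Illustrative numbers only.
task = TaskCost(provider_spend=2.0, infra_spend=8.0, review_hours=1.0, rework_hours=0.5)
full_cost = task.total(hourly_rate=100.0)
share = task.provider_share(hourly_rate=100.0)
print(f"full cost: ${full_cost:.2f}, provider share: {share:.1%}")
```

In this made-up case the provider bill is a small slice of the total, which is the kind of gap a token-only dashboard would hide.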
Meanwhile, each AI-assisted task carries a different cost profile across these layers. A frontier model might be justified for complex architectural decisions or highly specialized development work, but it might provide little additional value for routine activities such as summarizing logs, generating documentation or creating boilerplate code. In some cases, AI might not be the most efficient option. Understanding when to use AI, which model to use and when alternative approaches are more cost-effective is becoming a critical part of software delivery economics.
This isn’t the first time engineering teams have had to rethink costs. Not that long ago, cloud computing pushed developers from broad infrastructure budgets toward more granular unit economics. AI is creating a similar shift.
From AI Use to End-to-End Delivery Traceability
A delivery model that actively ties AI spend to engineering and productivity outcomes will help organizations better measure their ROI. That starts with connecting AI activity directly to the work it produces. Token spend, prompts, sessions and generated code need to be automatically attributed to the developers, teams, repositories and business units responsible for shipping it. Without that level of traceability, AI use remains visible in isolation, but disconnected from outcomes.
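As a rough illustration of what that attribution could look like, the Python sketch below rolls usage events up by team and by work item. The event schema, field names and values are assumptions made for the example and do not reflect any particular vendor's export format. Events with no linked work item land in an "unattributed" bucket, which is the spend that stays visible but disconnected from outcomes.

```python
from collections import defaultdict

# Hypothetical usage events; field names and values are illustrative only.
events = [
    {"developer": "dev_a", "team": "payments", "repo": "billing-api",
     "work_item": "FEAT-1", "cost": 1.50},
    {"developer": "dev_b", "team": "payments", "repo": "billing-api",
     "work_item": "FIX-2", "cost": 0.50},
    {"developer": "dev_c", "team": "search", "repo": "indexer",
     "work_item": None, "cost": 2.00},
]


def attribute_spend(events, key):
    """Sum cost per value of `key`; events missing that value go to 'unattributed'."""
    totals = defaultdict(float)
    for event in events:
        totals[event.get(key) or "unattributed"] += event["cost"]
    return dict(totals)


by_team = attribute_spend(events, "team")
by_work_item = attribute_spend(events, "work_item")
print(by_team)
print(by_work_item)
```

The same function works for any dimension in the record, such as `repo` or `developer`, so one event stream can feed both finance and engineering views.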
The same level of automation is required in the processes engineering leaders use to identify inefficiency in their workflows. Organizations need to be able to see when AI spend is going toward code that never ships, when expensive models are being used where lighter ones would suffice, and when prompts and workflows are generating unnecessary cost without improving delivery. Without this insight, inefficiencies are only visible after they have already accumulated.
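A simple version of those checks can be sketched in Python. Everything here is hypothetical: the session records, the "frontier" and "light" tier labels and the set of routine task types are assumptions for illustration, and a real system would need its own definitions of each.

```python
# Hypothetical session records; tiers, task types and costs are illustrative.
ROUTINE_TASKS = {"log_summary", "documentation", "boilerplate"}

sessions = [
    {"id": "s1", "model_tier": "frontier", "task": "architecture",
     "cost": 4.0, "shipped": True},
    {"id": "s2", "model_tier": "frontier", "task": "boilerplate",
     "cost": 3.0, "shipped": True},
    {"id": "s3", "model_tier": "light", "task": "documentation",
     "cost": 0.5, "shipped": False},
]


def unshipped_spend(sessions):
    """Total cost of sessions whose output never reached production."""
    return sum(s["cost"] for s in sessions if not s["shipped"])


def overprovisioned(sessions):
    """IDs of sessions that used a frontier-tier model for a routine task."""
    return [
        s["id"]
        for s in sessions
        if s["model_tier"] == "frontier" and s["task"] in ROUTINE_TASKS
    ]


wasted = unshipped_spend(sessions)
flagged = overprovisioned(sessions)
print(f"unshipped spend: ${wasted:.2f}; frontier model on routine work: {flagged}")
```

Run on a schedule rather than at quarter end, checks like these surface inefficiency while it is still small.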
That visibility also needs to extend from code generation through to production. AI-generated work should be tracked end-to-end — from prompt to pull request to deployment — with delivery metrics such as ship rate, PR cycle time and DORA indicators correlated against incident and quality data. Only then can organizations understand whether AI is improving software delivery or introducing new forms of rework.
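The comparison described above might be sketched as follows in Python. The pull request records are invented for the example, and the metric definitions are assumptions: ship rate is taken here as the share of PRs that reach deployment, and cycle time as hours from open to merge. Teams may define both differently.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; all values are illustrative.
prs = [
    {"ai_assisted": True, "opened": "2024-01-01T09:00",
     "merged": "2024-01-01T15:00", "deployed": True, "incidents": 0},
    {"ai_assisted": True, "opened": "2024-01-02T09:00",
     "merged": "2024-01-02T19:00", "deployed": True, "incidents": 1},
    {"ai_assisted": True, "opened": "2024-01-03T09:00",
     "merged": None, "deployed": False, "incidents": 0},
    {"ai_assisted": False, "opened": "2024-01-01T09:00",
     "merged": "2024-01-02T09:00", "deployed": True, "incidents": 0},
]


def hours_between(start, end):
    """Elapsed hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def summarize(prs, ai_assisted):
    """Ship rate, median cycle time and incidents per deploy for one cohort."""
    group = [p for p in prs if p["ai_assisted"] == ai_assisted]
    deployed = [p for p in group if p["deployed"]]
    hours = [hours_between(p["opened"], p["merged"]) for p in group if p["merged"]]
    return {
        "ship_rate": len(deployed) / len(group) if group else 0.0,
        "median_cycle_hours": median(hours) if hours else None,
        "incidents_per_deploy": (
            sum(p["incidents"] for p in deployed) / len(deployed) if deployed else 0.0
        ),
    }


ai_summary = summarize(prs, ai_assisted=True)
baseline_summary = summarize(prs, ai_assisted=False)
print(ai_summary)
print(baseline_summary)
```

Reading the three numbers side by side matters: a cohort with a faster cycle time but a higher incident rate may be shifting work downstream rather than removing it.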
Across all of this, organizations need a consistent baseline for benchmarking adoption, efficiency and impact across teams. Without that, increases in AI use risk being interpreted as progress, even when they do not translate into improved performance.
Better Economics Lead to Better Decisions
When organizations can connect AI use to delivery outcomes, they start to understand value. A true cost model doesn’t just show where money is being spent; it reveals which work is driving impact.
Organizations that get there first won’t just reduce AI spend; they’ll improve how they decide what to build in the first place. Governance and cost awareness become embedded in the workflow, making AI use context-aware: applied when it improves delivery, avoided when it doesn’t, and balanced against simpler or cheaper alternatives that would deliver better value. In that environment, AI stops being a default and becomes a tool used deliberately to ship software that makes a difference.

