Google says it’s looking to help its enterprise customers use fewer tokens when using AI agents.
At I/O 2026, its developer conference on Tuesday, the tech giant rolled out a new enterprise AI model, Gemini 3.5 Flash, which is aimed directly at the cost problem.
Google also introduced Gemini Omni, a new multimodal model, and new agents, including Daily Brief and Gemini Spark. The cloud vendor also revamped its Antigravity agentic AI platform.
With these moves, Google sought to reassert its prominence in the highly competitive generative AI market and capitalize on its position as one of the few model makers to have successfully monetized its AI offerings, unlike competitors such as Anthropic and OpenAI.
Gemini Flash 3.5 homes in on one of the major hurdles enterprises face when using AI agents. As the use of AI agents grows, enterprises are becoming more sensitive to the number of tokens required to power them. With enterprises sensitive to token costs, vendors are catering to enterprise needs by lowering costs. For example, Anthropic’s new Claude Opus 4.6 model reduced costs to $5 per million input tokens and $25 per million output tokens, a significant drop from $15 per million input tokens and $75 per million output tokens. Gemini Flash 3.5 follows that trend, with Google touting the model as the least expensive to use proprietary frontier model, costing $1.50 per million tokens.
“We’ve heard that many companies are already blowing through their annual token budgets, and it’s only May,” said Sundar Pichai, Google CEO, during a conference keynote. “If companies used a mix of Flash and other frontier models, they could save a lot of money.”
A Needed Move
That Google is slashing token costs is a big move, said Mark Beccue, an analyst at Omdia, a division of Informa TechTarget.
“Thinking cheaper, faster, better,” Beccue said. “[This] has been this logical progression over time, particularly for Google on these models.”
Not only has paying attention to token cost sensitivity been a natural progression, but it has also been something that Google, Anthropic and other generative AI vendors have been forced to do.
“With token anxiety being what it is right now in the enterprise, and how much customers are realizing that their expense is really unbridled right now … there is some pressure on the model makers like Google, Anthropic … to give you a much more cost-effective solution,” said Bradley Shimmin, an analyst at Futurum Group.
The focus on cost is much needed because most non-proprietary models, such as Alibaba Qwen and other open source models, boast similar performance to frontier models, he added. There is not much that differentiates them, barring specialization of some sort.
“When you’re just talking about raw AI intelligence … the outcome is not that different, and the barrier to entry and exit can be quite low,” Shimmin said.
Other Models
Google revealed a new specialized model, Gemini Omni Flash, which it launched across all products. Omni Flash can generate outputs in any modality from any input, whether text, images or video.
Google is also using Gemini 3.5 to push its latest agentic AI applications, including Gemini Spark, a new personal AI agent that Google said works on behalf of users and developers. Spark runs on dedicated machines in Google Cloud and works across Gmail, Docs, Sheets, Slides and third-party tools using the Model Context Protocol standard. It can compile information across emails, chats and documents, manage RSVPs with live updating spreadsheets and send follow-up reminders automatically.
“It’s your personal AI agent that helps you navigate your digital life, taking action on your behalf and under your direction,” Pichai said, joking that even though Spark runs all day, users can close their laptops.
The new agent is Google’s response to OpenClaw, the popular open source agent framework that is now part of OpenAI. Like OpenClaw, Gemini Spark can run in the background and comes with built-in permission to Google Workspace without an API. It is also cost-efficient because Gemini 3.5 is already deeply integrated, so there is no need to incur deep token costs by accessing APIs like with Claude 3.5 Sonnet, according to Google
“I have a feeling that when they made Spark, it was with this model in mind, given how Spark purports to work,” Shimmin said, referring to Claude 3.5 Sonnett.
Gemini 3.5 Flash also powers Google’s AntiGravity 2.0, a revamp of the vendor’s agentic development platform. AntiGravity capabilities include a fully agent-first desktop application. The tech giant said the platform built a complete OS from scratch in 12 hours and consumed less than $1,000 in API credits.
The Price of Competition
With all its new tools and applications, Google is directly showing how agentic AI does more than respond to queries; it works on behalf of users. In Search, Google launched a new intelligent search box. New information agents that monitor specific topics and scan across websites and social media will launch this summer.
“They’re really showing us that using AI for personal pursuit in the enterprise is a full continuum … whether you’re using it in the browser, whether you’re using it in a native app,” Shimmin said.
Despite these advances, though, Google is responding to competitors, and each new product release is a response to another vendor, said Svetlana Sicular, an analyst at Gartner.
“We’re in the middle of companies trying to outdo each other,” Sicular said. “The … competition is overheated.”
The dramatic rivalry among the top generative AI vendors, though, can affect enterprises because they must constantly update their software stacks with each new model update.
“That rapidity of innovation can make it hard for companies to really sort of standardize beyond advanced proof of concepts, unless they’re willing to spend the time and energy to keep monitoring and keep revising and testing and governing these living systems,” Shimmin said.

