8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%

February 25, 2026 at 09:44 PM

When your average daily token usage is 8 billion a day, you have a massive scale problem. This was the case at AT&T, and chief data officer Andy Markus and his team recognized that it simply wasn’t feasible (or economical) to push everything through large reasoning models. So, when building out an internal Ask AT&T personal assistant, they reconstructed the orchestration layer. The result: A multi-agent stack built on LangChain where large language model “super agents” direct smaller, underlying “worker” agents performing more concise, purpose-driven work. This flexible orchestration layer has dramatically improved latency, speed and response times, Markus told VentureBeat. Most notably, his team has seen up to 90% cost savings. “I believe the future of agentic AI is many, many, many small language models (SLMs),” he said. “We find small language models to be just about as accurate, if not as accurate, as a large language model on a given domain area.”Most recently, Markus and his team used this re-architected stack along with Microsoft Azure to build and deploy Ask AT&T Workflows, a graphical drag-and-drop agent builder for employees to automate tasks. The agents pull from a suite of proprietary AT&T tools that handle document processing, natural language-to-SQL conversion, and image analysis. “As the workflow is executed, it's AT&T’s data that's really driving the decisions,” Markus said. Rather than asking general questions, “we're asking questions of our data, and we bring our data to bear to make sure it focuses on our information as it makes decisions.” Still, a human always oversees the “chain reaction” of agents. All agent actions are logged, data is isolated throughout the process, and role-based access is enforced when agents pass workloads off to one another. “Things do happen autonomously, but the human on the loop still provides a check and balance of the entire process,” Markus said.Not overbuilding, using ‘interchangeable and selectable’ modelsAT&T doesn’t take a "build everything from scratch" mindset, Markus noted; it’s more relying on models that are “interchangeable and selectable” and “never rebuilding a commodity.” As functionality matures across the industry, they’ll deprecate homegrown tools in lieu of off the shelf options, he explained. “Because in this space, things change every week, if we're lucky, sometimes multiple times a week,” he said. “We need to be able to pilot, plug in and plug out different components.” They do “really rigorous” evaluations of available options as well as their own; for instance, their Ask Data with Relational Knowledge Graph has topped the Spider 2.0 text to SQL accuracy leaderboard, and other tools have scored highly on the BERT SQL benchmark. In the case of homegrown agentic tools, his team uses LangChain as a core framework, fine-tunes models with standard retrieval-augmented generation (RAG) and other in-house algorithms, and partners closely with Microsoft, using the tech giant’s search functionality for their vector store. Ultimately, though, it’s important not to just fuse agentic AI or other advanced tools into everything for the sake of it, Markus advised. “Sometimes we over complicate things,” he said. “Sometimes I've seen a solution over engineered.” Instead, builders should ask themselves whether a given tool actually needs to be agentic. This could include questions like: What accuracy level could be achieved if it was a simpler, single-turn generative solution? How could they break it down into smaller pieces where each piece could be delivered “way more accurately”?, as Markus put it. Accuracy, cost and tool responsiveness should be core principles. “Even as the solutions have gotten more complicated, those three pretty basic principles still give us a lot of direction,” he said. How 100,000 employees are actually using itAsk AT&T Workflows has been rolled out to 100,000-plus employees. More than half say they use it every day, and active adopters report productivity gains as high as 90%, Markus said. “We're looking at, are they using the system repeatedly? Because stickiness is a good indicator of success,” he said. The agent builder offers “two journeys” for employees. One is pro-code, where users can program Python behind the scenes, dictating rules for how agents should work. The other is no-code, featuring a drag-and-drop visual interface for a “pretty light user experience,” Markus said. Interestingly, even proficient users are gravitating toward the latter option. At a recent hackathon geared to a technical audience, participants were given a choice of both, and more than half chose low code. “This was a surprise to us, because these people were all very competent in the programming aspect,” Markus said. Employees are using agents across a variety of functions; for instance, a network engineer may build a series of them to address alerts and reconnect customers when they lose connectivity. In this scenario, one agent can correlate telemetry to identify the network issue and its location, pull change logs and check for known issues. Then, it can open a trouble ticket. Another agent could then come up with ways to solve the issue and even write new code to patch it. Once the problem is resolved, a third agent can then write up a summary with preventative measures for the future. “The [human] engineer would watch over all of it, making sure the agents are performing as expected and taking the right actions,” Markus said. AI-fueled coding is the futureThat same engineering discipline — breaking work into smaller, purpose-built pieces — is now reshaping how AT&T writes code itself, through what Markus calls "AI-fueled coding." He compared the process to RAG; devs use agile coding methods in an integrated development environment (IDE) along with “function-specific” build archetypes that dictates how code should interact. The output is not loose code; the code is “very close to production grade,” and could reach that quality in one turn. “We've all worked with vibe coding, where we have an agentic kind of code editor,” Markus noted. But AI-fueled coding “eliminates a lot of the back and forth iterations that you might see in vibe coding.” He sees this coding technique as “tangibly redefining” the software development cycle, ultimately shortening development timelines and increasing output of production-grade code. Non-technical teams can also get in on the action, using plain language prompts to build software prototypes. His team, for instance, has used the technique to build an internal curated data product in 20 minutes; without AI, building it would have taken six weeks. “We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” Markus said. “So it's a game changer.”

💡Analysis & Context

When your average daily token usage is 8 billion a day, you have a massive scale problem When your average daily token usage is 8 billion a day, you have a massive scale problem. This was the case at AT&T, and chief data officer Andy Mar Monitor developments in 8 for further updates.

📋 Quick Summary

When your average daily token usage is 8 billion a day, you have a massive scale problem This was th

Help us improve this article. Share your feedback and suggestions.

8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%

💡Analysis & Context

📋 Quick Summary

Related Articles

9to5Mac Daily: February 25, 2026 – Touchscreen MacBook Pro rumors, Apple Card latest

Revolut Ex-Employee Allegedly Tried to Extort a Customer for Crypto Ransom

Discord Is Delaying Age Checks Following User Backlash

Love retro speakers? You need to see these Marshall and JBL deals!

Big policy change coming to Amazon Wish Lists

You can now use Nano Banana to guide video generation in Google’s creative AI studio

Cookie Consent