Hello, and welcome to *Decoder*! I'm Hayden Field, your host for today's episode. I'm ordinarily a senior reporter at *The Verge* covering artificial intelligence, but I'm stepping in as guest host for the Thursday edition of the show. Over the next several episodes, I'll be filling in for Nilay Patel, and I'm especially eager to explore the vast landscape of the AI industry: the promising innovations, the unsettling drawbacks, and the gray areas of uncertainty that continue to spark debate across the field.

In today's conversation, I'm joined by David Hershey, who leads the applied AI team at Anthropic. His work spans the practical and the experimental: he collaborates with startups to figure out the most effective ways to build on Anthropic's technology, and he regularly stress-tests new AI models to understand both their strengths and their limitations.

I wanted to bring David onto the show now because Anthropic has just introduced a brand-new model, Claude Sonnet 4.5, and the release has drawn significant attention within the AI community. For anyone who needs a point of comparison, Claude is to Anthropic roughly what ChatGPT is to OpenAI: its flagship conversational AI. The Sonnet 4.5 update has been positioned as a particularly notable advance, one that emphasizes autonomous, or “agentic,” AI behavior, especially for performing programming tasks at scale.

To explain further: these agent-style models are designed not simply to respond to prompts in isolated interactions but to take on long-term, multistep assignments on their own. The idea is that you might hand a model an intricate project, such as building a software application from the ground up, and let it pursue that objective over many continuous hours or even multiple days, with minimal to zero human supervision in between. According to Anthropic, Sonnet 4.5 has already demonstrated the capacity to work on a single task for as long as 30 hours straight without human guidance, a potentially transformative step in AI development if it proves scalable and reliable.
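To make that idea concrete, here is a minimal sketch of the agentic pattern described above, written against Anthropic's Python SDK. This is an illustration, not Anthropic's own implementation: the `run_tests` tool, its stubbed handler, and the task prompt are hypothetical stand-ins for whatever real capabilities an agent would be given, and the model alias reflects the Sonnet 4.5 name published at the time of writing.

```python
# A minimal sketch of an agent loop, assuming the Anthropic Python SDK is
# installed and ANTHROPIC_API_KEY is set. The tool below is hypothetical.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "run_tests",  # hypothetical tool, for illustration only
    "description": "Run the project's test suite and return the output.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}]

def handle_tool(name: str, tool_input: dict) -> str:
    # A real agent would execute the tool here; this is a stub.
    return "All tests passed."

messages = [{"role": "user", "content": "Fix the failing tests, then stop."}]

# The loop is what makes the model "agentic": after every response, tool
# results are fed back in, and the model chooses its own next step until it
# judges the task complete (i.e., it stops requesting tool calls).
for _ in range(20):  # step cap so the sketch can't run unattended forever
    response = client.messages.create(
        model="claude-sonnet-4-5",  # Sonnet 4.5 alias at time of writing
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # the model considers the task finished
    tool_results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": handle_tool(block.name, block.input)}
        for block in response.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})
```

Notice that the model, not the surrounding code, decides when the task is done; stretching this same loop from minutes to the 30-hour runs Anthropic describes is largely a question of how reliable that judgment stays over time.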

For context, over the past year companies such as Anthropic, Microsoft, and OpenAI have pitched agentic technology as the next major leap beyond the general-purpose chatbot. In this narrative, agents are the feature that could finally unlock generative AI's much-touted potential to deliver substantial productivity gains. There has been measurable progress in this direction, but these systems are still in their formative stages. Consumers are not yet in the habit of dispatching agents to rove across the internet unassisted in pursuit of complex goals, nor are they routinely entrusting AI with day-long tasks that run entirely without human oversight. In short, the promise of truly self-directed, hardworking AI agents remains more aspiration than everyday reality, for the moment at least.

Nonetheless, tech companies large and small continue to invest heavily in the belief that such agents will reshape workflows, amplifying human productivity or, in some cases, substituting for human labor altogether. That conviction explains the urgency and enthusiasm surrounding models like Sonnet 4.5.

That is why I wanted to speak with David specifically. Because he spends much of his time on exactly this kind of hands-on experimentation, pushing models to their limits to map where they excel and where they fall short, he is well positioned to shed light on what agentic systems can actually do today. My aim in this conversation was to probe what these models are genuinely good at from a consumer's perspective, beyond their promise in the coding domain, and to explore the trajectory of the next stage in their development.

If you'd like to dive deeper into the subjects we discuss in this episode, I recommend the following related reporting from *The Verge*: articles on Anthropic's release of Claude Sonnet 4.5, the new instant-purchase button built into ChatGPT, OpenAI's attempt to position its service as a daily-use product via ChatGPT Pulse, and even Claude being put to recreational use playing Pokémon. You can also find critical perspectives on why AI agents are still more science fiction than daily practicality, as well as analyses of how industry leaders increasingly see agents as crucial not only to innovation but to future profitability. And as a final industry note, even Amazon is placing competitive bets on agentic AI.

As always, if you have comments, questions, or viewpoints you'd like to share about this episode, email us at decoder@theverge.com; we genuinely read every single one. *Decoder with Nilay Patel* is a podcast from *The Verge* about big ideas and complicated challenges, and we invite you to subscribe and keep following along as these conversations evolve.


Source: https://www.theverge.com/podcast/789772/ai-agents-anthropic-claude-sonnet-autonomous-coding