Amazon is betting on its proprietary artificial intelligence chips to drive what it hopes will become the next significant and profitable phase of its business growth. Yet, according to an internal corporate memo, the company’s cloud division remains substantially behind Nvidia, whose graphics processing units (GPUs) continue to dominate the industry. The internal report revealed that Cohere, an AI startup specializing in large language models, found that Amazon’s Trainium 1 and Trainium 2 chips underperformed Nvidia’s high-end H100 GPUs. The findings appeared in a confidential July document obtained by Business Insider. Cohere’s assessment also noted that access to the second-generation Trainium chips was severely constrained and that the service suffered persistent interruptions, both of which hampered reliability and availability for customers.

The document added that these so-called “performance challenges” were still being evaluated by Amazon and its internal semiconductor division, Annapurna Labs. Despite the ongoing investigation, progress in resolving the reported deficiencies remained limited, pointing to persistent technical and operational bottlenecks. Stability AI, another prominent startup known for its AI-driven image generation technology, expressed nearly identical reservations: it concluded that Trainium 2 delivered higher latency than Nvidia’s H100, making it less competitive on both processing speed and overall cost-effectiveness, the document warned.

Amazon’s development of the Trainium line of chips is a cornerstone of its broader strategy to stay at the forefront of the rapidly evolving AI cloud computing sector. The effort mirrors the company’s earlier success with Amazon Web Services (AWS), whose profitability historically rested on designing in-house data center silicon rather than paying premium prices to established suppliers such as Intel. Now, amid the generative-AI boom, Amazon aims to replicate that cost advantage by replacing Nvidia’s expensive GPUs with its own Trainium chips, letting it deliver powerful AI infrastructure to clients while protecting margins.

However, the company faces a fundamental risk: if AWS customers continue to demand that their AI workloads be hosted on Nvidia hardware instead of Amazon’s homegrown processors, AWS’s potential profit margins will erode because the division will need to cover the considerably higher costs associated with sourcing GPUs externally. The private report’s references to customer dissatisfaction underscore the substantial obstacles Amazon confronts—not merely in matching Nvidia’s raw processing capabilities, but also in securing financially viable AI workloads at scale. These performance and adoption challenges further reflect AWS’s difficulties in maintaining enthusiasm among startups, a customer category that historically served as one of its strongest anchors within the cloud ecosystem.

In response, an Amazon spokesperson emphasized the company’s appreciation for client input, framing such feedback as an essential mechanism for refining and enhancing its chip offerings to ensure wider adoption across the market. The representative clarified that Cohere’s critiques referred to earlier circumstances that were “not current,” while asserting that both Trainium and the related Inferentia chip families have achieved notable results for clients including Ricoh, Datadog, and Metagenomi. The spokesperson further highlighted what Amazon views as encouraging momentum behind Trainium 2’s growth, although its deployment presently remains concentrated among a limited number of large-scale clients such as Anthropic—an influential AI research firm.

According to AWS, its internally designed AI chips already provide between 30% and 40% better price-performance than the latest competing GPU offerings. Executives frequently point to the depth of Amazon’s in-house chip-design expertise and say that successive generations of these components are already under development. The company anticipates broader customer access with the release of Trainium 3, expected to be previewed later this year. In the company’s own words, progress will continue by “listening to our customers” and maintaining a culture that values constructive self-criticism, a mindset Amazon credits for its enduring capacity to innovate and to produce superior technology over time.

During a recent earnings call, Amazon’s chief executive officer Andy Jassy reported that the Trainium 2 lineup is now fully subscribed and generates revenue in the multibillion-dollar range, underscoring its commercial relevance despite the operational hiccups. Cohere and Stability AI reportedly did not respond to requests for comment.

Additional customers, the internal July document notes, have voiced related frustrations. Typhoon, another startup, found that Nvidia’s older A100 GPUs were up to three times more cost-efficient than AWS’s Inferentia 2 chips for certain types of computations. Similarly, research organization AI Singapore concluded that Amazon’s G6 servers equipped with Nvidia GPUs offered better cost performance than Inferentia 2 across several distinct use cases. As detailed in prior reports, customers’ difficulty adopting Amazon’s proprietary hardware has resulted in lower utilization rates than anticipated. The trend aligns with market data from Omdia, which places Nvidia’s market share at a commanding 78%, followed by Google and AMD at just over 4% each, with AWS’s custom chips trailing in sixth position at roughly 2% of the market.

The gap between Amazon’s ambitions and the industry’s realities is particularly evident in its new $38 billion partnership with OpenAI. Despite the scale of this deal, the joint servers employed for AI training and inference will rely exclusively on Nvidia GPUs, with no use of Trainium processors. Analysts from Mizuho noted that the absence of Trainium in such a high-profile partnership could be interpreted as a disappointing indicator, though they also acknowledged the practicality of OpenAI’s decision. Nvidia’s silicon not only delivers consistently higher performance metrics but is also backed by CUDA, a well-established software ecosystem familiar to countless developers. This combination of technical reliability and existing community expertise makes Nvidia the de facto standard for teams constructing complex, high-stakes AI systems.

Within Amazon, the internal July discussions revealed an explicit acknowledgment that the chips’ technical limitations and comparative shortcomings have become “critical blockers” for customers contemplating a move away from Nvidia. Financial analysts have echoed this concern. Bank of America, for example, struck a cautious note last month, remarking that investor confidence in Trainium’s potential remains tentative and that it is uncertain whether demand will expand beyond current major customers such as Anthropic.

Anthropic represents Amazon’s most visible success story to date for Trainium’s deployment. This AI research lab, recognized for building the Claude language models, has become a crucial proving ground for Trainium performance. Amazon has initiated Project Rainier, a massive data center installation comprising half a million Trainium chips reserved exclusively for training Anthropic’s forthcoming generation of models. The startup reportedly aims to scale its use of Trainium 2 chips to more than one million units by the year’s end. Because Anthropic is considered one of the few laboratories capable of rivaling OpenAI in advanced model research, its ability to run efficiently on Trainium hardware could significantly reinforce Amazon’s strategy—though conclusive results have yet to emerge.

Recent developments complicate this narrative: Anthropic expanded its partnership with Google to incorporate that company’s Tensor Processing Units (TPUs), prompting Amazon’s share price to dip temporarily. While Anthropic insisted it would continue employing Trainium, the move highlighted the practical complexity of using multiple chip architectures from competing providers. In a September blog post, the startup openly described the difficulties it experienced managing such hybrid systems and the resulting service outages.

Amazon, responding to inquiries from Business Insider, reaffirmed that Anthropic remains committed to broadening its Trainium usage. The company reiterated that its guiding principle is customer choice: rather than attempting to displace Nvidia entirely, AWS intends to offer a diverse portfolio of hardware alternatives within its cloud platform. Jassy restated this philosophy during the earnings call, explaining that the long history of AWS shows that no single provider can dominate every market niche indefinitely. Instead, the goal is to continuously supply clients with the variety and flexibility needed to address different computational needs.

Following that call, Amazon’s stock rose as investors responded positively to its results, which showed AWS revenue growing 20% year-over-year to $33 billion last quarter, the division’s fastest growth since 2022, even though Microsoft’s and Google’s cloud units still outpace AWS in percentage growth. In essence, the increasingly competitive AI hardware landscape is pushing even the largest cloud providers to balance narrative, performance, and economics as they strive to secure their place in an industry undergoing unprecedented transformation.

Source: https://www.businessinsider.com/startups-amazon-ai-chips-less-competitive-nvidia-gpus-trainium-aws-2025-11