Basic data labeling, the work of tagging vast quantities of images, categorizing snippets of text, or organizing simple pieces of information, is rapidly becoming obsolete, according to Jonathan Siddharth, CEO of Turing, an AI training company valued at $2.2 billion. In an episode of the podcast “20VC” released on Monday, Siddharth said the era of traditional data-labeling companies has effectively ended, signaling a shift in how artificial intelligence systems are trained and refined.
Siddharth said the data that AI needs has changed fundamentally. Early machine learning models drew much of their capability from massive datasets annotated by armies of human labelers, who identified objects in images, categorized customer reviews, or assigned classifications to text samples. Contemporary systems are built differently. Models that use reinforcement learning or operate as autonomous agents no longer depend on simple, surface-level examples. Instead, they rely on complex, context-rich data that better reflects how real people reason, create, and perform professional tasks. In essence, AI development has moved from an era of mechanical repetition to one of cognitive emulation.
The Turing CEO said AI systems now require richer, more authentic input data: information that captures how humans perform knowledge work in real workplace settings. Such data goes beyond simple annotations to capture the subtleties of human judgment and the implicit logic that defines expertise in a given domain. Siddharth added that major AI research labs increasingly want training partners that do more than label. They want collaborators that act as proactive research accelerators, capable of shaping experiments, refining architectures, and co-developing next-generation training environments alongside them. According to Siddharth, this is more than a change in operating model; it is a new paradigm in the relationship between AI researchers and the companies that supply their training data. He summed up the shift: “It’s now the era of research accelerators.”
To meet these demands, Siddharth said, AI training companies must prioritize building sophisticated reinforcement-learning environments: highly realistic simulations, or “mini-worlds,” that mimic the workflows, pressures, and decision-making processes of everyday professional work across industries. In these environments, AI models can practice problem-solving in lifelike scenarios and produce performance that more closely mirrors human reasoning. Building such simulations, however, requires recruiting human experts with specialized knowledge in sectors ranging from healthcare and finance to software engineering and education, whose expertise provides the scaffolding that guides model training.
Turing’s momentum reflects both its technical credibility and investor confidence. In June, the company announced that it had raised $111 million in Series E funding at a valuation of $2.2 billion. Earlier in the same year, it disclosed that its annual revenue run rate had grown to $300 million for 2024, almost three times the previous year’s figure, a sign of escalating demand for high-fidelity training solutions.
The shift comes after a year in which startups dedicated to labeling and structuring training data commanded remarkable valuations, a sign of the capital flowing into the sector. In one notable example, Meta acquired a 49 percent stake in Scale AI in a deal that valued Scale at over $29 billion. Similarly, in October, the startup Mercor announced a funding deal valuing its business at $10 billion, reaffirming investor enthusiasm for platforms that serve as the backbone of machine learning development.
This surge in demand for AI training has also reshaped the global labor market, particularly for freelancers and independent contractors. Business Insider reported in September that a growing number of professionals around the world are earning several thousand dollars per month through AI training and labeling work. Despite the earning potential, the work can be mentally taxing, unpredictable, and inconsistent. In interviews with more than 60 data labelers, Business Insider documented both the opportunities and the unsettling realities of this new type of gig work, in which participants are paid to train algorithms while handling emotionally straining or monotonous tasks.
The appetite for AI training work has even spawned an underground economy built on unauthorized access to data-labeling and model-training platforms. Business Insider’s investigation found more than 100 Facebook groups where real and fake contractor accounts are openly sold. These sales, though expressly prohibited by AI training firms, show how intense global competition for access to this income has become. Opportunistic individuals, and in some cases organized scammers, have taken advantage of the demand, turning restricted platform credentials into a black-market commodity.
The trajectory Siddharth describes, together with these patterns, points in one direction: the AI industry is moving away from rote, low-skill data tagging toward sophisticated, domain-informed collaboration. As AI systems grow more capable, the data they need more closely reflects real human cognition. Data is no longer just labeled; it is contextualized and tied to the ways people think, work, and create. That shift marks not just the end of a business model but the beginning of a new phase in how humans teach machines to learn.
Source: https://www.businessinsider.com/data-labeling-ai-training-contractors-turing-jonathan-siddharth-specialist-research-2025-12