ZDNET’s Expanded Key Takeaways:

A new study has concluded that even the most sophisticated artificial intelligence systems, often described as the cutting edge of machine learning, still fall short when it comes to performing freelance assignments at a commercially viable standard. The research evaluated prominent AI agents, including Gemini 2.5 Pro and GPT-5. Meanwhile, nearly half of the American workforce engaged in some form of freelance employment during 2025, making this segment a central component of the modern economy.

For freelancers anxious about losing work to ever-improving automated systems, the findings offer temporary reassurance. According to the study, jointly produced by Scale AI and the Center for AI Safety, these high-end AI agents were able to automate fewer than 3% of the essential duties that a typical independent contractor performs. The study’s authors described the agents as unable to complete the majority of projects to a standard that would meet a client’s realistic expectations in a professional freelancing context.

The Remote Labor Index, or RLI, is the centerpiece of this investigation. Posted recently on the preprint server arXiv and pending peer review, the paper introduces the RLI as a standardized means of assessing how capable AI systems are when asked to perform economically valuable work. The benchmark arrives amid sweeping claims from technology leaders about AI’s potential to reshape labor markets at an unprecedented pace. For instance, Anthropic CEO Dario Amodei remarked in May that the technology could replace up to one-half of all white-collar positions within five years, a prediction that has fueled ongoing debates about the limits of automation and human adaptability.

As its name implies, the RLI was designed specifically to evaluate AI’s effectiveness in automating remote and freelance labor. Freelancing demands self-direction, efficient time management, and a balance between creativity and communication. It is also a booming sector of employment: in 2025, approximately seventy-three million Americans, about forty-three percent of the total workforce, were actively involved in freelance work. The study thus grounds its analysis in both economic realism and the growing social significance of flexible employment.

The researchers examined six leading AI agents, among them Google’s Gemini 2.5 Pro, OpenAI’s GPT-5, and Anthropic’s Sonnet 4.5. Unlike traditional chatbots that primarily generate text responses, these advanced agents can interact across digital tools such as browsers or design software, coordinating multiple steps to complete compound objectives. Developers frequently describe such systems as key milestones on the road to artificial general intelligence, or AGI.

However, AGI remains an elusive and loosely defined concept. Experts continue to debate what “general intelligence” in a machine would actually entail — whether it would simply mirror human cognitive capability or transcend it altogether. A broadly cited interpretation defines AGI as any system capable of matching or surpassing human proficiency across all economically meaningful tasks. Using that lens, the RLI’s results strongly indicate that AGI is still far from realization: none of the tested models came close to demonstrating the autonomous, multifaceted performance required for genuine remote employment. The authors candidly concluded that these systems are “far from capable of autonomously performing the diverse demands of remote labor.”

To ensure practical relevance, the study encompassed twenty-three categories of freelance work, including fields such as graphic and product design, computer-aided design (CAD), and game development. These skill categories were taken from real-world platforms like Upwork to ensure that the benchmark accurately represented the complexity, diversity, and economic value of actual freelance markets.

Each AI model received a detailed project brief along with the supporting files needed to complete the project. Human reviewers then evaluated the resulting deliverables manually and compared them to equivalent human-produced work. The goal was to determine whether the AI’s submission would be acceptable as a paid commission, that is, whether a reasonable client would approve it as meeting contractual expectations. Performance scores were also compared using an Elo-style rating metric. Among the competitors, Manus achieved the highest result, with a modest automation rate of 2.5%; Grok 4 and Claude Sonnet 4.5 followed, each slightly behind at 2.1%.
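
To make the two metrics concrete, here is a minimal Python sketch. It assumes the automation rate is simply the share of projects whose AI deliverable reviewers judged acceptable, and it uses the textbook Elo update with a K-factor of 32. The article does not give the paper's exact formulas, so the function names, the K-factor, and the sample data are all hypothetical illustrations, not the study's method.

```python
def automation_rate(accepted: list[bool]) -> float:
    """Share of projects whose AI deliverable was judged acceptable.

    Assumption: one True/False reviewer verdict per project.
    """
    if not accepted:
        raise ValueError("need at least one project verdict")
    return sum(accepted) / len(accepted)


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return new ratings after one pairwise comparison.

    score_a is 1.0 if A's deliverable was preferred, 0.0 if B's was,
    and 0.5 for a tie. k = 32 is a conventional default, not a value
    taken from the study.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical data: 5 acceptable deliverables out of 200 projects.
verdicts = [True] * 5 + [False] * 195
print(f"{automation_rate(verdicts):.1%}")  # prints "2.5%"

# Two equally rated agents; reviewers prefer agent A's deliverable.
print(elo_update(1000.0, 1000.0, 1.0))  # prints "(1016.0, 984.0)"
```

The two numbers answer different questions. The automation rate is an absolute bar (would a client accept this work?), while an Elo-style rating only ranks agents relative to one another, so an agent can lead the ranking and still fail nearly every project.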

This empirical approach highlights how remote work encompasses far greater complexity than algorithmic problem-solving alone can capture. Despite popular narratives often suggesting that AI will soon replace vast swaths of human labor, the study underscores that human employment remains multi-dimensional — blending not just technical execution but creativity, problem-solving, interpretation, and nuanced interpersonal interaction. These qualities appear particularly resistant to substitution by even the most refined AI architectures currently available.

While certain job functions may lend themselves more readily to automation, the majority demand a composite of cognitive, technical, and emotional intelligence that remains uniquely human. The RLI’s finding of an automation rate under 3% therefore reveals a striking gap between the optimistic projections often made by corporate leaders and the results of empirical testing. The framework also does not account for essential yet intangible aspects of freelancing, such as client communication, feedback management, and negotiation, which further illustrates the limits of AI in replicating the full human work experience.

Still, the study recognizes that the technology is advancing quickly. Agents are improving every few months, with major companies investing massive resources to train new models capable of broader reasoning and tool use. It is plausible that in the next decade, organizations may routinely employ AI agents as semi-autonomous collaborators or even as freelance contractors. Yet for now, fears of an imminent AI takeover in the freelancing domain appear overblown. Humans remain firmly at the center of productive, creative, and client-driven freelance work.

Source: https://www.zdnet.com/article/the-best-ai-agents-are-terrible-freelancers-for-now/