Agentic artificial intelligence, often shortened to agentic AI, is fast becoming a defining force in modern machine learning, and its progress is generating fascination and alarm in equal measure. The approach emphasizes autonomous systems: AI-driven programs capable of pursuing objectives and executing complex workflows without step-by-step human intervention. Yet, despite its transformative promise, a recent wave of research has revealed deep-rooted concerns about transparency, security, and accountability in this emerging field.
A crucial turning point came when OpenAI, one of the most prominent organizations shaping the AI frontier, announced that it had hired Peter Steinberger, the architect of the open-source framework known as OpenClaw. The framework garnered extraordinary attention for giving AI agents remarkable abilities, such as autonomously sending and responding to emails on behalf of users. At the same time, it alarmed cybersecurity professionals because of numerous structural vulnerabilities that could allow malicious actors to seize full control of a user's computer. That duality, boundless potential alongside serious risk, has become emblematic of the broader challenges facing agentic AI as it enters mainstream deployment.
In response to the surging interest in autonomous AI agents, researchers from the Massachusetts Institute of Technology, working in collaboration with teams from the University of Cambridge, Harvard University, the University of Pennsylvania, Stanford University, and other leading institutions, published an extensive survey of thirty widely used agentic AI systems. Their findings were unambiguous: the current state of agentic AI represents a profound security crisis characterized by secrecy, insufficient disclosure, and a worrying absence of even the most basic operational safety standards. The study, detailed in a comprehensive thirty-nine-page report titled “The 2025 AI Index: Documenting Sociotechnical Features of Deployed Agentic AI Systems,” underscores how nearly every critical component of agentic AI remains opaque to both regulators and end users.
One of the most alarming revelations involves the near-total lack of transparency among the developers building these systems. Lead author Leon Staufer of the University of Cambridge and his collaborators noted pervasive omissions in public documentation, identifying eight major categories of disclosure that most AI agents fail to address at all. These gaps include the absence of risk disclosure, third-party testing results, and clear methods for evaluating failure modes. In essence, the institutions responsible for creating these tools are giving the public little to no information about how their systems operate or what could happen if they malfunction.
The researchers further illustrate the scale of the issue with examples from enterprise-grade systems. They found that in twelve of the thirty cases examined, usage monitoring either did not exist or was so limited that it provided only superficial alerts once computational or rate limits were reached. For large organizations, this absence of monitoring tools makes resource planning almost impossible and leaves them vulnerable to uncontrolled system behaviors. Perhaps most troubling of all, many of these AI agents do not even disclose their artificial nature when interacting with external systems or users. Without mechanisms such as watermarking or automated identification, people often cannot distinguish an AI-generated image, message, or action from one produced by a human being—a situation that undermines user trust and complicates oversight.
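To make the idea of automated identification concrete, here is a minimal Python sketch of an agent that discloses its nature on every web request; the header names, example URL, and run identifier are illustrative assumptions rather than details from the study or any vendor's documentation.

```python
# Illustrative only: the header names, URL, and run identifier are assumptions,
# not drawn from the study or from any vendor's documentation.
import requests

AGENT_USER_AGENT = "ExampleAgent/1.0 (autonomous AI agent; +https://example.com/agent-info)"

def agent_get(url: str, run_id: str) -> requests.Response:
    """Fetch a URL while openly disclosing that the request comes from an AI agent."""
    headers = {
        # A descriptive User-Agent lets site operators tell the agent apart from a human browser.
        "User-Agent": AGENT_USER_AGENT,
        # A hypothetical custom header tying the request to a specific agent run for auditing.
        "X-AI-Agent-Run": run_id,
    }
    return requests.get(url, headers=headers, timeout=30)

response = agent_get("https://example.com/catalog", run_id="run-001")
print(response.status_code)
```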
The study also exposes an unsettling lack of control mechanisms. Several prominent enterprise platforms, including Alibaba’s MobileAgent, HubSpot’s Breeze, IBM’s watsonx, and automations developed by Berlin-based company n8n, lack documented methods to stop or suspend autonomous operations once launched. The authors remark that, in some cases, the only available option is to halt *all* agents simultaneously, an extreme measure that could disrupt entire workflows and damage business continuity. The mere possibility of being unable to terminate a malfunctioning or rogue system starkly emphasizes the latent risks associated with giving AI systems significant autonomy.
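As a rough illustration of what a documented stop mechanism could look like, the sketch below assumes a simple step-based agent loop in Python; the hypothetical Agent class is not modeled on any of the platforms named above.

```python
# A minimal sketch of a per-agent kill switch, assuming a simple step-based agent loop.
# The Agent class here is hypothetical and not modeled on MobileAgent, Breeze, watsonx, or n8n.
import threading
import time

class Agent:
    def __init__(self, name: str):
        self.name = name
        # Each agent gets its own stop flag, so one agent can be halted without stopping all of them.
        self._stop = threading.Event()

    def stop(self) -> None:
        """Ask this agent to halt after it finishes its current step."""
        self._stop.set()

    def run(self, max_steps: int = 100) -> None:
        for step in range(max_steps):
            if self._stop.is_set():
                print(f"{self.name}: stopped cleanly at step {step}")
                return
            time.sleep(0.1)  # stand-in for one unit of autonomous work
        print(f"{self.name}: finished all steps")

agent = Agent("invoice-processor")
worker = threading.Thread(target=agent.run)
worker.start()
time.sleep(0.35)   # let the agent run a few steps
agent.stop()       # suspend only this agent; any others keep running
worker.join()
```

The point of the per-agent flag is precisely what the authors found missing: a way to suspend one misbehaving agent without halting every workflow at once.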
Beyond individual examples, the MIT-led team warns that the governance problems plaguing agentic AI, including fragmented ecosystems, ethical headaches around online conduct, and the absence of standardized evaluation protocols, are likely to intensify as these systems become even more capable. Their outreach efforts surfaced another telling pattern: of all the companies contacted for feedback during the four-week research process, only about one-quarter responded, and of those, a mere three provided comments substantive enough to be included in the final analysis.
Agentic AI itself represents a new evolutionary stage within machine learning. Unlike traditional language models that merely interpret prompts and provide textual responses, agentic AI systems may be embedded into larger ecosystems where they carry out multistep operations—such as processing purchase orders, managing customer service interactions, or querying large datasets to support business decision-making. In everyday terms, these agents act less like digital assistants waiting for commands and more like semi-autonomous collaborators capable of taking initiative based on high-level goals.
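To make that distinction concrete, here is a toy Python sketch of a goal-driven agent loop; the planner stub and the two tools are invented placeholders standing in for a foundation-model call and real business systems.

```python
# A toy sketch of the goal-driven agent loop described above. The planner stub and
# the two "tools" are invented placeholders for a foundation-model call and real
# business systems; they are not part of any product named in this article.
from typing import Callable, Dict, List, Optional, Tuple

def look_up_order(order_id: str) -> str:
    # Placeholder for querying an order-management system.
    return f"Order {order_id}: 3 items, pending approval"

def notify_operations(address: str) -> str:
    # Placeholder for sending a status email.
    return f"Notification queued for {address}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "look_up_order": look_up_order,
    "notify_operations": notify_operations,
}

def plan_next_step(goal: str, history: List[str]) -> Optional[Tuple[str, str]]:
    """Stand-in for the foundation-model call that chooses the next tool and argument."""
    if not history:
        return ("look_up_order", "A-1042")
    if len(history) == 1:
        return ("notify_operations", "ops@example.com")
    return None  # the planner considers the goal complete

def run_agent(goal: str) -> List[str]:
    history: List[str] = []
    # Unlike a chatbot answering a single prompt, the agent keeps acting until the goal is met.
    while (step := plan_next_step(goal, history)) is not None:
        tool_name, argument = step
        history.append(TOOLS[tool_name](argument))
    return history

print(run_agent("Process purchase order A-1042 and notify operations"))
```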
To evaluate this growing domain, the researchers grouped the technologies into three broad categories: enhanced chatbots such as Anthropic's Claude Code, AI-enabled web browsers and extensions like OpenAI's Atlas, and enterprise-oriented tools exemplified by Microsoft 365 Copilot. Across nearly all examples, developers relied on a small collection of proprietary foundation models, primarily OpenAI's GPT, Anthropic's Claude, and Google's Gemini, creating a concentrated set of dependencies that could magnify risks through shared vulnerabilities.
The report’s findings are supported by several illustrative case studies. OpenAI’s ChatGPT Agent emerged as one of the few positive examples because it uses cryptographic signatures to verify and track its web interactions, enabling a degree of accountability that other vendors omit entirely. In sharp contrast, the researchers deem Perplexity’s Comet browser a major liability: its documentation lacks safety evaluations, benchmark disclosures, and any evidence of sandboxing measures designed to contain errant agent behavior. Meanwhile, Amazon has accused Perplexity of misrepresenting its AI browser as a human user when accessing Amazon’s servers, which is precisely the kind of deceptive activity the study warns against.
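The report excerpt does not describe how OpenAI's signing works, so the following Python sketch only illustrates the general technique of attaching verifiable signatures to a log of web actions, using Ed25519 keys from the third-party cryptography package; the record format and field names are assumptions.

```python
# Generic illustration of signing a log of web actions so they can be verified later.
# This is NOT OpenAI's actual scheme (the report excerpt does not describe it); the
# record format and field names are assumptions. Requires the third-party
# 'cryptography' package for Ed25519 signatures.
import json
import time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()   # shared with auditors or site operators

def sign_action(method: str, url: str) -> dict:
    """Record a web interaction and attach a signature binding the agent to it."""
    record = {"method": method, "url": url, "timestamp": time.time()}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = private_key.sign(payload).hex()
    return record

def verify_action(record: dict) -> bool:
    """Anyone with the public key can check that a logged action is authentic and untampered."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["signature"]), payload)
        return True
    except InvalidSignature:
        return False

entry = sign_action("GET", "https://example.com/product/123")
print(verify_action(entry))   # True; tampering with any field would make this False
```

Because verification needs only the public key, site operators or auditors can confirm which actions an agent actually took without having to trust the vendor's own logs.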
HubSpot’s Breeze agents present a more complex picture. While they demonstrate impressive compliance with major regulatory standards such as SOC2, GDPR, and HIPAA, the report reveals that HubSpot provides no meaningful transparency about the actual testing conducted by its third-party auditor. The authors describe this approach—highlighting adherence to compliance frameworks while withholding methodological detail—as emblematic of most enterprise AI vendors’ public communication strategies.
At its core, the study emphasizes a crucial philosophical and operational truth: agentic AI does not spontaneously arise; it is deliberately engineered by human beings making conscious choices. Every design parameter—from transparency levels to shutdown mechanisms—reflects human priorities, values, and trade-offs. Consequently, developers and corporations such as OpenAI, Anthropic, Google, and others bear direct responsibility for ensuring that these systems are safe, auditable, and governed by ethical principles. The researchers warn that, unless these organizations take immediate steps to address the glaring deficiencies revealed in their survey, regulatory authorities will inevitably intervene, potentially imposing strict oversight on an industry that has so far thrived in self-directed experimentation.
In essence, the report’s message can be distilled into an urgent call to action: the extraordinary potential of agentic AI must be matched by an equally extraordinary commitment to transparency, accountability, and safety. Without those pillars, the promise of intelligent autonomy may instead give rise to a technological environment governed not by innovation, but by uncertainty and risk.
Source: https://www.zdnet.com/article/ai-agents-are-fast-loose-and-out-of-control-mit-study-find/