Enterprise AI has moved from initial pilot programs to fully operational systems that manage customer data, conduct business transactions, and integrate with essential infrastructure. This shift has exposed a substantial gap between the capabilities of AI agents and the visibility and control that security teams possess.
A briefing released by the AIUC-1 Consortium, which includes insights from Stanford's Trustworthy AI Research Lab and over 40 security executives, outlines the security landscape that emerged in 2025 and anticipates the risks that organizations will face in 2026. Contributors to this document include Chief Information Security Officers (CISOs) from notable companies such as Confluent, Elastic, UiPath, and Deutsche Börse, alongside security researchers and advisors from MIT Sloan, Scale AI, and Databricks.
According to a survey conducted by EY, as referenced in the briefing, 64% of companies with annual revenues exceeding $1 billion have suffered losses of more than $1 million due to AI-related failures. Additionally, one in five organizations reported experiencing a breach attributed to unauthorized AI usage, often referred to as shadow AI.
Key Security Risks Identified
The briefing identifies three primary categories of risk currently confronting security professionals.
The first is the agent challenge. AI systems have progressed from simple assistants to autonomous agents capable of executing complex tasks, utilizing external tools, and making decisions without requiring human approval for each action. This evolution creates potential failure scenarios that can occur without external interference. An agent with excessive permissions and inadequate containment measures can inadvertently cause harm through its regular operations. According to the survey, 80% of organizations reported encountering risky agent behaviors, such as unauthorized system access and improper data exposure. Alarmingly, only 21% of executives reported having complete visibility into agent permissions, tool usage, or data access patterns.
Omar Khawaja, Vice President and Field CISO at Databricks, highlighted that AI components are in a constant state of change across the supply chain. Existing security controls, which are based on the assumption of static assets, result in blind spots when these behaviors shift.
The second challenge is visibility. In 2025, 63% of employees who used AI tools inadvertently shared sensitive company data, including source code and customer records, with personal chatbot accounts. Companies typically have an estimated 1,200 unofficial AI applications in use, and 86% of organizations reported lacking visibility into their AI data flows. Shadow AI breaches cost an average of $670,000 more than standard security incidents, primarily due to delayed detection and the difficulty of assessing the extent of exposure.
The third challenge is trust. Prompt injection, once confined to academic circles, emerged as a recurring issue in production environments during 2025. OWASP’s 2025 LLM Top 10 list identified prompt injection as the leading vulnerability. This issue arises because large language models (LLMs) struggle to accurately differentiate between instructions and data inputs. Presently, 53% of companies utilize retrieval-augmented generation or agentic pipelines, both of which introduce new vulnerabilities to injection attacks.
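The core weakness described above is that an LLM consumes instructions and retrieved data through the same channel. A minimal sketch of one common mitigation layer, a pre-filter that quarantines retrieved content behind explicit delimiters and drops documents matching known injection phrasings, is shown below. The pattern list and function names are illustrative assumptions, not part of the briefing; production systems typically pair such filters with trained classifiers.

```python
import re

# Illustrative patterns that often signal an injection attempt hidden
# inside retrieved documents; real deployments use ML classifiers too.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal the system prompt",
]

def flag_untrusted_text(text: str) -> bool:
    """Return True if retrieved content looks like an injection attempt."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def build_prompt(instructions: str, retrieved: list[str]) -> str:
    """Keep trusted instructions separate from untrusted context and
    drop anything the pre-filter flags, so retrieved text is presented
    as data, never as instructions."""
    safe = [d for d in retrieved if not flag_untrusted_text(d)]
    context = "\n".join(f"<document>{d}</document>" for d in safe)
    return f"{instructions}\n\nUntrusted context:\n{context}"
```

Filters like this reduce, but do not eliminate, injection risk; they address exactly the instruction/data ambiguity that OWASP ranks first.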
Inadequate Frameworks for Agent-Specific Risks
Existing frameworks such as the NIST AI Risk Management Framework and ISO 42001 offer governance structures that include risk committees and documentation requirements. However, these frameworks do not address the specific technical controls that CISOs require for agent deployments, such as validating tool call parameters, logging prompt injections, or conducting containment tests for systems involving multiple agents.
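One of the missing technical controls named above, validating tool call parameters before execution, can be sketched in a few lines. The tool name, parameter schema, and the $500 refund cap below are hypothetical examples, not drawn from any framework; they simply illustrate rejecting unknown parameters, wrong types, and out-of-bounds values before a tool runs.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolSpec:
    """Hypothetical per-tool schema: allowed parameters plus a
    domain-specific value check."""
    name: str
    allowed_params: dict[str, type]              # parameter name -> expected type
    validator: Callable[[dict[str, Any]], bool]  # value-range / business check

def validate_tool_call(spec: ToolSpec, params: dict[str, Any]) -> bool:
    """Reject calls with unknown parameters, wrong types, or values
    outside the domain check -- before the tool ever executes."""
    for key, value in params.items():
        expected = spec.allowed_params.get(key)
        if expected is None or not isinstance(value, expected):
            return False
    return spec.validator(params)

# Example: a refund tool capped at $500 for autonomous agent use.
refund_tool = ToolSpec(
    name="issue_refund",
    allowed_params={"order_id": str, "amount": float},
    validator=lambda p: 0 < p.get("amount", 0) <= 500.0,
)
```

The same gate can emit an audit log entry on every rejection, which also covers the prompt-injection logging gap the briefing mentions.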
Sanmi Koyejo, the head of Stanford's Trustworthy AI Research Lab, acknowledged that there are currently no large-scale longitudinal studies comparing incident rates between organizations using technically specific frameworks and those relying on broader governance. He explained that the AIUC-1 is still in its early adoption phase, making it premature for such comparisons. Research from Koyejo's lab indicated that model-level guardrails alone were inadequate, as fine-tuning attacks bypassed Claude Haiku 72% of the time and GPT-4o 57% of the time. Technically specific controls, which include input validation, action-level guardrails, and visibility into reasoning chains, are necessary to address gaps that model-level safety misses. Koyejo compared this situation to the adoption of multi-factor authentication in traditional cybersecurity, where specific, auditable technical controls significantly reduced breach risks compared to high-level policy commitments.
According to Koyejo, early adopters of technically grounded AI security standards are experiencing quicker procurement cycles, enhanced audit readiness, and less friction when deploying agents in regulated environments. A case study published by Virtue AI, co-founded by Koyejo, outlines the application of structured AI security controls at AllianceBernstein, a financial services firm.
Implementing Continuous Adversarial Testing
The briefing suggests that organizations should incorporate continuous red-teaming into their agent operations. Nancy Wang, the CTO of 1Password, emphasized that enterprises lacking in-house AI security expertise should adopt a model combining platform defaults, automation, and targeted expertise instead of relying solely on large specialized teams.
"Baseline guardrails must be embedded within the platforms themselves," Wang stated. "Sandboxed tool execution, scoped and short-lived credentials, runtime policy enforcement, and comprehensive audit logging should not necessitate custom engineering." She advocated for integrating adversarial testing into continuous integration and release workflows so that updates to models, prompt changes, or agent reconfigurations automatically prompt predetermined attack simulations. This allows human experts to focus on significant changes rather than manually rerunning entire testing protocols.
Wang recommended categorizing agents by risk level. Those with access to sensitive data or production systems should undergo continuous adversarial testing and have stronger review processes, while lower-risk agents can follow standardized controls and periodic assessments. "The objective is to incorporate continuous validation into the engineering lifecycle," she stated.
Koyejo's lab has directly addressed the automation issue. Research into what they term AutoRedTeamer has shown that automated attack selection can reduce computational costs by 42% to 58% compared to conventional methods, while also offering broader vulnerability coverage. He advised resource-constrained organizations to begin with automated continuous testing linked to deployment pipelines, implement runtime guardrails for any agents with access to sensitive data or real-world tools before they go live, and utilize human red-teaming selectively for high-stakes deployments.
In the realms of identity and cloud security, Wang pointed out that shifting from high-level policy statements to enforceable controls, such as least privilege, short-lived credentials, and scoped tokens, has significantly reduced lateral movement and mitigated impacts during incidents. "Agents with tightly scoped capabilities and time-bound credentials simply cannot access what they were never granted," she explained. "This creates a clear and observable difference."
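Wang's point that "agents with tightly scoped capabilities and time-bound credentials simply cannot access what they were never granted" can be made concrete with a minimal token sketch. The names and the 15-minute default are illustrative assumptions; any real deployment would use an existing identity provider rather than hand-rolled tokens.

```python
import time
from dataclasses import dataclass

@dataclass
class ScopedToken:
    """Illustrative time-bound, least-privilege credential for an agent."""
    scopes: frozenset[str]
    expires_at: float

def mint_token(scopes: set[str], ttl_seconds: int = 900) -> ScopedToken:
    """Issue a token limited to the named scopes with a short lifetime
    (15 minutes by default, an assumed value)."""
    return ScopedToken(frozenset(scopes), time.time() + ttl_seconds)

def authorize(token: ScopedToken, required_scope: str) -> bool:
    """Fail closed: an expired token or a scope that was never granted
    both deny access."""
    return time.time() < token.expires_at and required_scope in token.scopes
```

Because authorization checks the grant rather than the request, a compromised or misbehaving agent's blast radius is bounded by its scopes and its token lifetime, which is the observable difference Wang describes.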