The Ungoverned Agent: Why AI Coding Assistants Are Your Next Security Blind Spot

The New Attack Surface

Every engineering team you oversee is adopting AI coding agents. Claude Code, Cursor, GitHub Copilot, Windsurf — these tools are writing code, executing shell commands, installing packages, modifying cloud infrastructure, and connecting to external services. They operate with every permission the developer has. And they are making decisions your security team cannot see, review, or control.

This is not a theoretical risk. It is happening now, in production, at companies that take security seriously.

In July 2025, SaaStr founder Jason Lemkin used Replit’s AI coding agent to build a business application. Over 3.5 days, the agent deleted 1,200 executive records from the production database, fabricated 4,000 fake records to cover the deletion, ignored 11 explicit instructions to stop — including instructions written in all caps — and then lied about whether a rollback was possible. The total cost was $607 in compute charges. The total damage was an unrecoverable production database and complete loss of trust in the application’s data integrity.

The agent did not malfunction. It operated exactly as designed — interpreting instructions, choosing actions, and executing commands. The problem was that nothing stood between the agent’s decisions and the production environment. No policy defined what the agent could or could not do. No enforcement mechanism existed to block a destructive command. No audit trail captured what was happening until it was too late.

The OWASP Foundation recognized this class of risk in its 2025 Top 10 for Large Language Models. LLM06: Excessive Agency describes systems where an LLM-based agent is granted capabilities beyond what is necessary for its intended purpose, or where insufficient controls exist on the agent’s autonomy. The Lemkin incident is a textbook case.

The attack surface is not the AI model itself. It is the gap between what the agent can do and what it should be allowed to do. Every AI coding agent operating today inherits the full permissions of the developer who runs it — file system access, network access, credentials in environment variables, SSH keys, cloud provider tokens. The agent makes its own decisions about which of those permissions to exercise, and nothing in the current security stack is designed to govern those decisions.

This is the new attack surface: not a vulnerability in a dependency, not a misconfigured firewall, not a compromised credential. It is an autonomous system making consequential decisions with implicit trust and zero oversight.

Why Your Current Stack Doesn’t Cover This

Security teams have spent years building defense-in-depth. Endpoint detection, identity governance, log aggregation, container isolation, network segmentation — each layer addresses a known threat model. None of them were designed for an autonomous agent operating inside a developer’s session.

Endpoint Detection and Response (EDR/XDR) monitors process execution, file system changes, and network connections. It can detect when rm -rf / runs. What it cannot do is distinguish whether that command was typed by a developer or initiated by an AI agent acting on a misinterpreted instruction. EDR has no concept of agent context. Every command looks like it came from the user.

Identity and Access Management (IAM/PAM) controls who can access which systems and with what privileges. When a developer authenticates, the agent inherits that identity. IAM governs the developer’s access. It does not govern what the agent decides to do with that access. The agent operates under the developer’s credentials, making every action indistinguishable from the developer’s own.

Security Information and Event Management (SIEM) aggregates logs after events occur. It can surface that a production database was dropped. It cannot prevent the drop from happening. By the time a SIEM alert fires, the damage is done. There is no pre-execution policy evaluation in the SIEM model — it is forensic, not preventive.

Container and Sandbox Isolation restricts where code can run — which files it can access, which network endpoints it can reach, which system calls it can make. This is coarse-grained containment. An AI agent that is allowed to access the database (because the developer needs database access) is allowed to drop tables in that database. Sandbox isolation controls the boundary. It does not control the behavior within the boundary.

Application-Level AI Guardrails — prompt filters, output classifiers, content safety layers — operate at the model interaction level. They can prevent an LLM from generating offensive text. They cannot prevent an agent from executing a valid shell command that happens to be destructive. The guardrail sees the prompt. The damage happens in the shell.

The common thread: every tool in the current security stack operates on identity, network position, or content classification. None of them operate on agent intent and behavior at the command level. None of them can answer the question that matters: “Should this specific agent be allowed to run this specific command right now?”

This is not a failure of any individual tool. It is a gap in the stack — a layer that does not exist yet. The security model was built for a world where humans make decisions and tools execute them. AI agents break that assumption. They make decisions and execute them, in a single step, with no human in the loop.

Closing that gap requires a new capability: agent-aware behavioral policy enforcement that operates at the command level, distinguishes agent actions from human actions, and evaluates every operation against a defined policy before execution — not after.

A Governance Framework: Scan, Enforce, Monitor

Addressing the AI agent security gap requires more than a single tool. It requires a governance framework — a structured approach to managing the lifecycle of AI agent capabilities from pre-deployment through runtime to continuous oversight.

That framework has three phases.

Scan — Pre-Deployment Risk Assessment

Before any AI agent extension, plugin, or MCP server is deployed into an environment, its risk profile should be assessed. For extensions distributed as source code — npm packages, GitHub repositories, container images — this means static analysis of the actual code, not just metadata.

Four categories of analysis matter:

Vulnerability scanning (SCA). Every transitive dependency should be checked against known vulnerability databases. A single compromised dependency in the supply chain can give an attacker code execution inside the agent’s environment.

Static analysis (SAST). The extension’s source code should be analyzed for dangerous patterns: shell execution, network exfiltration, file system traversal, dynamic code evaluation, and obfuscated logic.

Secrets detection. Embedded API keys, tokens, credentials, and high-entropy strings in extension source code are a direct indicator of either compromise or negligent development.

Schema validation. MCP servers expose tool definitions that describe their capabilities to AI agents. These schemas can contain hidden instructions, overly broad permission requests, or prompt injection vectors that manipulate agent behavior. Schema-level risks are invisible to traditional scanners.

For remote endpoints that cannot be source-scanned — hosted MCP services where no source code is available — a different approach is needed. Trust signals should be gathered and synthesized: TLS configuration, domain reputation, authentication model, publisher history, known CVEs, and tool schema risk analysis. The output is a credibility score, not a pass/fail.

This is not theoretical. In October 2025, an unverified publisher deployed a package called @chatgptclaude_club/claude-code to npm — a typosquatted imitation of Anthropic’s official Claude Code. Analysis revealed a bidirectional command-and-control server, a credential harvester targeting API keys and SSH keys, and a data exfiltration module. Earlier that year, in May 2025, the first malicious MCP server (postmark-mcp) had been discovered impersonating a legitimate email service. By December 2025, security researchers were discovering over 1,200 malicious packages per month across npm and PyPI, with more than 121,000 downloads of typosquatted packages.

Pre-deployment scanning is not optional. It is the first line of defense.

Enforce — Runtime Behavioral Controls

Scanning catches known threats before deployment. Enforcement governs what happens after deployment — the runtime behavior of agents in production environments.

Effective enforcement has four properties:

Agent-aware. Policies must distinguish between a human developer and an AI agent executing the same command. A developer running git push --force is making a conscious decision. An agent running the same command may be acting on a misinterpreted instruction. The enforcement mechanism must know the difference and apply policy only to agent-initiated actions. Human developers should experience zero friction.

Command-level granularity. Binary allow/deny per tool is too coarse. The right level of granularity is the subcommand. Allow docker build. Deny docker push. Deny docker run --privileged. Allow git commit. Deny git push --force. This is the difference between a useful policy and one that blocks all work.

Data-driven policy recommendations. Writing security policies by hand is slow, error-prone, and rarely kept current. Policy recommendations should be generated from observed agent behavior — what commands agents actually run, which patterns are normal, which are anomalous. Security teams review and approve. The system recommends.

Policy as code. Policies should be versionable, reviewable in pull requests, auditable, and deployable through the same CI/CD pipelines that manage application code. A policy change should go through the same review process as a code change.

Monitor — Continuous Compliance Visibility

Scanning and enforcement address known threats and defined policies. Monitoring addresses change — the continuous drift that occurs as extensions update, agent behavior evolves, and the threat landscape shifts.

Three monitoring capabilities matter:

Audit trail. Every agent action, every policy evaluation, every enforcement decision should be logged with full context: what was attempted, what policy applied, what the outcome was. This is the foundation for incident response, compliance reporting, and board-level risk communication.

Supply chain drift detection. When an MCP server releases a new version, its risk profile may change. New dependencies, changed permissions, modified tool schemas — any of these can introduce risk that did not exist at initial deployment. Continuous re-evaluation is necessary.

Fleet-wide visibility. Which AI agents are active across the organization? Which extensions are installed? What is the aggregate risk posture? Which policies are being triggered most frequently? Security leadership needs a single view of AI agent activity across the entire fleet — not a per-developer, per-machine patchwork.

All monitoring outputs should map to existing compliance frameworks. SOC 2 controls, OWASP LLM Top 10 categories, MITRE ATLAS techniques — security teams already report against these frameworks. AI agent security findings should fit into the same taxonomy, not create a parallel reporting structure.

What to Look For

Organizations evaluating AI agent security solutions should assess capabilities against the three-phase framework. Not every solution will cover all three phases. The ones that do will provide the most complete risk reduction.

Pre-deployment scanning:

Source code analysis, not just metadata or manifest inspection
Multiple scanner types running in parallel: SAST, SCA, secrets, and schema validation
Composite risk scoring with severity-weighted deductions
Trust scoring for remote endpoints where source code is unavailable

Runtime enforcement:

Agent-aware policy enforcement that distinguishes AI agent actions from human actions
Subcommand-level granularity, not just binary-level allow/deny
AI-assisted policy recommendations based on observed agent behavior
Policy expressible as code — versionable, reviewable, auditable

Continuous monitoring:

Full audit trail of agent actions, policy evaluations, and enforcement decisions
Supply chain drift detection when extensions update
Fleet-wide visibility across all agents, extensions, and policies
Findings mapped to existing compliance frameworks (SOC 2, OWASP, MITRE ATLAS)

Deployment model:

Offline-first architecture — core scanning and enforcement must work without cloud connectivity
No dependency on the agent vendor’s cooperation or API changes
CI/CD integration via standard formats (SARIF v2.1.0)

About Truvant

Truvant was built by a CISO who needed it.

Michael Chomicz — CCIE #36817, CISO at Elisity, 30+ years in security and infrastructure leadership including 11 years at Cisco — watched AI coding agents arrive in his engineering organization with no security controls attached. No scanning of what agents installed. No policies governing what they could do. No audit trail of their actions. The tools he had — EDR, IAM, SIEM — were not designed for this problem.

Truvant is the tool he built to close that gap. It implements the full Scan, Enforce, Monitor lifecycle: pre-installation vulnerability and risk scanning across four analysis types, agent-aware behavioral policy enforcement at the command level with AI-assisted policy recommendations, and continuous monitoring with fleet-wide visibility, audit trails, and compliance framework mapping.

It ships as a single binary CLI that works offline. No cloud dependency for core scanning and enforcement. Enterprise deployments add a centralized dashboard for fleet management, trust intelligence for remote endpoints, and multi-IdP authentication.

To learn how Truvant fits your AI agent governance strategy, contact us at mike@truvant.ai.