LLMs and coding agents are increasing the “attack surface” for potential vulnerabilities. LLMs are prone to “prompt injection” attacks where a malicious user provides input to get an AI system to take actions the developer didn’t intend. In one recent example, someone convinced ChatGPT to reveal Windows product keys, including one used by Wells Fargo bank, by asking it to play a guessing game. Red teamers from one security firm warned that “GPT-5’s raw model is nearly unusable for enterprise out of the box.” Perplexity’s Comet browser also had a vulnerability that allowed unauthorized access to users’ sensitive data via indirect prompt injection; the browser’s AI summaries work by feeding a part of a webpage into its LLM without distinguishing between the user’s instruction and untrusted content from the webpage.

The growing popularity of agents increases the opportunities for these kinds of attacks, with coding agents of particular concern. These agents used by software developers are often granted considerable authority, which translates into immense security risks. Cursor, a popular AI-powered coding tool, has an “Auto-Run Mode” that permits the Cursor agent to execute commands and write girls without asking for human confirmation. Coding agents are especially susceptible to prompt injection because they often access public code repositories like GitHub where an attacker can leave malicious instructions for a coding agent to find. Researchers have even been able to hide malicious code in a way that isn’t visible to a human user. Cursor is working with startup RunSybil to provide ways of probing vibe-coded applications for weaknesses. 

Cybersecurity expert Nathan Hamiel proposes a “Refrain, Restrict, Trap” (RRT) strategy for mitigating prompt-injection attacks: Refrain from using LLMs in high-risk of safety-critical scenarios; restrict the execution, permissions, and levels of access; and trap inputs and outputs to the system, looking for leaks of sensitive data.

Questions to consider

  1. How are companies assessing the security risks of the AI products they are adopting? How are they guarding against unauthorized employee use? What steps are being taken to mitigate AI-related security risks? How are they assessing the effectiveness of these strategies? Are they disclosing this analysis?

Keep Reading

No posts found