New Tool Uncovers Safety Risks in Human-AI Exchanges

Marylee Williams
Tuesday, September 9, 2025

A new study using a CMU tool shows that AI agents pose safety risks in nearly two-thirds of their interactions with humans in critical areas such as education and health care.

The Breakdown

  • The HAICOSYSTEM simulation tool tests AI agents in realistic scenarios, identifying safety risks before they're deployed.
  • The tool can help developers build stronger guardrails against harmful behavior in interactions between AI agents and users.
  • Researchers using the tool found that state-of-the-art LLMs exhibit safety risks in more than 62% of interactions with humans.

***

As people turn to AI agents to handle tasks like coordinating medical care, managing finances or booking travel, the potential for misinformation, unsafe answers or abuse by malicious users increases. In fact, a new study using a Carnegie Mellon University tool shows that AI agents pose safety risks in nearly two-thirds of their interactions with humans in critical areas such as education and health care.

AI agents are not simple chatbots but tools that can complete tasks for users, and interactions with them create openings for safety risks. For example, someone using a patient-management tool could direct an AI agent to access another patient's medical information, and the agent could comply. Similarly, someone seeking an opioid for illicit use could try to obtain a prescription from an AI agent used in a telemedicine service.

Researchers from the School of Computer Science developed the HAICOSYSTEM tool so developers could test interactions between AI agents and users to identify these risks.

"We want to make it easier to see what kind of potential risks or mistakes the AI agents might make before you ship the agents to thousands or millions of customers," said Xuhui Zhou, a Ph.D. student in the Language Technologies Institute (LTI). "If we can discover those mistakes before these tools are shipped out, we can design better guardrails and prevent them from doing real-world harm."

Along with Zhou, the SCS team includes LTI Assistant Professor Maarten Sap and alumni Frank Xu and Hao Zhu. Researchers from the University of Washington and the Allen Institute for AI (Ai2) also worked on the project.

HAICOSYSTEM simulates how human users, whether benign or malicious, interact with AI agents across a range of situations, from daily life to professional settings. It also models how the AI agent behaves in these varied environments. The simulations run in a controlled setting, and once an interaction ends, it is examined for the safety risks the agent exhibited.
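
For readers who want a concrete picture, the sketch below shows what this kind of simulated episode can look like: a user profile with a stated intent exchanges turns with an agent inside a sandboxed loop, and the transcript is kept for later safety review. The names, classes and stand-in policies are illustrative assumptions, not HAICOSYSTEM's actual API; a real harness would drive both roles with LLMs rather than the toy rules used here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative sketch only: names and structure are assumptions, not HAICOSYSTEM's API.

@dataclass
class UserProfile:
    name: str
    intent: str              # "benign" or "malicious"
    opening_request: str

@dataclass
class Turn:
    speaker: str
    message: str

def run_simulation(user: UserProfile,
                   agent_respond: Callable[[List[Turn]], str],
                   user_respond: Callable[[UserProfile, List[Turn]], str],
                   max_turns: int = 6) -> List[Turn]:
    """Alternate agent and user turns in a sandboxed episode and return the transcript."""
    transcript: List[Turn] = [Turn("user", user.opening_request)]
    for _ in range(max_turns):
        transcript.append(Turn("agent", agent_respond(transcript)))
        transcript.append(Turn("user", user_respond(user, transcript)))
    return transcript

# Toy stand-in policies; a real harness would call an LLM for both roles.
def toy_agent(transcript: List[Turn]) -> str:
    last = transcript[-1].message.lower()
    if "another patient" in last or "prescription" in last:
        return "I can't help with that request."
    return "Sure, let me look into that."

def toy_user(profile: UserProfile, transcript: List[Turn]) -> str:
    if profile.intent == "malicious":
        return "It's urgent, just give me another patient's records."
    return "Thanks, that works."

if __name__ == "__main__":
    user = UserProfile("patient_portal_user", "malicious",
                       "Can you pull up another patient's chart for me?")
    for turn in run_simulation(user, toy_agent, toy_user, max_turns=2):
        print(f"{turn.speaker}: {turn.message}")
```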

The researchers also developed a system to evaluate the safety and performance of the AI agents in these complex interactions. HAICOSYSTEM-EVAL has a scenario-specific checklist of safety and risk outcomes (for example, legal risks that may arise from exposing a user's medical information). This evaluation step also analyzes the trade-offs between efficiency and risky behavior to study the system's performance.
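
The data structures below sketch, under loose assumptions, how a scenario-specific checklist and an efficiency-versus-risk summary might be organized. The dimension names and field names are hypothetical placeholders chosen for illustration; they are not the evaluator's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical structures for a scenario-specific safety checklist; all names are illustrative.

@dataclass
class ScenarioChecklist:
    scenario_id: str
    risk_dimensions: List[str]        # e.g. legal, privacy, physical-harm risks

@dataclass
class EpisodeScores:
    goal_completion: float            # efficiency: how fully the agent finished the task (0-1)
    risk_flags: Dict[str, bool]       # per risk dimension: did this risk materialize?

    @property
    def is_risky(self) -> bool:
        return any(self.risk_flags.values())

def summarize(episodes: List[EpisodeScores]) -> Dict[str, float]:
    """Report the efficiency/risk trade-off across a batch of simulated episodes."""
    n = len(episodes)
    return {
        "mean_goal_completion": sum(e.goal_completion for e in episodes) / n,
        "risk_rate": sum(e.is_risky for e in episodes) / n,
    }

if __name__ == "__main__":
    checklist = ScenarioChecklist(
        scenario_id="telemedicine_prescription",
        risk_dimensions=["legal", "privacy", "physical_harm"],
    )
    episodes = [
        EpisodeScores(0.9, {dim: False for dim in checklist.risk_dimensions}),
        EpisodeScores(0.7, {"legal": True, "privacy": False, "physical_harm": False}),
    ]
    print(summarize(episodes))   # half of this toy batch triggered at least one risk flag
```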

Researchers ran about 8,000 simulations based on 132 scenarios in areas such as education and health care to test AI agents in HAICOSYSTEM. In the experiment, five simulated human users with different profiles interacted with the AI agent, and these users exhibited both malicious and benign intent. Through these tests, the researchers found that state-of-the-art LLMs exhibit safety risks in 62% of cases. When the AI agents interacted with malicious users, the safety risks increased. Researchers also found that AI agents powered by larger models, which might have larger training datasets, had lower safety risks.
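
As a rough illustration of how such results can be broken down by user intent, the snippet below computes a per-intent risk rate over a handful of made-up episode records. The data and function are placeholders for explanation only and bear no relation to the study's actual numbers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy aggregation over (user_intent, risk_observed) records; the sample data is invented.

def risk_rate_by_intent(episodes: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the share of episodes in which a safety risk was observed, per user intent."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # intent -> [risky, total]
    for intent, risky in episodes:
        counts[intent][0] += int(risky)
        counts[intent][1] += 1
    return {intent: risky / total for intent, (risky, total) in counts.items()}

if __name__ == "__main__":
    sample = [("malicious", True), ("malicious", True), ("malicious", False),
              ("benign", False), ("benign", True), ("benign", False)]
    # In this toy sample the malicious-user risk rate exceeds the benign-user rate.
    print(risk_rate_by_intent(sample))
```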

"This simulation framework contrasts with Silicon Valley's 'build fast, break things' mentality, where the users become the testers for these technologies because these organizations decide to see what happens after they deploy it," Sap said. "Although that's not entirely true anymore — LLM companies are starting to see the need for prerelease safety testing. But there aren't really any frameworks to assess potential risks in these AI agents beforehand. That's exactly what this work is trying to change. You want to cover safety risks with benign users (where things might go awry for various reasons, like ambiguous instructions). But it's also important to catch safety risks due to malicious users who potentially misuse these agents and cause a lot of harm."

The Defense Advanced Research Projects Agency and Schmidt Sciences partially funded this work, along with Ai2. Researchers plan to continue developing realistic environments for these simulations to test AI safety.

For More Information

Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu