Agent security testing. Framework optional.

Find the vulnerabilities
your tests miss.

Khaos reveals security vulnerabilities, resilience failures, and behavioral drift in your AI agents—with automated comparison showing exactly what each change introduced.

Agent Behavior Intelligence. Framework optional.

Get Started Try Demo View Documentation

Works with your stack

OpenAI

Anthropic

Gemini

Mistral

Cohere

LangGraph

CrewAI

+ More

Test at the I/O boundary — where reliability actually matters.

The Impact Report

Everything you need to ship with confidence

Khaos dashboard showing agent evaluation scores for security, resilience, and goal completion

The Problem

Stop flying blind

Without Khaos

"Did my prompt change break anything?"
"Why did costs spike this month?"
"What happens when the API times out?"
"Is this model swap safe to deploy?"

With Khaos

"3 outputs diverged. Here they are."
"Cost increased 12% due to retry logic."
"Agent recovers in 1.2s with fallback."
"All security tests pass. Ship it."

The Four Lenses

Agent Behavior Intelligence across four lenses

Structural

How does it behave?

Tool call patterns
Latency & cost
Token usage

Resilience

How does it handle failure?

Fault injection
Recovery testing
Graceful degradation

The Missing Layer

Security

Can it be exploited?

Prompt injection
Leakage detection
Guardrail testing

Functional

Did output quality change?

Output comparison
Divergence detection
Human review

Security Testing

See exactly how your agent responds to attacks

Browse every test case — baseline, resilience, and security — with the full conversation trace and verdict.

Khaos evaluation detail showing security attack test cases with agent conversation traces

Quick Start

Three commands to confidence

# Install Khaos
pip install khaos-agent

# Discover decorated agents
khaos discover

# Run a pack by agent name (sync optional)
khaos run my-agent --eval quickstart --sync

Khaos CLI evaluation output showing baseline, resilience, and security test results

Why Khaos

What makes Khaos different

Chaos Engineering Built-In

Active fault injection for resilience testing. No other tool does this.

Comparison-First

See what changed between versions. Not just single-run traces.

Impact Reports

Shareable artifacts for PRs and team reviews. Ship with confidence.

Framework Agnostic

OpenAI, Anthropic, Gemini, Mistral, Cohere, LangGraph, CrewAI—test any agent with the same scenarios.

Stop shipping blind.

Know exactly what your agent change introduced—security gaps, resilience regressions, and behavioral drift—before you ship.

Get Started Free

Find the vulnerabilitiesyour tests miss.