Agent security testing. Framework optional.

Find the vulnerabilities your tests miss.

Khaos reveals security vulnerabilities, resilience failures, and behavioral drift in your AI agents—with automated comparison showing exactly what each change introduced.

Works with your stack

OpenAI
Anthropic
Gemini
Mistral
Cohere
LangGraph
CrewAI

Test at the I/O boundary — where reliability actually matters.

Everything you need to ship with confidence

Khaos dashboard showing agent evaluation scores for security, resilience, and goal completion

Stop flying blind

Without Khaos
  • "Did my prompt change break anything?"
  • "Why did costs spike this month?"
  • "What happens when the API times out?"
  • "Is this model swap safe to deploy?"

With Khaos
  • "3 outputs diverged. Here they are."
  • "Cost increased 12% due to retry logic."
  • "Agent recovers in 1.2s with fallback."
  • "All security tests pass. Ship it."

Agent Behavior Intelligence across four lenses

Structural

How does it behave?

  • Tool call patterns
  • Latency & cost
  • Token usage
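To make the structural lens concrete, here is a minimal sketch of recording tool call patterns and latency at the I/O boundary. It is not Khaos's API: `ToolCallRecorder`, `wrap`, and `pattern` are hypothetical names invented for illustration, and the tools are stand-in lambdas.

```python
import time
from collections import Counter


class ToolCallRecorder:
    """Hypothetical helper: records which tools an agent calls and how long each call takes."""

    def __init__(self):
        self.calls = []  # list of (tool_name, elapsed_seconds)

    def wrap(self, name, fn):
        """Return a version of fn that logs its name and wall-clock latency."""
        def wrapped(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record the call even if the tool raised.
                self.calls.append((name, time.perf_counter() - start))
        return wrapped

    def pattern(self):
        """Count calls per tool, ignoring latency."""
        return Counter(name for name, _ in self.calls)


# Stand-in tools for illustration only.
recorder = ToolCallRecorder()
search = recorder.wrap("search", lambda query: [query])
calc = recorder.wrap("calc", lambda x: x + 1)

search("refund policy")
search("shipping times")
calc(41)

print(recorder.pattern())
```

Comparing the `pattern()` output of two agent versions on the same input is one simple way to notice that a change altered which tools get called, or how often.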

Resilience

How does it handle failure?

  • Fault injection
  • Recovery testing
  • Graceful degradation
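As an illustration of what fault injection and recovery testing mean in practice, the sketch below wraps a model call so it can be forced to fail, then checks that a toy agent degrades to a fallback. All names here (`InjectedTimeout`, `with_fault_injection`, `agent_with_fallback`) are assumptions made for the example and do not describe how Khaos itself injects faults.

```python
import random
from typing import Callable


class InjectedTimeout(Exception):
    """Stands in for an upstream API timeout (hypothetical fault type)."""


def with_fault_injection(
    call_model: Callable[[str], str],
    failure_rate: float,
    rng: random.Random,
) -> Callable[[str], str]:
    """Wrap a model call so that roughly failure_rate of calls raise InjectedTimeout."""
    def wrapped(prompt: str) -> str:
        if rng.random() < failure_rate:
            raise InjectedTimeout("injected fault")
        return call_model(prompt)
    return wrapped


def agent_with_fallback(primary, fallback, prompt: str) -> str:
    """Toy agent: try the primary model, degrade to the fallback on timeout."""
    try:
        return primary(prompt)
    except InjectedTimeout:
        return fallback(prompt)


# Stand-in models for illustration only.
def primary_model(prompt: str) -> str:
    return f"primary:{prompt}"


def fallback_model(prompt: str) -> str:
    return f"fallback:{prompt}"


# failure_rate=1.0 always fails; failure_rate=0.0 never fails.
flaky = with_fault_injection(primary_model, failure_rate=1.0, rng=random.Random(0))
healthy = with_fault_injection(primary_model, failure_rate=0.0, rng=random.Random(0))

print(agent_with_fallback(flaky, fallback_model, "hi"))
print(agent_with_fallback(healthy, fallback_model, "hi"))
```

Injecting the fault at the call boundary, rather than inside the agent, keeps the test independent of whichever framework the agent is built on.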

Functional

Did output quality change?

  • Output comparison
  • Divergence detection
  • Human review
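Output comparison and divergence detection can be pictured as diffing two runs of the same test cases. The sketch below uses Python's standard `difflib` as a stand-in similarity measure; the function name `find_divergences`, the 0.9 threshold, and the sample data are all assumptions for illustration, not Khaos's actual comparison method.

```python
from difflib import SequenceMatcher


def find_divergences(baseline, candidate, threshold=0.9):
    """Return (case_id, similarity) for every case whose output drifted below threshold.

    baseline and candidate map a test case id to the agent's output text.
    """
    diverged = []
    for case_id, old_output in baseline.items():
        new_output = candidate.get(case_id, "")
        similarity = SequenceMatcher(None, old_output, new_output).ratio()
        if similarity < threshold:
            diverged.append((case_id, similarity))
    return diverged


# Made-up outputs from two versions of an agent.
baseline_run = {
    "greeting": "Hello! How can I help?",
    "refund": "Refund issued.",
}
candidate_run = {
    "greeting": "Hello! How can I help?",
    "refund": "I cannot help with that.",
}

for case_id, similarity in find_divergences(baseline_run, candidate_run):
    print(f"{case_id} diverged (similarity {similarity:.2f})")
```

Character-level similarity is a crude signal; flagged cases would still go to human review to decide whether the change is a regression or an improvement.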

See exactly how your agent responds to attacks

Browse every test case — baseline, resilience, and security — with the full conversation trace and verdict.

Khaos evaluation detail showing security attack test cases with agent conversation traces

Three commands to confidence

# Install Khaos
pip install khaos-agent

# Discover decorated agents
khaos discover

# Run a pack by agent name (sync optional)
khaos run my-agent --eval quickstart --sync

Khaos CLI evaluation output showing baseline, resilience, and security test results

What makes Khaos different

Chaos Engineering Built-In

Active fault injection for resilience testing. No other tool does this.

Comparison-First

See what changed between versions. Not just single-run traces.

Impact Reports

Shareable artifacts for PRs and team reviews. Ship with confidence.

Stop shipping blind.

Know exactly what your agent change introduced—security gaps, resilience regressions, and behavioral drift—before you ship.

Get Started Free