Python-first agent testing. Framework optional.

See exactly what your change did to your agent.

Khaos captures agent behavior across cost, latency, tool usage, resilience, security, and outputs, and shows you exactly what changed.

Agent Behavior Intelligence. Framework optional.

Works with your stack

OpenAI
Anthropic
Gemini
Mistral
Cohere
LangGraph
CrewAI

Test at the I/O boundary — where reliability actually matters.

Everything you need to ship with confidence

Impact Report: v2.1.0 vs main
Ready to ship

Metric        Change   main     v2.1.0
Cost          +12%     $0.042   $0.047
Latency       -23%     2.4s     1.8s
Resilience    +15%     72       83
Security      0%       94       94

Output comparison: 47 identical, 3 divergent
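Each percentage in the report is a relative change from the main baseline to v2.1.0, that is (after - before) / before x 100, rounded to a whole percent. The following minimal Python sketch shows that arithmetic using the cost, resilience, and security rows. The `pct_change` helper and the metric dictionaries are illustrative only and are not part of Khaos's API.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from a baseline run to a candidate run, in percent."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before * 100.0


# Hypothetical per-run metric summaries copied from the report rows.
main_run = {"cost_usd": 0.042, "resilience": 72, "security": 94}
candidate_run = {"cost_usd": 0.047, "resilience": 83, "security": 94}

# Prints one line per metric: cost_usd: +12%, resilience: +15%, security: +0%
for metric, before in main_run.items():
    print(f"{metric}: {pct_change(before, candidate_run[metric]):+.0f}%")
```

The inputs here are the rounded values shown on screen. Recomputing a delta from them can therefore differ slightly from a figure that was computed on unrounded measurements.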

Stop flying blind

Without Khaos
  • "Did my prompt change break anything?"
  • "Why did costs spike this month?"
  • "What happens when the API times out?"
  • "Is this model swap safe to deploy?"
With Khaos
  • "3 outputs diverged. Here they are."
  • "Cost increased 12% due to retry logic."
  • "Agent recovers in 1.2s with fallback."
  • "All security tests pass. Ship it."

Agent Behavior Intelligence across four lenses

Structural

How does it behave?

  • Tool call patterns
  • Latency & cost
  • Token usage

Security

Can it be exploited?

  • Prompt injection
  • Leakage detection
  • Guardrail testing

Functional

Did output quality change?

  • Output comparison
  • Divergence detection
  • Human review
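At its simplest, output comparison pairs each test case's output from the baseline run with the output from the candidate run, then flags the pairs that differ. The minimal sketch below assumes that outputs are plain strings and that whitespace-only edits should not count as divergence. `compare_outputs` and `normalize` are hypothetical helpers. Khaos's own comparison may use a different notion of "divergent", such as a semantic rather than a textual one.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting-only edits are not flagged."""
    return " ".join(text.split())


def compare_outputs(baseline: list[str], candidate: list[str]) -> tuple[int, list[int]]:
    """Return (identical count, indices of divergent test cases) for paired runs."""
    if len(baseline) != len(candidate):
        raise ValueError("both runs must cover the same test cases")
    divergent = [
        i
        for i, (b, c) in enumerate(zip(baseline, candidate))
        if normalize(b) != normalize(c)
    ]
    return len(baseline) - len(divergent), divergent


identical, divergent = compare_outputs(
    ["Paris", "4", "The sky is blue."],
    ["Paris ", "5", "The sky  is blue."],
)
print(f"{identical} identical, {len(divergent)} divergent at {divergent}")
# prints: 2 identical, 1 divergent at [1]
```

The function returns the indices of the divergent cases as well as the count. This lets a reviewer jump straight to the outputs that changed, which is the step "Human review" then covers.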

Three commands to confidence

# Install Khaos
pip install khaos

# Discover decorated agents
khaos discover

# Run the quickstart eval pack against an agent by name (--sync is optional)
khaos run my-agent --eval quickstart --sync

What makes Khaos different

Chaos Engineering Built-In

Active fault injection for resilience testing. No other tool does this.
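As a rough illustration of active fault injection, the sketch below wraps a tool function so that a configurable fraction of its calls raise `TimeoutError`. It then checks that the agent step falls back to a default answer instead of crashing. This is plain Python written under stated assumptions. `inject_timeouts`, `lookup_price`, `answer`, and `FALLBACK` are hypothetical names and are not Khaos's API.

```python
import functools
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def inject_timeouts(fn: Callable[..., T], failure_rate: float, rng: random.Random) -> Callable[..., T]:
    """Wrap a tool so that roughly `failure_rate` of calls raise TimeoutError."""
    @functools.wraps(fn)
    def wrapped(*args, **kwargs) -> T:
        if rng.random() < failure_rate:
            raise TimeoutError(f"injected timeout in {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapped


def lookup_price(item: str) -> str:
    """Hypothetical tool the agent depends on."""
    return f"{item}: $10"


FALLBACK = "Price unavailable right now, please retry later."


def answer(item: str, tool: Callable[[str], str]) -> str:
    """Hypothetical agent step: call the tool, fall back on a timeout."""
    try:
        return tool(item)
    except TimeoutError:
        return FALLBACK


rng = random.Random(42)  # fixed seed, so the injected fault sequence is reproducible
flaky_lookup = inject_timeouts(lookup_price, failure_rate=0.3, rng=rng)
results = [answer("widget", flaky_lookup) for _ in range(100)]
print(f"{results.count(FALLBACK)} of 100 calls hit an injected timeout and fell back")
```

Seeding the `random.Random` instance keeps the injected failures reproducible. Two versions of an agent that make the same sequence of tool calls will therefore face the same faults, which keeps a before/after comparison fair.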

Comparison-First

See what changed between versions, not just single-run traces.

Impact Reports

Shareable artifacts for PRs and team reviews. Ship with confidence.

Stop flying blind.

Know exactly what your agent change did before you ship.

Get Started Free