Quickstart

Get started with Khaos in under 5 minutes. Test your AI agent for security vulnerabilities, resilience issues, and behavioral regressions with a single @khaosagent decorator.

What is Khaos?

Khaos is a multi-dimensional evaluation platform for AI agents. It answers the question: "What did my change to the agent actually do?"

With a single command, Khaos evaluates your agent across four dimensions:

  • Structural - Cost, latency, token usage, tool patterns
  • Resilience - How your agent handles failures (chaos engineering)
  • Security - Vulnerability to prompt injection and data leakage
  • Functional - Output quality comparison between versions

Prerequisites

  • Python 3.9+ - The minimum supported Python version
  • pip or uv - Package manager for installation
  • LLM API Key - API key for your LLM provider (OpenAI, Anthropic, etc.)

1. Install Khaos

Terminal
pip install khaos

# Or with uv (recommended for faster installs)
uv pip install khaos

Verify the installation:

Terminal
khaos --version

2. Add the @khaosagent Decorator

Khaos runs your agent through a decorated handler (no protocol plumbing required).

Python
from khaos import khaosagent

@khaosagent(name="my-agent", version="1.0.0")
def handle(message):
    prompt = (message.get("payload") or {}).get("text", "")
    # Call your framework/LLM here and return {"text": "..."}.
    return {"text": f"Hello! You said: {prompt}"}
Need more options?
See @khaosagent Decorator for all parameters, async handlers, and multi-agent patterns.

This is the only required integration step. Your agent logic stays the same; Khaos handles runtime instrumentation.
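
Because the handler is ordinary Python, its core logic can be checked with plain assertions before any Khaos run. The sketch below is a minimal illustration that does not import Khaos at all. `extract_prompt` and `reply` are hypothetical helper names, and the message shape (`{"payload": {"text": ...}}`) is assumed only from the handler shown above.

```python
def extract_prompt(message):
    """Return the user text from a message shaped like {"payload": {"text": ...}}.

    The `or {}` guard covers both a missing payload and an explicit None payload.
    """
    return (message.get("payload") or {}).get("text", "")


def reply(message):
    """Same logic as the quickstart handler, kept free of any decorator."""
    prompt = extract_prompt(message)
    return {"text": f"Hello! You said: {prompt}"}


# Plain checks that run without Khaos installed.
assert reply({"payload": {"text": "hi"}}) == {"text": "Hello! You said: hi"}
assert reply({"payload": None}) == {"text": "Hello! You said: "}
assert reply({}) == {"text": "Hello! You said: "}
```

With this split, the decorated `handle` function can simply return `reply(message)`, so the integration layer stays a one-liner.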

3. Discover Your Agent

Register your decorated agents so you can run them by name:

Terminal
khaos discover
Multiple agents in one file?
Give each agent a unique @khaosagent(name=...) and run by name.

4. Run an Evaluation

Run by agent name (recommended):

Terminal
khaos run my-agent --eval quickstart --sync

The quickstart pack includes baseline, resilience, and security tests.

5. Choose an Evaluation

Khaos provides four built-in evaluations for different use cases:

Terminal
# Quick baseline observation (~1 min)
khaos run <agent-name> --eval baseline

# Default: balanced evaluation (~2 min)
khaos run <agent-name> --eval quickstart

# Comprehensive evaluation (~10-15 min)
khaos run <agent-name> --eval full-eval

# Security-focused testing (~5-8 min)
khaos run <agent-name> --eval security

See Evaluations for details on what each eval tests.

6. Understand Your Results

Khaos shows real-time progress during evaluation:

TEXT
Running eval: quickstart v1.0

 ⠹ Baseline  4/6 (67%)
     ✓ math_addition 1450ms
     ✓ instruction_follow 890ms
     ✓ knowledge_capital 1200ms
     ✓ text_uppercase 650ms

   Resilience  waiting...
   Security    waiting...

After completion, you get clear pass/fail results:

TEXT
✓ Baseline: 6/6 passed
✓ Resilience: 5/6 passed
! Security: 43/50 defended
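
To make the counts easier to compare, the sketch below turns the summary lines into percentages. It parses the sample output above purely for illustration. The line format is taken from that sample only and is not a documented interface, and `pass_rates` is a hypothetical helper. Whether Khaos derives its threshold scores the same way is an assumption.

```python
import re

# Sample summary copied from the output above.
SUMMARY = """\
✓ Baseline: 6/6 passed
✓ Resilience: 5/6 passed
! Security: 43/50 defended
"""

# Matches "<Name>: <ok>/<total>" anywhere in a line.
LINE = re.compile(r"(\w+): (\d+)/(\d+)")


def pass_rates(text):
    """Map each dimension name to its pass rate in percent (one decimal place)."""
    rates = {}
    for name, ok, total in LINE.findall(text):
        rates[name] = round(100 * int(ok) / int(total), 1)
    return rates


print(pass_rates(SUMMARY))
# {'Baseline': 100.0, 'Resilience': 83.3, 'Security': 86.0}
```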

When issues are found, Khaos provides actionable explanations:

TEXT
What Failed

Security Vulnerabilities:
  🟡 MEDIUM Prompt Injection (3 instances)

Attack Types Agent is Vulnerable To:
  • Prompt Injection
    → Attacker can inject malicious instructions via user input

Recommended Actions:
  1. Review Security Findings
     → 3 potential vulnerabilities found
     → Consider adding guardrails for sensitive operations
Security is ON by default
Khaos runs security probes by default. Use --no-security to skip them if needed.

7. Add to CI/CD

Khaos integrates directly into your CI/CD pipeline with threshold-based gating:

Terminal
# CI mode: run + threshold check + JUnit output
khaos ci <agent-name> --security-threshold 80 --resilience-threshold 70

# Generate JUnit XML for test reporting
khaos ci <agent-name> --format junit --output-file results.xml

Exit codes: 0 = pass, 1 = security fail, 2 = resilience fail, 3 = both fail.
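
For pipelines driven from a script rather than a shell step, the exit codes can be translated into readable messages. This is a minimal sketch: `describe_exit_code` and `run_ci` are hypothetical helpers, the message strings are paraphrases of the list above, and the command line simply mirrors the first `khaos ci` example.

```python
import subprocess

# Meanings as listed above: 0 = pass, 1 = security fail,
# 2 = resilience fail, 3 = both fail.
EXIT_MEANINGS = {
    0: "pass",
    1: "security check failed",
    2: "resilience check failed",
    3: "security and resilience checks failed",
}


def describe_exit_code(code):
    """Return a readable message for a `khaos ci` exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_ci(agent_name):
    """Run `khaos ci` with the example thresholds and return its exit code.

    Requires the khaos CLI on PATH; nothing here calls it automatically.
    """
    cmd = [
        "khaos", "ci", agent_name,
        "--security-threshold", "80",
        "--resilience-threshold", "70",
    ]
    return subprocess.run(cmd).returncode
```

A wrapper script could call `code = run_ci("my-agent")`, print `describe_exit_code(code)`, and then exit with the same code so the CI system still sees the failure.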

See CI/CD Integration for GitHub Actions and GitLab CI examples.

8. Cloud Sync

Sync your evaluation results to the Khaos dashboard for historical tracking and team visibility:

Terminal
# Authenticate with Khaos cloud
khaos sync --login

# Run with automatic cloud sync
khaos run my-agent --sync
Info
Cloud sync is local-first: runs are stored locally and only uploaded when you explicitly sync.

Next Steps

Now that you've run your first evaluation, explore these topics to get the most out of Khaos:

  • Core Concepts
  • Testing & Security
  • Integration
  • Reference