Quickstart
Get started with Khaos in under 5 minutes. Test your AI agent for security vulnerabilities, resilience issues, and behavioral regressions with a single @khaosagent decorator.
What is Khaos?
Khaos is a multi-dimensional evaluation platform for AI agents. It answers the question: "What did my agent change actually do?"
With a single command, Khaos evaluates your agent across four dimensions:
- Structural - Cost, latency, token usage, tool patterns
- Resilience - How your agent handles failures (chaos engineering)
- Security - Vulnerability to prompt injection and data leakage
- Functional - Output quality comparison between versions
Prerequisites
- Python 3.9+ - Khaos requires Python 3.9 or higher
- pip or uv - Package manager for installation
- LLM API Key - API key for your LLM provider (OpenAI, Anthropic, etc.)
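The prerequisites above can be checked before installing. The following is a minimal preflight sketch using only the standard library; it is not part of Khaos, and the helper names `python_ok` and `find_installer` are hypothetical.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version taken from the prerequisites list


def python_ok(version=None):
    """Return True if a (major, minor, ...) version meets the 3.9+ requirement."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def find_installer():
    """Return the first of 'uv' or 'pip' found on PATH, or None if neither exists."""
    for tool in ("uv", "pip"):
        if shutil.which(tool):
            return tool
    return None


if __name__ == "__main__":
    print("Python 3.9+:", python_ok())
    print("Installer found:", find_installer())
```

The API key check is left out on purpose, since the variable name depends on your LLM provider.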
1. Install Khaos
pip install khaos
# Or with uv (recommended for faster installs)
uv pip install khaos
Verify the installation:
khaos --version
2. Add the @khaosagent Decorator
Khaos runs your agent through a decorated handler (no protocol plumbing required).
from khaos import khaosagent

@khaosagent(name="my-agent", version="1.0.0")
def handle(message):
    prompt = (message.get("payload") or {}).get("text", "")
    # Call your framework/LLM here and return {"text": "..."}.
    return {"text": f"Hello! You said: {prompt}"}
This is the only required integration step. Your agent logic stays the same; Khaos handles runtime instrumentation.
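To make the message handling concrete, here is a sketch of the same extraction logic as plain functions, without the decorator, so it runs even where Khaos is not installed. It assumes the message shape used in the handler above (a dict with an optional `payload` dict holding `text`); the name `extract_prompt` is hypothetical, and the real message schema may carry more fields.

```python
def extract_prompt(message):
    """Pull the user text out of a message dict, tolerating a missing or null payload."""
    # `or {}` covers both a missing "payload" key and an explicit None value.
    payload = message.get("payload") or {}
    return payload.get("text", "")


def handle(message):
    """Same body as the decorated handler, minus the @khaosagent decorator."""
    prompt = extract_prompt(message)
    return {"text": f"Hello! You said: {prompt}"}


if __name__ == "__main__":
    print(handle({"payload": {"text": "ping"}}))
    print(handle({"payload": None}))
    print(handle({}))
```

Keeping the extraction in its own function makes it easy to unit-test the handler's input handling separately from the LLM call.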
3. Discover Your Agent
Register your decorated agent(s) so you can run by name:
khaos discover
Name each agent with @khaosagent(name=...) and run by name.
4. Run an Evaluation
Run by agent name (recommended):
khaos run my-agent --eval quickstart --sync
The quickstart pack includes baseline + resilience + security.
5. Choose an Evaluation
Khaos provides 4 built-in evaluations for different use cases:
# Quick baseline observation (~1 min)
khaos run <agent-name> --eval baseline
# Default: balanced evaluation (~2 min)
khaos run <agent-name> --eval quickstart
# Comprehensive evaluation (~10-15 min)
khaos run <agent-name> --eval full-eval
# Security-focused testing (~5-8 min)
khaos run <agent-name> --eval security
See Evaluations for details on what each eval tests.
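If you launch evaluations from a script, the commands above can be assembled programmatically. This is a sketch that assumes only the four eval names listed in this section are valid; `build_run_command` and `evals_within` are hypothetical helpers, and the durations are the approximate figures quoted above.

```python
# Approximate durations in minutes (low, high), copied from the comments above.
BUILT_IN_EVALS = {
    "baseline": (1, 1),
    "quickstart": (2, 2),
    "full-eval": (10, 15),
    "security": (5, 8),
}


def build_run_command(agent_name, eval_name="quickstart"):
    """Return the argv list for `khaos run <agent-name> --eval <eval-name>`."""
    if eval_name not in BUILT_IN_EVALS:
        raise ValueError(
            f"unknown eval {eval_name!r}; expected one of {sorted(BUILT_IN_EVALS)}"
        )
    return ["khaos", "run", agent_name, "--eval", eval_name]


def evals_within(minutes):
    """List evals whose upper time estimate fits in the given budget."""
    return [name for name, (_, high) in BUILT_IN_EVALS.items() if high <= minutes]


if __name__ == "__main__":
    print(build_run_command("my-agent", "security"))
    print(evals_within(5))
```

The returned list can be passed to `subprocess.run` once Khaos is installed.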
6. Understand Your Results
Khaos shows real-time progress during evaluation:
Running eval: quickstart v1.0
⠹ Baseline 4/6 (67%)
✓ math_addition 1450ms
✓ instruction_follow 890ms
✓ knowledge_capital 1200ms
✓ text_uppercase 650ms
Resilience waiting...
Security waiting...
After completion, you get clear pass/fail results:
✓ Baseline: 6/6 passed
✓ Resilience: 5/6 passed
! Security: 43/50 defended
When issues are found, Khaos provides actionable explanations:
What Failed
Security Vulnerabilities:
🟡 MEDIUM Prompt Injection (3 instances)
Attack Types Agent is Vulnerable To:
• Prompt Injection
→ Attacker can inject malicious instructions via user input
Recommended Actions:
1. Review Security Findings
→ 3 potential vulnerabilities found
→ Consider adding guardrails for sensitive operations
Use --no-security to skip security testing if needed.
7. Add to CI/CD
Khaos integrates directly into your CI/CD pipeline with threshold-based gating:
# CI mode: run + threshold check + JUnit output
khaos ci <agent-name> --security-threshold 80 --resilience-threshold 70
# Generate JUnit XML for test reporting
khaos ci <agent-name> --format junit --output-file results.xml
Exit codes: 0 = pass, 1 = security fail, 2 = resilience fail, 3 = both fail.
See CI/CD Integration for GitHub Actions and GitLab CI examples.
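The threshold gating and exit codes described above can be modeled in a few lines. This sketch assumes each score is a pass percentage (for example, 43 of 50 attacks defended is 86%) and that a score below its threshold counts as a failure. How Khaos actually computes its scores is not specified here, so treat `pass_rate` and `gate_exit_code` as hypothetical illustrations, not the tool's implementation.

```python
def pass_rate(passed, total):
    """Percentage of passed checks; an empty run counts as 0%."""
    return 100.0 * passed / total if total else 0.0


def gate_exit_code(security, resilience,
                   security_threshold=80, resilience_threshold=70):
    """Map two scores to the documented exit codes.

    0 = pass, 1 = security fail, 2 = resilience fail, 3 = both fail.
    The default thresholds mirror the example command, not known CLI defaults.
    """
    security_failed = security < security_threshold
    resilience_failed = resilience < resilience_threshold
    if security_failed and resilience_failed:
        return 3
    if security_failed:
        return 1
    if resilience_failed:
        return 2
    return 0


if __name__ == "__main__":
    security = pass_rate(43, 50)   # 86.0
    resilience = pass_rate(5, 6)   # roughly 83.3
    print(gate_exit_code(security, resilience))
```

Under these assumptions, the sample results shown earlier (43/50 defended, 5/6 resilience checks passed) clear both example thresholds.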
8. Cloud Sync
Sync your evaluation results to the Khaos dashboard for historical tracking and team visibility:
# Authenticate with Khaos cloud
khaos sync --login
# Run with automatic cloud sync
khaos run my-agent --sync
Next Steps
Now that you've run your first evaluation, explore these topics to get the most out of Khaos:
Core Concepts
- @khaosagent Decorator - All decorator options, async handlers, and multi-agent setups
- Evaluations - Choose the right evaluation for your needs
- Metrics - Understanding scores and comparing runs
Testing & Security
- Security Testing - OWASP-aligned vulnerability detection
Integration
- Framework Support - LangChain, CrewAI, OpenAI, Anthropic, and more
- CI/CD Integration - GitHub Actions and GitLab CI setup
- Cloud Sync - Team collaboration and historical tracking
Reference
- CLI Reference - Complete command documentation
- Troubleshooting - Common issues and solutions