# CLI Reference

The Khaos CLI provides commands for discovering, testing, comparing, and monitoring your AI agents. After installation, all commands use bare `khaos` syntax.
## Core Commands

| Command | Description |
|---|---|
| `khaos discover` | Register `@khaosagent` handlers so agents can be run by name |
| `khaos run` | Run an agent with full evaluation (security + resilience) |
| `khaos playground` | Interactive agent testing with live fault injection |
| `khaos ci` | CI/CD pipeline mode with thresholds and reporting |
| `khaos test` | Run `@khaostest` agent tests with JUnit/JSON/Markdown output |
| `khaos compare` | Compare two runs to detect regressions |
| `khaos gate` | Check whether a run meets quality thresholds |
| `khaos sync` | Sync results to the cloud dashboard |
| `khaos evals list` | List available evaluations |
| `khaos attacks` | Inspect attack categories, stats, and individual attack details |
| `khaos export` | Export run artifacts as a JSON bundle |
| `khaos doctor` | Diagnose installation, credentials, and agent discovery issues |
## khaos discover

Register agents decorated with `@khaosagent` so they can be run by name. This is typically run once after creating or modifying agent files.

```bash
# Discover agents in current directory
khaos discover

# Discover in a specific path
khaos discover ./agents/

# List discovered agents
khaos discover --list
```

## khaos run
The primary command. Runs your `@khaosagent` handler with automatic LLM telemetry capture, security attack testing, and resilience evaluation.

```bash
# First, discover your agents (run once after changes)
khaos discover

# Then run by agent name
khaos run <agent-name>

# Run with a specific evaluation
khaos run <agent-name> --eval quickstart
khaos run <agent-name> --eval full-eval
khaos run <agent-name> --eval security
khaos run <agent-name> --eval baseline

# Run a single test from an evaluation
khaos run <agent-name> --eval full-eval --test math-reasoning

# Disable security tests
khaos run <agent-name> --no-security

# Reproducible run with a fixed seed
khaos run <agent-name> --seed 12345

# Custom inputs (raw prompt OR YAML/JSON file)
khaos run <agent-name> --input "Hello, how are you?"
khaos run <agent-name> --inputs tests.yaml

# Custom inputs + eval (override the eval's built-in prompts)
khaos run <agent-name> --eval quickstart --inputs prod-prompts.yaml
```

### Key Flags
| Flag | Description |
|---|---|
| `--eval`, `-e` | Evaluation to run: quickstart, full-eval, security, baseline |
| `--test`, `-t` | Run a specific test from the evaluation by its ID |
| `--security` / `--no-security` | Enable/disable security attack tests (default: on) |
| `--attacks` | Custom attack payloads YAML file |
| `--input` / `--inputs`, `-i` | Custom inputs: raw prompt string or path to YAML/JSON (a list or `{inputs: [...]}`). With `--eval`, overrides the eval's built-in inputs. |
| `--json`, `-j` | Output results as JSON |
| `--verbose`, `-v` | Show full debug output |
| `--timeout` | Maximum execution time in seconds (default: 120) |
| `--env` | Environment variables (KEY=VALUE) |
| `--seed` | Random seed for reproducibility (recorded in artifacts) |
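An `--inputs` file is either a bare YAML/JSON list of prompts or a mapping with an `inputs` key, as described for the `--input`/`--inputs` flag above. A minimal sketch (the file name and prompt text are illustrative):

```yaml
# tests.yaml: a bare list of prompts...
- "Summarize our refund policy in one sentence."
- "What is 2 + 2?"

# ...or, equivalently, a mapping with an inputs key:
# inputs:
#   - "Summarize our refund policy in one sentence."
#   - "What is 2 + 2?"
```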
Security tests are enabled by default; pass `--no-security` to disable them. `khaos run` executes your registered `@khaosagent` handler. See `@khaosagent`.

## khaos attacks
Inspect the security attack catalog and run targeted audits by tier, category, and severity.

```bash
# List all attacks
khaos attacks list

# High-value AGENT tier attacks
khaos attacks list --tier agent

# Category + severity filter
khaos attacks list --category file_content_injection --severity high

# Show one attack in detail
khaos attacks show fci-readme-poison

# Catalog summaries
khaos attacks categories
khaos attacks stats
```

## khaos ci
Run evaluations in CI/CD pipelines with thresholds, structured output, and cloud sync. Supports JUnit XML, JSON, and Markdown output formats.

```bash
# Basic evaluation
khaos ci my-agent --eval quickstart

# With thresholds and output
khaos ci my-agent --eval full-eval \
  --security-threshold 85 --resilience-threshold 75 \
  --format junit -o results.xml --json-file results.json

# GA mode exit codes (0/1/2)
khaos ci my-agent --exit-code-mode ga --sync

# Also run @khaostest tests
khaos ci my-agent --eval quickstart --test

# Baseline comparison
khaos ci my-agent --baseline main --fail-on-regression
```

### Key Flags
| Flag | Description |
|---|---|
| `--eval`, `-e` | Evaluation pack (quickstart, full-eval, security, baseline) |
| `--security-threshold` | Minimum security score to pass (default: 80) |
| `--resilience-threshold` | Minimum resilience score to pass (default: 70) |
| `--format`, `-f` | Output: text, json, junit, markdown, or all |
| `--output-file`, `-o` | Write primary output to file |
| `--json-file` | Write JSON sidecar file |
| `--sync` / `--no-sync` | Upload results to the dashboard |
| `--exit-code-mode` | ga (0/1/2) or detailed (multi-code) |
| `--test` | Also run `@khaostest` tests in the same pipeline |
| `--test-path` | Search paths for `@khaostest` tests (with `--test`) |
| `--baseline`, `-b` | Compare against a named baseline |
| `--fail-on-regression` | Fail if a regression is detected vs the baseline |
| `--preflight-only` | Validate credentials without running the evaluation |
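With `--exit-code-mode ga`, the process exit code alone can drive pipeline decisions. A sketch of the branching, assuming 0 = pass, 1 = threshold failure, 2 = run error (verify the mapping with `khaos ci --help`); the exit code is simulated here so the snippet runs standalone:

```bash
# In a real pipeline, capture the code from the actual run:
#   khaos ci my-agent --eval quickstart --exit-code-mode ga; rc=$?
rc=1  # simulated GA-mode exit code for this sketch
case "$rc" in
  0) echo "quality gate passed" ;;
  1) echo "quality gate failed: score below threshold" ;;
  2) echo "run error: evaluation did not complete" ;;
  *) echo "unexpected exit code: $rc" ;;
esac
```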
See CI/CD Integration for full examples with GitHub Actions, GitLab CI, and CircleCI.
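For orientation only, a minimal GitHub Actions job might look like the sketch below; the install command and package name are assumptions, so prefer the CI/CD Integration guide for authoritative steps.

```yaml
# .github/workflows/khaos.yml (illustrative only)
name: khaos-eval
on: [push]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install khaos  # package name is an assumption
      - run: |
          khaos ci my-agent --eval quickstart \
            --format junit -o results.xml --exit-code-mode ga
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: khaos-results
          path: results.xml
```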
## khaos test

Run `@khaostest`-decorated Python tests with auto-discovery, fault injection, and machine-readable output formats.

```bash
# Auto-discover and run all tests
khaos test

# Run a specific file
khaos test tests/test_security.py

# Filter by name or tag
khaos test -k "resilience"

# Verbose with tracebacks
khaos test -v

# CI output formats
khaos test --format junit -o results.xml
khaos test --format json -o results.json
khaos test --format markdown -o results.md
khaos test --format all -o khaos-tests
```

### Key Flags
| Flag | Description |
|---|---|
| `--verbose`, `-v` | Show tracebacks for failures |
| `--filter`, `-k` | Filter by name or tag pattern |
| `--fail-fast`, `-x` | Stop on first failure |
| `--timeout` | Per-test timeout in ms (default: 30000) |
| `--format`, `-f` | Output: text, json, junit, markdown, or all |
| `--output-file`, `-o` | Write output to file (format inferred from extension) |
| `--json-file` | Write JSON sidecar file |

See Agent Testing for how to write `@khaostest` tests.
## khaos compare

Compare metrics, costs, and behavior between two runs to identify regressions or improvements.

```bash
# Show recent runs (easy to copy names/IDs)
khaos compare --recent

# Compare two runs by name
khaos compare my-baseline my-candidate

# Compare two runs by ID
khaos compare run-abc123 run-def456

# Compare against the stored baseline
khaos compare my-candidate --baseline

# Show cost comparison with projections
khaos compare baseline candidate --cost

# JSON output for automation
khaos compare baseline candidate --json
```

### Options
| Flag | Description |
|---|---|
| `--recent`, `-r` | Show recent runs in a table for easy selection |
| `--baseline`, `-b` | Compare against the stored baseline run |
| `--cost`, `-c` | Show cost comparison with monthly/annual projections |
| `--limit`, `-l` | Number of runs to show when listing (default: 10) |
| `--json`, `-j` | Output results as JSON |
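The `--json` output can feed scripted gating. A sketch, with the payload simulated so it runs standalone; the `regression` key is hypothetical, so inspect real `khaos compare ... --json` output before relying on any field name:

```bash
# In a real script, capture live output:
#   out=$(khaos compare baseline candidate --json)
out='{"regression": true}'  # simulated payload for this sketch
if printf '%s' "$out" | grep -q '"regression": *true'; then
  echo "regression detected; failing the build"
  # exit 1  # enable in CI
fi
```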
Tip: use `khaos run <agent-name> --name my-test` to name runs for easier comparison.

## khaos gate
Check whether the last run (or a specific run) meets quality thresholds. Designed for CI/CD pipelines.

```bash
# Check last run with default thresholds (security: 80, resilience: 70)
khaos gate

# Custom thresholds
khaos gate --security 90 --resilience 85

# Check a specific run
khaos gate --run run-abc123

# Use in CI/CD scripts
khaos gate --security 80 || exit 1
```

## khaos sync
Sync run results to the Khaos cloud dashboard for historical tracking and team visibility.

```bash
# Log in to the cloud
khaos login

# Sync all pending runs
khaos sync

# Check sync status
khaos sync --status

# Sync a specific run
khaos sync --run run-abc123

# Sync and clean up local files
khaos sync --cleanup

# Log out
khaos logout
```

See Cloud Sync for full authentication and sync details.
## khaos evals list

List all available evaluations with descriptions:

```bash
khaos evals list
```

Returns a table of evaluations with their descriptions and estimated run times. See Evaluations for details.
## khaos export

Export a run's local artifacts (trace, metrics, stderr) as a single JSON bundle:

```bash
# Export to a file (recommended for CI artifacts)
khaos export <run-id> --out artifacts/<run-id>.json

# Print to stdout (pipe into jq, etc.)
khaos export <run-id>
```

Useful for archiving run data or transferring artifacts between environments.
## Environment Variables

Configure Khaos behavior via environment variables:

| Variable | Description |
|---|---|
| `KHAOS_STATE_DIR` | Results and cache storage directory |
| `KHAOS_AUTO_SYNC` | Set to 1 to auto-sync after runs |
| `KHAOS_CONTENT_MODE` | Privacy control: full, summary, or redacted |
| `KHAOS_AGENT_INPUT` | Input message passed to the agent during evaluation |
| `KHAOS_SECURITY_ATTACK_LIMIT` | Cap the number of security attacks per run (cost/time control) |
| `KHAOS_CACHE_DIR` | Custom directory for evaluation and registry cache |
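For example, a cost-conscious CI job might combine several of these; the values below are illustrative, and the `khaos` invocation is commented out so the snippet stands alone:

```bash
export KHAOS_STATE_DIR="$PWD/.khaos-state"   # keep artifacts inside the workspace
export KHAOS_SECURITY_ATTACK_LIMIT=10        # bound attack count for cost/time
export KHAOS_CONTENT_MODE=redacted           # avoid storing raw prompts/outputs
# khaos ci my-agent --eval quickstart        # runs with this environment
echo "content mode: $KHAOS_CONTENT_MODE"
```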
## Reproducibility

Khaos supports reproducible runs via the `--seed` flag. When specified, the seed is recorded in all artifacts for provenance.

```bash
# Run with a fixed seed for reproducibility
khaos run <agent-name> --seed 12345

# The same seed produces deterministic fault scheduling
khaos run <agent-name> --seed 12345 --eval full-eval

# In CI: pin the eval and keep env/config stable
khaos ci my-agent --eval quickstart
```

### What Gets Recorded
When you specify a seed, it is recorded in:

- Run manifest (`manifest-*.json`)
- Metrics file (`metrics-*.json`)
- Evaluation reports for baseline comparisons

Tip: use `--seed` in CI pipelines to ensure deterministic, reproducible results across runs.

## Artifacts
Runs can produce artifacts stored in `~/.khaos/runs/`. Artifacts include trace and metrics data, which persist when using `--sync`.

| Artifact | Description |
|---|---|
| `manifest-*.json` | Run provenance (seed, config hash, versions) |
| `trace-*.json` | Full execution trace with LLM events (pack runs use `khaos.pack_trace.v1` for per-case viewing) |
| `metrics-*.json` | Evaluation metrics and resilience report |
| `llm-events-*.jsonl` | Raw LLM call events (JSONL format) |
Override the storage location with `KHAOS_STATE_DIR`:

```bash
export KHAOS_STATE_DIR=/path/to/custom/state
khaos run <agent-name>
```