CLI Reference
The Khaos CLI provides commands for discovering, testing, comparing, and monitoring your AI agents. All commands use bare khaos syntax after installation.
Core Commands
| Command | Description |
|---|---|
| khaos discover | Register @khaosagent handlers for running by name |
| khaos run | Run agent with full evaluation (security + resilience) |
| khaos ci | CI/CD pipeline mode with thresholds and reporting |
| khaos compare | Compare two runs to detect regressions |
| khaos gate | Check if run meets quality thresholds |
| khaos sync | Sync results to cloud dashboard |
| khaos evals list | List available evaluations |
| khaos export | Export run artifacts as JSON bundle |
khaos discover
Register agents decorated with @khaosagent so they can be run by name. This is typically run once after creating or modifying agent files.
# Discover agents in current directory
khaos discover
# Discover in a specific path
khaos discover ./agents/
# List discovered agents
khaos agents list
khaos run
The primary command. Runs your @khaosagent handler with automatic LLM telemetry capture, security attack testing, and resilience evaluation.
# First, discover your agents (run once after changes)
khaos discover
# Then run by agent name
khaos run <agent-name>
# Run with a specific evaluation
khaos run <agent-name> --eval quickstart
khaos run <agent-name> --eval full-eval
khaos run <agent-name> --eval security
khaos run <agent-name> --eval baseline
# Run a single test from an evaluation
khaos run <agent-name> --eval full-eval --test math-reasoning
# Disable security tests
khaos run <agent-name> --no-security
# Reproducible run with fixed seed
khaos run <agent-name> --seed 12345
# Custom inputs (raw prompt OR YAML/JSON file)
khaos run <agent-name> --input "Hello, how are you?"
khaos run <agent-name> --inputs tests.yaml
# Custom inputs + eval (override eval's built-in prompts)
khaos run <agent-name> --eval quickstart --inputs prod-prompts.yaml
Key Flags
| Flag | Description |
|---|---|
| --eval, -e | Evaluation to run: quickstart, full-eval, security, baseline |
| --test, -t | Run a specific test from the evaluation by its ID |
| --security/--no-security | Enable/disable security attack tests (default: ON) |
| --attacks | Custom attack payloads YAML file |
| --input/--inputs, -i | Custom inputs: raw prompt string or path to YAML/JSON (list or {inputs: [...] }). With --eval, overrides the eval's built-in inputs. |
| --json, -j | Output results as JSON |
| --verbose, -v | Show full debug output |
| --timeout, -t | Maximum execution time in seconds (default: 120) |
| --env, -e | Environment variables (KEY=VALUE) |
| --seed | Random seed for reproducibility (recorded in artifacts) |
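The two file shapes accepted by --inputs (a bare list, or a mapping with an inputs key) can be sketched as follows. This is a minimal illustration, assuming each list element is a plain prompt string, which the flag description does not spell out. The file names and the scratch directory are hypothetical.

```shell
# Hypothetical scratch directory for the example files.
inputs_dir="$(mktemp -d)"

# Shape 1: a bare list of prompts (assumed here to be plain strings).
cat > "$inputs_dir/inputs-list.json" <<'EOF'
[
  "Hello, how are you?",
  "Summarize this ticket in one sentence."
]
EOF

# Shape 2: the same prompts wrapped in a top-level "inputs" key.
cat > "$inputs_dir/inputs-map.json" <<'EOF'
{
  "inputs": [
    "Hello, how are you?",
    "Summarize this ticket in one sentence."
  ]
}
EOF

echo "Wrote example input files to $inputs_dir"
```

Either file could then be passed with --inputs, for example khaos run <agent-name> --inputs "$inputs_dir/inputs-list.json".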
Security tests are enabled by default. Use --no-security to disable.
khaos run requires a @khaosagent handler. See @khaosagent.
khaos ci
Single command for CI/CD pipelines. Combines run, gate checks, and reporting with clear exit codes for automation.
# Basic CI run with default thresholds
khaos ci <agent-name>
# Custom thresholds
khaos ci <agent-name> --security-threshold 90 --resilience-threshold 85
# Generate JUnit XML for test reporting
khaos ci <agent-name> --format junit --output-file results.xml
# Compare against baseline and fail on regression
khaos ci <agent-name> --baseline main --fail-on-regression
# Save this run as a named baseline
khaos ci <agent-name> --save-baseline main
# Full evaluation with all output formats
khaos ci <agent-name> --eval full-eval --format all --output-file results
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All gates passed |
| 1 | Security threshold not met |
| 2 | Resilience threshold not met |
| 3 | Both thresholds failed |
| 4 | Baseline tests failed |
| 5 | Regression detected vs baseline |
See CI/CD Integration for GitHub Actions and GitLab CI examples.
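Because each exit code maps to a distinct failure, a pipeline step can report the reason before propagating the code. The helper below is a hypothetical sketch, not part of Khaos. It only restates the Exit Codes table as a case statement, and the agent name in the usage comment is a placeholder.

```shell
# Hypothetical helper (not shipped with Khaos): translate a `khaos ci`
# exit code into the meaning listed in the Exit Codes table.
describe_khaos_ci_exit() {
  case "$1" in
    0) echo "All gates passed" ;;
    1) echo "Security threshold not met" ;;
    2) echo "Resilience threshold not met" ;;
    3) echo "Both thresholds failed" ;;
    4) echo "Baseline tests failed" ;;
    5) echo "Regression detected vs baseline" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage sketch in a CI step (my-agent is a placeholder name):
#   khaos ci my-agent
#   code=$?
#   echo "khaos ci: $(describe_khaos_ci_exit "$code")"
#   exit "$code"
```

Re-exiting with the original code keeps the pipeline's pass/fail behavior unchanged while making the log line self-explanatory.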
khaos compare
Compare metrics, costs, and behavior between two runs to identify regressions or improvements.
# Show recent runs (easy to copy names/IDs)
khaos compare --recent
# Compare two runs by name
khaos compare my-baseline my-candidate
# Compare two runs by ID
khaos compare run-abc123 run-def456
# Compare against stored baseline
khaos compare my-candidate --baseline
# Show cost comparison with projections
khaos compare baseline candidate --cost
# JSON output for automation
khaos compare baseline candidate --json
Options
| Flag | Description |
|---|---|
| --recent, -r | Show recent runs in a table for easy selection |
| --baseline, -b | Compare against the stored baseline run |
| --cost, -c | Show cost comparison with monthly/annual projections |
| --limit, -l | Number of runs to show when listing (default: 10) |
| --json, -j | Output results as JSON |
Use khaos run agent.py --name my-test to name runs for easier comparison.
khaos gate
Check if the last run (or a specific run) meets quality thresholds. Designed for CI/CD pipelines.
# Check last run with default thresholds (security: 80, resilience: 70)
khaos gate
# Custom thresholds
khaos gate --security 90 --resilience 85
# Check specific run
khaos gate --run run-abc123
# Use in CI/CD scripts
khaos gate --security 80 || exit 1
khaos sync
Sync run results to the Khaos cloud dashboard for historical tracking and team visibility.
# Login to cloud
khaos sync --login
# Sync all pending runs
khaos sync
# Check sync status
khaos sync --status
# Sync a specific run
khaos sync --run run-abc123
# Sync and cleanup local files
khaos sync --cleanup
# Logout
khaos sync --logout
See Cloud Sync for full authentication and sync details.
khaos evals list
List all available evaluations with descriptions:
khaos evals list
Returns a table of evaluations with their descriptions and estimated run times. See Evaluations for details.
khaos export
Export a run's local artifacts (trace, metrics, stderr) as a single JSON bundle:
# Export to a file (recommended for CI artifacts)
khaos export <run-id> --out artifacts/<run-id>.json
# Print to stdout (pipe into jq, etc.)
khaos export <run-id>
Useful for archiving run data or transferring artifacts between environments.
Environment Variables
Configure Khaos behavior via environment variables:
| Variable | Description |
|---|---|
| KHAOS_STATE_DIR | Results and cache storage directory |
| KHAOS_AUTO_SYNC | Set to 1 to auto-sync after runs |
| KHAOS_CONTENT_MODE | Privacy control: full, summary, or redacted |
| KHAOS_AGENT_INPUT | Input message passed to agent during evaluation |
| KHAOS_SECURITY_ATTACK_LIMIT | Cap the number of security attacks per run (cost/time control) |
| KHAOS_CACHE_DIR | Custom directory for evaluation and registry cache |
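A CI job might set several of these variables together before invoking khaos. The snippet below is a sketch under stated assumptions: the default values (state directory path, attack limit of 25) are illustrative only, and validate_content_mode is a hypothetical pre-flight check that simply enforces the three documented KHAOS_CONTENT_MODE values.

```shell
# Hypothetical pre-flight check (not part of Khaos): accept only the
# three documented KHAOS_CONTENT_MODE values.
validate_content_mode() {
  case "$1" in
    full|summary|redacted) return 0 ;;
    *) echo "invalid KHAOS_CONTENT_MODE: $1" >&2; return 1 ;;
  esac
}

# Illustrative defaults; each can be overridden by the calling environment.
export KHAOS_STATE_DIR="${KHAOS_STATE_DIR:-$PWD/.khaos-state}"
export KHAOS_CONTENT_MODE="${KHAOS_CONTENT_MODE:-redacted}"
export KHAOS_SECURITY_ATTACK_LIMIT="${KHAOS_SECURITY_ATTACK_LIMIT:-25}"

# Fail early on a typo rather than part-way through an evaluation.
validate_content_mode "$KHAOS_CONTENT_MODE"
```

Using the ${VAR:-default} form lets a pipeline override any value without editing the script.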
Reproducibility
Khaos supports fully reproducible runs via the --seed flag. When specified, the seed is recorded in all artifacts for provenance.
# Run with a fixed seed for reproducibility
khaos run <agent-name> --seed 12345
# The same seed produces deterministic fault scheduling
khaos run <agent-name> --seed 12345 --eval full-eval
# In CI: use consistent seeds across environments
khaos ci <agent-name> --seed 42 --eval quickstart
What Gets Recorded
When you specify a seed, it's recorded in:
- Run manifest (manifest-*.json)
- Metrics file (metrics-*.json)
- Evaluation reports for baseline comparisons
Use --seed in CI pipelines to ensure deterministic, reproducible results across runs.
Artifacts
Runs can produce artifacts, stored in ~/.khaos/runs/ by default. Artifacts include trace and metrics data, which persist when using --sync.
| Artifact | Description |
|---|---|
| manifest-*.json | Run provenance (seed, config hash, versions) |
| trace-*.json | Full execution trace with LLM events (pack runs use khaos.pack_trace.v1 for per-case viewing) |
| metrics-*.json | Evaluation metrics and resilience report |
| llm-events-*.jsonl | Raw LLM call events (JSONL format) |
Override the storage location with KHAOS_STATE_DIR:
export KHAOS_STATE_DIR=/path/to/custom/state
khaos run <agent-name>
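To collect artifacts of one kind after a run, for instance to upload them from CI, a small helper can search the runs directory by the file name patterns in the table above. This is a hypothetical sketch: list_khaos_artifacts is not a Khaos command, and the search is recursive because the exact directory layout under the runs directory is an assumption.

```shell
# Hypothetical helper (not part of Khaos): list JSON artifacts of one kind.
#   $1: artifact prefix from the table (manifest, trace, or metrics)
#   $2: directory to search (defaults to ~/.khaos/runs)
list_khaos_artifacts() {
  artifact_prefix="$1"
  artifact_dir="${2:-$HOME/.khaos/runs}"
  # Search recursively, since the layout inside the directory is assumed.
  find "$artifact_dir" -type f -name "${artifact_prefix}-*.json" | sort
}

# Usage sketch:
#   list_khaos_artifacts metrics
#   list_khaos_artifacts manifest /path/to/custom/state
```

When KHAOS_STATE_DIR is set, pass the corresponding directory as the second argument, since the default path no longer applies.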