CLI Reference

The Khaos CLI provides commands for discovering, testing, comparing, and monitoring your AI agents. After installation, every command is invoked directly as khaos followed by a subcommand.

Core Commands

Command | Description
khaos discover | Register @khaosagent handlers for running by name
khaos run | Run agent with full evaluation (security + resilience)
khaos ci | CI/CD pipeline mode with thresholds and reporting
khaos compare | Compare two runs to detect regressions
khaos gate | Check if run meets quality thresholds
khaos sync | Sync results to cloud dashboard
khaos evals list | List available evaluations
khaos export | Export run artifacts as JSON bundle

khaos discover

Register agents decorated with @khaosagent so they can be run by name. This is typically run once after creating or modifying agent files.

Terminal
# Discover agents in current directory
khaos discover

# Discover in a specific path
khaos discover ./agents/

# List discovered agents
khaos agents list

khaos run

The primary command. Runs your @khaosagent handler with automatic LLM telemetry capture, security attack testing, and resilience evaluation.

Terminal
# First, discover your agents (run once after changes)
khaos discover

# Then run by agent name
khaos run <agent-name>

# Run with a specific evaluation
khaos run <agent-name> --eval quickstart
khaos run <agent-name> --eval full-eval
khaos run <agent-name> --eval security
khaos run <agent-name> --eval baseline

# Run a single test from an evaluation
khaos run <agent-name> --eval full-eval --test math-reasoning

# Disable security tests
khaos run <agent-name> --no-security

# Reproducible run with fixed seed
khaos run <agent-name> --seed 12345

# Custom inputs (raw prompt OR YAML/JSON file)
khaos run <agent-name> --input "Hello, how are you?"
khaos run <agent-name> --inputs tests.yaml

# Custom inputs + eval (override eval's built-in prompts)
khaos run <agent-name> --eval quickstart --inputs prod-prompts.yaml

Key Flags

Flag | Description
--eval, -e | Evaluation to run: quickstart, full-eval, security, baseline
--test, -t | Run a specific test from the evaluation by its ID
--security/--no-security | Enable/disable security attack tests (default: ON)
--attacks | Custom attack payloads YAML file
--input/--inputs, -i | Custom inputs: raw prompt string or path to YAML/JSON (list or {inputs: [...]}). With --eval, overrides the eval's built-in inputs.
--json, -j | Output results as JSON
--verbose, -v | Show full debug output
--timeout, -t | Maximum execution time in seconds (default: 120)
--env, -e | Environment variables (KEY=VALUE)
--seed | Random seed for reproducibility (recorded in artifacts)

Security is ON by default
Khaos runs security attack tests automatically. Use --no-security to disable.

@khaosagent required
If you run a script directly, make sure it defines at least one @khaosagent handler. See @khaosagent.
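
As a rough illustration of the --inputs file described above, the sketch below writes a small YAML file in the {inputs: [...]} shape and passes it to khaos run. The prompts, the file name tests.yaml, and the agent name my-agent are placeholders, and the assumption that each entry is a plain prompt string is not confirmed by this page.

```shell
# Write a small inputs file in the {inputs: [...]} shape.
# Assumption: each entry is a plain prompt string; the prompts are placeholders.
cat > tests.yaml <<'EOF'
inputs:
  - "Hello, how are you?"
  - "What is 17 * 23?"
  - "Summarize the plot of Hamlet in two sentences."
EOF

# Hypothetical agent name; replace it with one shown by `khaos agents list`.
AGENT_NAME="my-agent"

# Only invoke the CLI when it is actually installed.
if command -v khaos >/dev/null 2>&1; then
  khaos run "$AGENT_NAME" --eval quickstart --inputs tests.yaml
fi
```

Keeping the prompts in a file rather than on the command line makes it easier to reuse the same inputs with different --eval values.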

khaos ci

Single command for CI/CD pipelines. Combines run, gate checks, and reporting with clear exit codes for automation.

Terminal
# Basic CI run with default thresholds
khaos ci <agent-name>

# Custom thresholds
khaos ci <agent-name> --security-threshold 90 --resilience-threshold 85

# Generate JUnit XML for test reporting
khaos ci <agent-name> --format junit --output-file results.xml

# Compare against baseline and fail on regression
khaos ci <agent-name> --baseline main --fail-on-regression

# Save this run as a named baseline
khaos ci <agent-name> --save-baseline main

# Full evaluation with all output formats
khaos ci <agent-name> --eval full-eval --format all --output-file results

Exit Codes

Code | Meaning
0 | All gates passed
1 | Security threshold not met
2 | Resilience threshold not met
3 | Both thresholds failed
4 | Baseline tests failed
5 | Regression detected vs baseline

See CI/CD Integration for GitHub Actions and GitLab CI examples.
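
In a pipeline script it can help to turn the exit code into a readable log line before propagating it. The helper below is a minimal sketch that maps the codes listed above to their meanings; the function name describe_ci_exit and the agent name in the usage comment are illustrative, not part of the Khaos CLI.

```shell
# Map a `khaos ci` exit code to the meaning listed in the Exit Codes table.
# describe_ci_exit is a hypothetical helper, not a Khaos command.
describe_ci_exit() {
  case "$1" in
    0) echo "All gates passed" ;;
    1) echo "Security threshold not met" ;;
    2) echo "Resilience threshold not met" ;;
    3) echo "Both thresholds failed" ;;
    4) echo "Baseline tests failed" ;;
    5) echo "Regression detected vs baseline" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage (hypothetical agent name):
#   status=0
#   khaos ci my-agent --baseline main --fail-on-regression || status=$?
#   describe_ci_exit "$status"
#   exit "$status"
```

Capturing the status with `|| status=$?` keeps the original code available even when the script runs under `set -e`.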

khaos compare

Compare metrics, costs, and behavior between two runs to identify regressions or improvements.

Terminal
# Show recent runs (easy to copy names/IDs)
khaos compare --recent

# Compare two runs by name
khaos compare my-baseline my-candidate

# Compare two runs by ID
khaos compare run-abc123 run-def456

# Compare against stored baseline
khaos compare my-candidate --baseline

# Show cost comparison with projections
khaos compare baseline candidate --cost

# JSON output for automation
khaos compare baseline candidate --json

Options

Flag | Description
--recent, -r | Show recent runs in a table for easy selection
--baseline, -b | Compare against the stored baseline run
--cost, -c | Show cost comparison with monthly/annual projections
--limit, -l | Number of runs to show when listing (default: 10)
--json, -j | Output results as JSON

Name your runs
Use khaos run agent.py --name my-test to name runs for easier comparison.

khaos gate

Check if the last run (or a specific run) meets quality thresholds. Designed for CI/CD pipelines.

Terminal
# Check last run with default thresholds (security: 80, resilience: 70)
khaos gate

# Custom thresholds
khaos gate --security 90 --resilience 85

# Check specific run
khaos gate --run run-abc123

# Use in CI/CD scripts
khaos gate --security 80 || exit 1

Exit codes for automation
Exit code 1 = security threshold not met, 2 = resilience threshold not met, 3 = both not met. Use these in CI scripts to decide how to react.
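
Because the gate distinguishes security from resilience failures, a script can react differently to each instead of collapsing everything to exit 1. The wrapper below is one possible sketch: it treats a resilience-only failure (code 2) as a warning and fails on anything else. The name gate_security_only and the warn-on-resilience policy are assumptions made for illustration.

```shell
# Run a gate command, keep its exit status, and fail only when the
# status is something other than 0 (pass) or 2 (resilience only).
# gate_security_only is a hypothetical helper, not a Khaos command.
gate_security_only() {
  status=0
  "$@" || status=$?
  case "$status" in
    0) return 0 ;;
    2) echo "warning: resilience threshold not met" >&2
       return 0 ;;
    *) echo "gate failed (exit $status)" >&2
       return "$status" ;;
  esac
}

# Usage:
#   gate_security_only khaos gate --security 90 --resilience 85
```

Codes 1 and 3 both include a security failure, so they fall through to the failing branch along with any unexpected status such as a missing binary.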

khaos sync

Sync run results to the Khaos cloud dashboard for historical tracking and team visibility.

Terminal
# Login to cloud
khaos sync --login

# Sync all pending runs
khaos sync

# Check sync status
khaos sync --status

# Sync a specific run
khaos sync --run run-abc123

# Sync and cleanup local files
khaos sync --cleanup

# Logout
khaos sync --logout

See Cloud Sync for full authentication and sync details.

khaos evals list

List all available evaluations with descriptions:

Terminal
khaos evals list

Returns a table of evaluations with their descriptions and estimated run times. See Evaluations for details.

khaos export

Export a run's local artifacts (trace, metrics, stderr) as a single JSON bundle:

Terminal
# Export to a file (recommended for CI artifacts)
khaos export <run-id> --out artifacts/<run-id>.json

# Print to stdout (pipe into jq, etc.)
khaos export <run-id>

Useful for archiving run data or transferring artifacts between environments.
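
When exporting in CI, the target directory may not exist yet. The sketch below wraps the export command shown above and creates the directory first; whether --out creates parent directories on its own is not stated here, so the mkdir is a precaution. The function name export_run and the default directory artifacts are illustrative.

```shell
# Export a run to <out_dir>/<run-id>.json, creating the directory first.
# export_run is a hypothetical helper; out_dir defaults to "artifacts".
export_run() {
  run_id="$1"
  out_dir="${2:-artifacts}"
  mkdir -p "$out_dir"
  khaos export "$run_id" --out "$out_dir/$run_id.json"
}

# Usage:
#   export_run run-abc123            # writes artifacts/run-abc123.json
#   export_run run-abc123 ci-output  # writes ci-output/run-abc123.json
```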

Environment Variables

Configure Khaos behavior via environment variables:

Variable | Description
KHAOS_STATE_DIR | Results and cache storage directory
KHAOS_AUTO_SYNC | Set to 1 to auto-sync after runs
KHAOS_CONTENT_MODE | Privacy control: full, summary, or redacted
KHAOS_AGENT_INPUT | Input message passed to agent during evaluation
KHAOS_SECURITY_ATTACK_LIMIT | Cap the number of security attacks per run (cost/time control)
KHAOS_CACHE_DIR | Custom directory for evaluation and registry cache
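
A CI job might set several of these variables together before invoking khaos. The values below are purely illustrative; in particular, the attack limit of 10 and the workspace-relative state directory are assumptions, not recommended defaults.

```shell
# Example CI environment; every value here is an illustrative assumption.
export KHAOS_STATE_DIR="$PWD/.khaos-state"   # keep results inside the workspace
export KHAOS_CONTENT_MODE="redacted"         # one of: full, summary, redacted
export KHAOS_SECURITY_ATTACK_LIMIT="10"      # assumed to accept an integer count
export KHAOS_AUTO_SYNC="1"                   # auto-sync after runs

# Print the Khaos-related settings so they appear in the CI log.
env | grep '^KHAOS_' | sort
```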

Reproducibility

Khaos supports fully reproducible runs via the --seed flag. When specified, the seed is recorded in all artifacts for provenance.

Terminal
# Run with a fixed seed for reproducibility
khaos run <agent-name> --seed 12345

# The same seed produces deterministic fault scheduling
khaos run <agent-name> --seed 12345 --eval full-eval

# In CI: use consistent seeds across environments
khaos ci <agent-name> --seed 42 --eval quickstart

What Gets Recorded

When you specify a seed, it's recorded in:

  • Run manifest (manifest-*.json)
  • Metrics file (metrics-*.json)
  • Evaluation reports for baseline comparisons

CI/CD Best Practice
Use a fixed --seed in CI pipelines so that fault scheduling is deterministic and results are comparable across runs.
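
To confirm which runs used a given seed, one option is to search the manifest files for the seed value. The sketch below assumes the seed appears as a literal number somewhere in manifest-*.json; the key name is not documented on this page, so only the value is matched, and the helper name manifests_with_seed is hypothetical.

```shell
# List manifest files under a runs directory that mention a given seed.
# Assumption: the seed is stored as a literal number in the manifest JSON.
# -w matches whole words, so seed 12345 does not match 123456.
manifests_with_seed() {
  runs_dir="$1"
  seed="$2"
  find "$runs_dir" -type f -name 'manifest-*.json' \
    -exec grep -lw -- "$seed" {} +
}

# Usage:
#   manifests_with_seed "$HOME/.khaos/runs" 12345
```

Matching on the bare value can produce false positives if the same number appears in another field, so treat the output as a starting point rather than proof.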

Artifacts

Runs store their artifacts in ~/.khaos/runs/ by default. Artifacts include trace and metrics data, which persist when using --sync.

Artifact | Description
manifest-*.json | Run provenance (seed, config hash, versions)
trace-*.json | Full execution trace with LLM events (pack runs use khaos.pack_trace.v1 for per-case viewing)
metrics-*.json | Evaluation metrics and resilience report
llm-events-*.jsonl | Raw LLM call events (JSONL format)

Override the storage location with KHAOS_STATE_DIR:

Terminal
export KHAOS_STATE_DIR=/path/to/custom/state
khaos run <agent-name>
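
Since llm-events-*.jsonl is in JSONL format, with one JSON object per line, a quick way to gauge how many LLM calls a run made is to count its lines. The helper below is a small sketch; the name count_llm_events is hypothetical, and the directory layout in the usage comment is an assumption about where the files sit.

```shell
# Count events in a JSONL file, assuming one event per non-empty line.
# grep -c '' counts every line, including a final line without a newline.
count_llm_events() {
  grep -c '' "$1"
}

# Usage (assumes the files sit directly under the runs directory):
#   for f in "$HOME"/.khaos/runs/llm-events-*.jsonl; do
#     echo "$f: $(count_llm_events "$f") events"
#   done
```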