CLI Reference

The Khaos CLI provides commands for discovering, testing, comparing, and monitoring your AI agents. All commands use bare khaos syntax after installation.

Core Commands

Command           Description
khaos discover    Register @khaosagent handlers for running by name
khaos run         Run agent with full evaluation (security + resilience)
khaos playground  Interactive agent testing with live fault injection
khaos ci          CI/CD pipeline mode with thresholds and reporting
khaos test        Run @khaostest agent tests with JUnit/JSON/Markdown output
khaos compare     Compare two runs to detect regressions
khaos gate        Check if run meets quality thresholds
khaos sync        Sync results to cloud dashboard
khaos evals list  List available evaluations
khaos attacks     Inspect attack categories, stats, and individual attack details
khaos export      Export run artifacts as JSON bundle
khaos doctor      Diagnose installation, credentials, and agent discovery issues

khaos discover

Register agents decorated with @khaosagent so they can be run by name. This is typically run once after creating or modifying agent files.

Terminal
# Discover agents in current directory
khaos discover

# Discover in a specific path
khaos discover ./agents/

# List discovered agents
khaos discover --list

khaos run

The primary command. Runs your @khaosagent handler with automatic LLM telemetry capture, security attack testing, and resilience evaluation.

Terminal
# First, discover your agents (run once after changes)
khaos discover

# Then run by agent name
khaos run <agent-name>

# Run with a specific evaluation
khaos run <agent-name> --eval quickstart
khaos run <agent-name> --eval full-eval
khaos run <agent-name> --eval security
khaos run <agent-name> --eval baseline

# Run a single test from an evaluation
khaos run <agent-name> --eval full-eval --test math-reasoning

# Disable security tests
khaos run <agent-name> --no-security

# Reproducible run with fixed seed
khaos run <agent-name> --seed 12345

# Custom inputs (raw prompt OR YAML/JSON file)
khaos run <agent-name> --input "Hello, how are you?"
khaos run <agent-name> --inputs tests.yaml

# Custom inputs + eval (override eval's built-in prompts)
khaos run <agent-name> --eval quickstart --inputs prod-prompts.yaml

Key Flags

Flag                      Description
--eval, -e                Evaluation to run: quickstart, full-eval, security, baseline
--test, -t                Run a specific test from the evaluation by its ID
--security/--no-security  Enable/disable security attack tests (default: ON)
--attacks                 Custom attack payloads YAML file
--input/--inputs, -i      Custom inputs: raw prompt string or path to YAML/JSON (list or {inputs: [...]}). With --eval, overrides the eval's built-in inputs.
--json, -j                Output results as JSON
--verbose, -v             Show full debug output
--timeout                 Maximum execution time in seconds (default: 120)
--env                     Environment variables (KEY=VALUE)
--seed                    Random seed for reproducibility (recorded in artifacts)

Security is ON by default
Khaos runs security attack tests automatically. Use --no-security to disable.

@khaosagent required
If you run a script directly, make sure it defines at least one @khaosagent handler. See @khaosagent.
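
The --inputs flag accepts a YAML or JSON file holding either a list or an {inputs: [...]} mapping. The sketch below writes such a file in JSON. The helper name write_inputs_file, the file name prod-prompts.json, and the assumption that each entry is a plain prompt string are illustrative and not confirmed by this reference.

```shell
# Hypothetical helper: write a JSON inputs file in the {inputs: [...]} shape.
# Assumption: each entry is a plain prompt string; verify the item schema
# against your Khaos version before relying on it.
write_inputs_file() {
  out="$1"
  cat > "$out" <<'EOF'
{
  "inputs": [
    "Hello, how are you?",
    "Summarize the refund policy in two sentences."
  ]
}
EOF
}

write_inputs_file prod-prompts.json

# Then pass the file to a run (requires khaos and a discovered agent):
#   khaos run <agent-name> --eval quickstart --inputs prod-prompts.json
```

Keeping the prompts in a file rather than on the command line makes it easier to reuse the same inputs across evaluations.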

khaos attacks

Inspect the security attack catalog and run targeted audits by tier, category, and severity.

Terminal
# List all attacks
khaos attacks list

# High-value AGENT tier attacks
khaos attacks list --tier agent

# Category + severity filter
khaos attacks list --category file_content_injection --severity high

# Show one attack in detail
khaos attacks show fci-readme-poison

# Catalog summaries
khaos attacks categories
khaos attacks stats

khaos ci

Run evaluations in CI/CD pipelines with thresholds, structured output, and cloud sync. Supports JUnit XML, JSON, and Markdown output formats.

Terminal
# Basic evaluation
khaos ci my-agent --eval quickstart

# With thresholds and output
khaos ci my-agent --eval full-eval \
  --security-threshold 85 --resilience-threshold 75 \
  --format junit -o results.xml --json-file results.json

# GA mode exit codes (0/1/2)
khaos ci my-agent --exit-code-mode ga --sync

# Also run @khaostest tests
khaos ci my-agent --eval quickstart --test

# Baseline comparison
khaos ci my-agent --baseline main --fail-on-regression

Key Flags

Flag                    Description
--eval, -e              Evaluation pack (quickstart, full-eval, security, baseline)
--security-threshold    Minimum security score to pass (default: 80)
--resilience-threshold  Minimum resilience score to pass (default: 70)
--format, -f            Output: text, json, junit, markdown, or all
--output-file, -o       Write primary output to file
--json-file             Write JSON sidecar file
--sync / --no-sync      Upload results to dashboard
--exit-code-mode        ga (0/1/2) or detailed (multi-code)
--test                  Also run @khaostest tests in the same pipeline
--test-path             Search paths for @khaostest tests (with --test)
--baseline, -b          Compare against named baseline
--fail-on-regression    Fail if regression detected vs baseline
--preflight-only        Validate credentials without running evaluation

See CI/CD Integration for full examples with GitHub Actions, GitLab CI, and CircleCI.

khaos test

Run @khaostest-decorated Python tests with auto-discovery, fault injection, and machine-readable output formats.

Terminal
# Auto-discover and run all tests
khaos test

# Run specific file
khaos test tests/test_security.py

# Filter by name or tag
khaos test -k "resilience"

# Verbose with tracebacks
khaos test -v

# CI output formats
khaos test --format junit -o results.xml
khaos test --format json -o results.json
khaos test --format markdown -o results.md
khaos test --format all -o khaos-tests

Key Flags

Flag               Description
--verbose, -v      Show tracebacks for failures
--filter, -k       Filter by name or tag pattern
--fail-fast, -x    Stop on first failure
--timeout          Per-test timeout in ms (default: 30000)
--format, -f       Output: text, json, junit, markdown, or all
--output-file, -o  Write output to file (format inferred from extension)
--json-file        Write JSON sidecar file

See Agent Testing for how to write @khaostest tests.

khaos compare

Compare metrics, costs, and behavior between two runs to identify regressions or improvements.

Terminal
# Show recent runs (easy to copy names/IDs)
khaos compare --recent

# Compare two runs by name
khaos compare my-baseline my-candidate

# Compare two runs by ID
khaos compare run-abc123 run-def456

# Compare against stored baseline
khaos compare my-candidate --baseline

# Show cost comparison with projections
khaos compare baseline candidate --cost

# JSON output for automation
khaos compare baseline candidate --json

Options

Flag            Description
--recent, -r    Show recent runs in a table for easy selection
--baseline, -b  Compare against the stored baseline run
--cost, -c      Show cost comparison with monthly/annual projections
--limit, -l     Number of runs to show when listing (default: 10)
--json, -j      Output results as JSON

Name your runs
Use khaos run <agent-name> --name my-test to name runs for easier comparison.

khaos gate

Check if the last run (or a specific run) meets quality thresholds. Designed for CI/CD pipelines.

Terminal
# Check last run with default thresholds (security: 80, resilience: 70)
khaos gate

# Custom thresholds
khaos gate --security 90 --resilience 85

# Check specific run
khaos gate --run run-abc123

# Use in CI/CD scripts
khaos gate --security 80 || exit 1

Exit codes for automation
Exit code 1 = security failed, 2 = resilience failed, 3 = both failed. Use in CI scripts.
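
A CI step can turn these exit codes into a readable message before failing the job. The sketch below assumes that exit code 0 means the gate passed, which is conventional but not stated here. The function name describe_gate_exit and the message wording are illustrative.

```shell
# Map a `khaos gate` exit code to a short message.
# Codes 1, 2, and 3 follow the documented meanings; treating 0 as "passed"
# is an assumption based on the usual shell convention.
describe_gate_exit() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "security failed" ;;
    2) echo "resilience failed" ;;
    3) echo "security and resilience failed" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage in a CI step (requires khaos):
#   khaos gate --security 90 --resilience 85
#   code=$?
#   echo "Gate result: $(describe_gate_exit "$code")"
#   exit "$code"
```

Capturing the code in a variable right after the gate call matters, because any later command overwrites $?.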

khaos sync

Sync run results to the Khaos cloud dashboard for historical tracking and team visibility.

Terminal
# Login to cloud
khaos login

# Sync all pending runs
khaos sync

# Check sync status
khaos sync --status

# Sync a specific run
khaos sync --run run-abc123

# Sync and cleanup local files
khaos sync --cleanup

# Logout
khaos logout

See Cloud Sync for full authentication and sync details.

khaos evals list

List all available evaluations with descriptions:

Terminal
khaos evals list

Returns a table of evaluations with their descriptions and estimated run times. See Evaluations for details.

khaos export

Export a run's local artifacts (trace, metrics, stderr) as a single JSON bundle:

Terminal
# Export to a file (recommended for CI artifacts)
khaos export <run-id> --out artifacts/<run-id>.json

# Print to stdout (pipe into jq, etc.)
khaos export <run-id>

Useful for archiving run data or transferring artifacts between environments.
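
When archiving bundles from CI, it can help to wrap the export in a small function that creates the target directory first. This is a minimal sketch: bundle_path and export_run are hypothetical helper names, and the artifacts/ directory simply mirrors the example above.

```shell
# Build the bundle path for a run ID (hypothetical helper).
bundle_path() {
  printf 'artifacts/%s.json\n' "$1"
}

# Export one run to its bundle path, creating the directory if needed.
# Uses only the documented form: khaos export <run-id> --out <file>.
export_run() {
  run_id="$1"
  out="$(bundle_path "$run_id")"
  mkdir -p "$(dirname "$out")"
  khaos export "$run_id" --out "$out"
}

# Usage (requires khaos and an existing run ID):
#   export_run run-abc123
```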

Environment Variables

Configure Khaos behavior via environment variables:

Variable                     Description
KHAOS_STATE_DIR              Results and cache storage directory
KHAOS_AUTO_SYNC              Set to 1 to auto-sync after runs
KHAOS_CONTENT_MODE           Privacy control: full, summary, or redacted
KHAOS_AGENT_INPUT            Input message passed to agent during evaluation
KHAOS_SECURITY_ATTACK_LIMIT  Cap the number of security attacks per run (cost/time control)
KHAOS_CACHE_DIR              Custom directory for evaluation and registry cache
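
As an illustration, a CI job that limits cost and restricts uploaded content might set the variables as follows. The specific values are assumptions chosen for the example: the state path and the attack limit of 25 are not defaults from this reference.

```shell
# Example environment for a cost-conscious CI job (values are illustrative).
export KHAOS_STATE_DIR="$PWD/.khaos-state"   # keep results inside the workspace
export KHAOS_AUTO_SYNC=1                     # auto-sync after runs
export KHAOS_CONTENT_MODE=redacted           # one of: full, summary, redacted
export KHAOS_SECURITY_ATTACK_LIMIT=25        # assumed cap; pick what fits your budget

# Print the Khaos-related variables currently set, sorted by name.
env | grep '^KHAOS_' | sort
```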

Reproducibility

Khaos supports fully reproducible runs via the --seed flag. When specified, the seed is recorded in all artifacts for provenance.

Terminal
# Run with a fixed seed for reproducibility
khaos run <agent-name> --seed 12345

# The same seed produces deterministic fault scheduling
khaos run <agent-name> --seed 12345 --eval full-eval

# In CI: pin the eval and keep env/config stable
khaos ci my-agent --eval quickstart

What Gets Recorded

When you specify a seed, it's recorded in:

  • Run manifest (manifest-*.json)
  • Metrics file (metrics-*.json)
  • Evaluation reports for baseline comparisons

CI/CD Best Practice
Always use --seed in CI pipelines to ensure deterministic, reproducible results across runs.

Artifacts

Runs can produce artifacts, which are stored in ~/.khaos/runs/ by default. Artifacts include trace and metrics data, which persist when using --sync.

Artifact            Description
manifest-*.json     Run provenance (seed, config hash, versions)
trace-*.json        Full execution trace with LLM events (pack runs use khaos.pack_trace.v1 for per-case viewing)
metrics-*.json      Evaluation metrics and resilience report
llm-events-*.jsonl  Raw LLM call events (JSONL format)

Override the storage location with KHAOS_STATE_DIR:

Terminal
export KHAOS_STATE_DIR=/path/to/custom/state
khaos run <agent-name>
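
Scripts that post-process artifacts need to find the runs directory whether or not the override is set. The sketch below rests on two assumptions that this reference does not confirm: that a custom KHAOS_STATE_DIR keeps the same runs/ subdirectory as the default ~/.khaos/runs/ layout, and that manifest files sit directly in that directory. Both function names are hypothetical.

```shell
# Resolve the directory that holds run artifacts.
# Assumption: KHAOS_STATE_DIR replaces ~/.khaos and keeps the runs/ subdirectory.
khaos_runs_dir() {
  printf '%s/runs\n' "${KHAOS_STATE_DIR:-$HOME/.khaos}"
}

# List manifest files, newest first. Prints nothing if none exist yet.
# Assumption: manifests are stored directly in the runs directory.
list_manifests() {
  ls -t "$(khaos_runs_dir)"/manifest-*.json 2>/dev/null || true
}

# Usage:
#   list_manifests | head -n 1   # most recent manifest, if any
```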