CLI Reference

The Khaos CLI provides commands for discovering, testing, comparing, and monitoring your AI agents. All commands use bare khaos syntax after installation.

Core Commands

Command	Description
`khaos discover`	Register @khaosagent handlers for running by name
`khaos run`	Run agent with full evaluation (security + resilience)
`khaos playground`	Interactive agent testing with live fault injection
`khaos ci`	CI/CD pipeline mode with thresholds and reporting
`khaos test`	Run @khaostest agent tests with JUnit/JSON/Markdown output
`khaos compare`	Compare two runs to detect regressions
`khaos gate`	Check if run meets quality thresholds
`khaos sync`	Sync results to cloud dashboard
`khaos evals list`	List available evaluations
`khaos attacks`	Inspect attack categories, stats, and individual attack details
`khaos export`	Export run artifacts as JSON bundle
`khaos doctor`	Diagnose installation, credentials, and agent discovery issues

khaos discover

Register agents decorated with @khaosagent so they can be run by name. This is typically run once after creating or modifying agent files.

Terminal

# Discover agents in current directory
khaos discover

# Discover in a specific path
khaos discover ./agents/

# List discovered agents
khaos discover --list

khaos run

The primary command. Runs your @khaosagent handler with automatic LLM telemetry capture, security attack testing, and resilience evaluation.

Terminal

# First, discover your agents (run once after changes)
khaos discover

# Then run by agent name
khaos run <agent-name>

# Run with a specific evaluation
khaos run <agent-name> --eval quickstart
khaos run <agent-name> --eval full-eval
khaos run <agent-name> --eval security
khaos run <agent-name> --eval baseline

# Run a single test from an evaluation
khaos run <agent-name> --eval full-eval --test math-reasoning

# Disable security tests
khaos run <agent-name> --no-security

# Reproducible run with fixed seed
khaos run <agent-name> --seed 12345

# Custom inputs (raw prompt OR YAML/JSON file)
khaos run <agent-name> --input "Hello, how are you?"
khaos run <agent-name> --inputs tests.yaml

# Custom inputs + eval (override eval's built-in prompts)
khaos run <agent-name> --eval quickstart --inputs prod-prompts.yaml

Key Flags

Flag	Description
`--eval, -e`	Evaluation to run: quickstart, full-eval, security, baseline
`--test, -t`	Run a specific test from the evaluation by its ID
`--security/--no-security`	Enable/disable security attack tests (default: ON)
`--attacks`	Custom attack payloads YAML file
`--input/--inputs, -i`	Custom inputs: raw prompt string or path to YAML/JSON (list or {inputs: [...] }). With `--eval`, overrides the eval's built-in inputs.
`--json, -j`	Output results as JSON
`--verbose, -v`	Show full debug output
`--timeout`	Maximum execution time in seconds (default: 120)
`--env`	Environment variables (KEY=VALUE)
`--seed`	Random seed for reproducibility (records in artifacts)

Security is ON by default

Khaos runs security attack tests automatically. Use --no-security to disable.

@khaosagent required

If you run a script directly, make sure it defines at least one @khaosagent handler. See @khaosagent.

khaos attacks

Inspect the security attack catalog and run targeted audits by tier, category, and severity.

Terminal

# List all attacks
khaos attacks list

# High-value AGENT tier attacks
khaos attacks list --tier agent

# Category + severity filter
khaos attacks list --category file_content_injection --severity high

# Show one attack in detail
khaos attacks show fci-readme-poison

# Catalog summaries
khaos attacks categories
khaos attacks stats

khaos ci

Run evaluations in CI/CD pipelines with thresholds, structured output, and cloud sync. Supports JUnit XML, JSON, and Markdown output formats.

Terminal

# Basic evaluation
khaos ci my-agent --eval quickstart

# With thresholds and output
khaos ci my-agent --eval full-eval \
  --security-threshold 85 --resilience-threshold 75 \
  --format junit -o results.xml --json-file results.json

# GA mode exit codes (0/1/2)
khaos ci my-agent --exit-code-mode ga --sync

# Also run @khaostest tests
khaos ci my-agent --eval quickstart --test

# Baseline comparison
khaos ci my-agent --baseline main --fail-on-regression

Key Flags

Flag	Description
`--eval, -e`	Evaluation pack (quickstart, full-eval, security, baseline)
`--security-threshold`	Minimum security score to pass (default: 80)
`--resilience-threshold`	Minimum resilience score to pass (default: 70)
`--format, -f`	Output: text, json, junit, markdown, or all
`--output-file, -o`	Write primary output to file
`--json-file`	Write JSON sidecar file
`--sync / --no-sync`	Upload results to dashboard
`--exit-code-mode`	`ga` (0/1/2) or `detailed` (multi-code)
`--test`	Also run @khaostest tests in the same pipeline
`--test-path`	Search paths for @khaostest tests (with --test)
`--baseline, -b`	Compare against named baseline
`--fail-on-regression`	Fail if regression detected vs baseline
`--preflight-only`	Validate credentials without running evaluation

See CI/CD Integration for full examples with GitHub Actions, GitLab CI, and CircleCI.

khaos test

Run @khaostest-decorated Python tests with auto-discovery, fault injection, and machine-readable output formats.

Terminal

# Auto-discover and run all tests
khaos test

# Run specific file
khaos test tests/test_security.py

# Filter by name or tag
khaos test -k "resilience"

# Verbose with tracebacks
khaos test -v

# CI output formats
khaos test --format junit -o results.xml
khaos test --format json -o results.json
khaos test --format markdown -o results.md
khaos test --format all -o khaos-tests

Key Flags

Flag	Description
`--verbose, -v`	Show tracebacks for failures
`--filter, -k`	Filter by name or tag pattern
`--fail-fast, -x`	Stop on first failure
`--timeout`	Per-test timeout in ms (default: 30000)
`--format, -f`	Output: text, json, junit, markdown, or all
`--output-file, -o`	Write output to file (format inferred from extension)
`--json-file`	Write JSON sidecar file

See Agent Testing for how to write @khaostest tests.

khaos compare

Compare metrics, costs, and behavior between two runs to identify regressions or improvements.

Terminal

# Show recent runs (easy to copy names/IDs)
khaos compare --recent

# Compare two runs by name
khaos compare my-baseline my-candidate

# Compare two runs by ID
khaos compare run-abc123 run-def456

# Compare against stored baseline
khaos compare my-candidate --baseline

# Show cost comparison with projections
khaos compare baseline candidate --cost

# JSON output for automation
khaos compare baseline candidate --json

Options

Flag	Description
`--recent, -r`	Show recent runs in a table for easy selection
`--baseline, -b`	Compare against the stored baseline run
`--cost, -c`	Show cost comparison with monthly/annual projections
`--limit, -l`	Number of runs to show when listing (default: 10)
`--json, -j`	Output results as JSON

Name your runs

Use khaos run <agent-name> --name my-test to name runs for easier comparison.

khaos gate

Check if the last run (or a specific run) meets quality thresholds. Designed for CI/CD pipelines.

Terminal

# Check last run with default thresholds (security: 80, resilience: 70)
khaos gate

# Custom thresholds
khaos gate --security 90 --resilience 85

# Check specific run
khaos gate --run run-abc123

# Use in CI/CD scripts
khaos gate --security 80 || exit 1

Exit codes for automation

Exit code 1 = security failed, 2 = resilience failed, 3 = both failed. Use in CI scripts.

khaos sync

Sync run results to the Khaos cloud dashboard for historical tracking and team visibility.

Terminal

# Login to cloud
khaos login

# Sync all pending runs
khaos sync

# Check sync status
khaos sync --status

# Sync a specific run
khaos sync --run run-abc123

# Sync and cleanup local files
khaos sync --cleanup

# Logout
khaos logout

See Cloud Sync for full authentication and sync details.

khaos evals list

List all available evaluations with descriptions:

Terminal

khaos evals list

Returns a table of evaluations with their descriptions and estimated run times. See Evaluations for details.

khaos export

Export a run's local artifacts (trace, metrics, stderr) as a single JSON bundle:

Terminal

# Export to a file (recommended for CI artifacts)
khaos export <run-id> --out artifacts/<run-id>.json

# Print to stdout (pipe into jq, etc.)
khaos export <run-id>

Useful for archiving run data or transferring artifacts between environments.

Environment Variables

Configure Khaos behavior via environment variables:

Variable	Description
`KHAOS_STATE_DIR`	Results and cache storage directory
`KHAOS_AUTO_SYNC`	Set to 1 to auto-sync after runs
`KHAOS_CONTENT_MODE`	Privacy control: full, summary, or redacted
`KHAOS_AGENT_INPUT`	Input message passed to agent during evaluation
`KHAOS_SECURITY_ATTACK_LIMIT`	Cap the number of security attacks per run (cost/time control)
`KHAOS_CACHE_DIR`	Custom directory for evaluation and registry cache

Reproducibility

Khaos supports fully reproducible runs via the --seed flag. When specified, the seed is recorded in all artifacts for provenance.

Terminal

# Run with a fixed seed for reproducibility
khaos run <agent-name> --seed 12345

# The same seed produces deterministic fault scheduling
khaos run <agent-name> --seed 12345 --eval full-eval

# In CI: pin the eval and keep env/config stable
khaos ci my-agent --eval quickstart

What Gets Recorded

When you specify a seed, it's recorded in:

Run manifest (manifest-*.json)
Metrics file (metrics-*.json)
Evaluation reports for baseline comparisons

CI/CD Best Practice

Always use --seed in CI pipelines to ensure deterministic, reproducible results across runs.

Artifacts

Runs can produce artifacts stored in ~/.khaos/runs/. Artifacts include trace and metrics data which persist when using --sync.

Artifact	Description
`manifest-*.json`	Run provenance (seed, config hash, versions)
`trace-*.json`	Full execution trace with LLM events (pack runs use `khaos.pack_trace.v1` for per-case viewing)
`metrics-*.json`	Evaluation metrics and resilience report
`llm-events-*.jsonl`	Raw LLM call events (JSONL format)

Override the storage location with KHAOS_STATE_DIR:

Terminal

export KHAOS_STATE_DIR=/path/to/custom/state
khaos run <agent-name>

Configuration

Evaluations