CI/CD Integration
Integrate Khaos into your CI/CD pipeline to catch security vulnerabilities and resilience issues before they reach production. Khaos provides two complementary CI commands:
- `khaos ci` — Run evaluation packs with thresholds, reporting, and cloud sync
- `khaos test` — Run `@khaostest`-decorated Python tests with JUnit/JSON/Markdown output
Both work in any CI environment — no custom actions or templates required.
Quick Start
Three environment variables and one command are all you need for khaos ci:
# Set credentials (required for khaos ci with --sync)
export KHAOS_API_TOKEN=your-project-token
export KHAOS_PROJECT_SLUG=owner/project
export KHAOS_API_URL=https://api.khaos.exordex.com
# Install and run evaluation
python3 -m pip install khaos-agent
khaos ci my-agent --eval quickstart --sync
# Or run @khaostest tests (no credentials required)
khaos test --format junit -o results.xml
API tokens for --sync carry the ingest:write and runs:read scopes.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| KHAOS_API_TOKEN | khaos ci --sync | Project-scoped API token for authentication |
| KHAOS_PROJECT_SLUG | khaos ci --sync | Project identifier (e.g. myteam/my-agent) |
| KHAOS_API_URL | khaos ci --sync | API endpoint (https://api.khaos.exordex.com) |
| KHAOS_STATE_DIR | No | Local artifact storage (default: ~/.khaos). Set to a workspace dir in CI for easy artifact collection. |
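The `KHAOS_STATE_DIR` tip above can be applied with two lines of shell; a minimal sketch, where the current directory stands in for your CI system's checkout path (e.g. `$GITHUB_WORKSPACE` or `$CI_PROJECT_DIR`):

```shell
# Keep Khaos artifacts inside the workspace so the CI system can collect them.
WORKSPACE="$(pwd)"                          # stand-in for your CI's checkout dir
export KHAOS_STATE_DIR="$WORKSPACE/.khaos"
mkdir -p "$KHAOS_STATE_DIR"
echo "$KHAOS_STATE_DIR"
```

Any artifact-upload step can then point at `.khaos` inside the workspace.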
khaos test runs locally against your @khaosagent handlers and does not require any API credentials. Only khaos ci --sync needs the token and project slug.
khaos ci — Evaluation Pipeline
The khaos ci command runs an evaluation pack against your agent, checks thresholds, and generates reports. It supports multiple output formats and integrates with GitHub Actions step summaries.
# Basic CI run
khaos ci my-agent --eval quickstart
# With thresholds and output
khaos ci my-agent \
--eval full-eval \
--security-threshold 85 \
--resilience-threshold 75 \
--format junit --output-file results.xml \
--json-file results.json
# GA mode: simplified exit codes (0=pass, 1=threshold fail, 2=infra error)
khaos ci my-agent --exit-code-mode ga --sync
# Validate credentials before running (dry run)
khaos ci my-agent --preflight-only
# Also run @khaostest tests in the same pipeline
khaos ci my-agent --eval quickstart --test
khaos ci my-agent --eval quickstart --test --test-path tests/integration/
# Baseline comparison and regression detection
khaos ci my-agent --baseline main --fail-on-regression
khaos ci my-agent --save-baseline main
Key Flags
| Flag | Description | Default |
|---|---|---|
| --eval, -e | Evaluation pack to run | quickstart |
| --security-threshold | Minimum security score (0-100) | 80 |
| --resilience-threshold | Minimum resilience score (0-100) | 70 |
| --format, -f | Output format: text, json, junit, markdown, all | text |
| --output-file, -o | Write primary output to file (format inferred from extension) | - |
| --json-file | Write JSON results to file (in addition to primary format) | - |
| --sync / --no-sync | Upload results to dashboard | Auto in GitHub Actions |
| --exit-code-mode | ga (0/1/2) or detailed (multi-code) | ga in GHA |
| --test | Also run @khaostest tests and merge results into output | false |
| --test-path | Paths to search for @khaostest tests (used with --test) | tests/ |
| --baseline, -b | Compare against a named baseline | - |
| --save-baseline | Save this run as a named baseline | - |
| --fail-on-regression | Exit non-zero if regression detected | false |
| --preflight-only | Validate credentials without running evaluation | false |
khaos test — Agent Tests in CI
Run your @khaostest-decorated Python tests with machine-readable output. No cloud credentials required — tests run locally against your agent handlers. See Agent Testing for how to write tests.
# JUnit XML for CI test reporters
khaos test --format junit -o khaos-test-results.xml
# JSON for scripting
khaos test --format json -o khaos-test-results.json
# Both at once
khaos test --format junit -o results.xml --json-file results.json
# All formats (writes .xml, .json, .md)
khaos test --format all -o khaos-tests
khaos test automatically writes a Markdown report to $GITHUB_STEP_SUMMARY and outputs total, passed, failed, and verdict to $GITHUB_OUTPUT.
Exit Codes
Both commands return meaningful exit codes for pipeline control:
GA Mode (--exit-code-mode ga, default in GitHub Actions)
| Code | Meaning | Action |
|---|---|---|
| 0 | All gates passed | Continue pipeline |
| 1 | Threshold or test failure | Fail build |
| 2 | Infrastructure / config error | Investigate setup |
Detailed Mode (--exit-code-mode detailed)
| Code | Meaning |
|---|---|
| 0 | All gates passed |
| 1 | Security threshold not met |
| 2 | Resilience threshold not met |
| 3 | Both security and resilience failed |
| 4 | Baseline tests failed |
| 5 | Regression detected vs baseline |
| 6 | @khaostest tests failed (when using --test) |
| 10 | Configuration error |
| 11 | Runtime error |
khaos test (standalone) uses simple exit codes: 0 = all passed, 1 = any failed.
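A pipeline script can branch on the GA-mode codes above; a minimal sketch, where the `handle_khaos_exit` helper is illustrative (not part of the CLI) and in a real pipeline you would pass it `$?` from a `khaos ci ... --exit-code-mode ga` run:

```shell
# Map GA-mode exit codes (0/1/2) to pipeline actions.
handle_khaos_exit() {
  case "$1" in
    0) echo "continue-pipeline" ;;   # all gates passed
    1) echo "fail-build" ;;          # threshold or test failure
    2) echo "investigate-setup" ;;   # infrastructure / config error
    *) echo "unexpected-code" ;;
  esac
}

handle_khaos_exit 1   # prints "fail-build"
```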
GitHub Actions
A complete workflow running both evaluation and @khaostest tests:
name: Khaos Evaluation
on:
push:
branches: [main]
pull_request:
jobs:
# Job 1: Run evaluation packs
evaluate:
runs-on: ubuntu-latest
env:
KHAOS_API_TOKEN: ${{ secrets.KHAOS_API_TOKEN }}
KHAOS_PROJECT_SLUG: ${{ secrets.KHAOS_PROJECT_SLUG }}
KHAOS_API_URL: ${{ secrets.KHAOS_API_URL }}
KHAOS_STATE_DIR: ${{ github.workspace }}/.khaos
KHAOS_PACK: ${{ github.event_name == 'pull_request' && 'quickstart' || 'full-eval' }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Run Khaos CI
id: khaos
run: |
pip install "khaos>=1.0.0,<2"
khaos ci path/to/agent.py \
--eval "$KHAOS_PACK" \
--sync \
--exit-code-mode ga \
--format junit --output-file khaos-results.xml \
--json-file khaos-results.json
- uses: actions/upload-artifact@v4
if: always()
with:
name: khaos-results
path: |
khaos-results.json
khaos-results.xml
# Job 2: Run @khaostest tests (no credentials needed)
khaos-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- run: pip install "khaos>=1.0.0,<2"
- name: Run @khaostest tests
run: |
khaos test \
--format junit \
--output-file khaos-test-results.xml \
--json-file khaos-test-results.json
- uses: mikepenz/action-junit-report@v4
if: always()
with:
report_paths: 'khaos-test-results.xml'
check_name: 'Khaos @khaostest Results'
Add KHAOS_API_TOKEN and KHAOS_PROJECT_SLUG to your repository secrets under Settings > Secrets and variables > Actions.
You can also wrap this in a reusable composite action, e.g. .github/actions/khaos-test/action.yml with inputs for agent path, pack, thresholds, baseline comparison, and run-khaostests to include @khaostest results.
GitLab CI
Add these jobs to your .gitlab-ci.yml:
stages: [khaos]
# Evaluation job
khaos:ci:
stage: khaos
image: python:3.11
variables:
KHAOS_API_TOKEN: $KHAOS_API_TOKEN
KHAOS_PROJECT_SLUG: $CI_PROJECT_PATH
KHAOS_API_URL: https://api.khaos.exordex.com
KHAOS_CI: "1"
KHAOS_PACK: "quickstart"
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
variables: { KHAOS_PACK: "quickstart" }
- if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
variables: { KHAOS_PACK: "full-eval" }
script:
- pip install "khaos>=1.0.0,<2"
- khaos ci path/to/agent.py --eval "$KHAOS_PACK" --sync --exit-code-mode ga
--format junit --output-file khaos-results.xml
artifacts:
when: always
expire_in: 30 days
paths: [khaos-results.xml]
# @khaostest job
khaos:test:
stage: khaos
image: python:3.11
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
- if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
script:
- pip install "khaos>=1.0.0,<2"
- khaos test --format junit --output-file khaos-test-results.xml
artifacts:
when: always
reports:
junit: khaos-test-results.xml
Add KHAOS_API_TOKEN as a CI/CD variable under Settings > CI/CD > Variables (masked and protected).
CircleCI
Add these jobs to your .circleci/config.yml:
version: 2.1
jobs:
khaos-ci:
docker:
- image: cimg/python:3.11
environment:
KHAOS_CI: "1"
steps:
- checkout
- run:
name: Run Khaos evaluation
command: |
pip install "khaos>=1.0.0,<2"
khaos ci path/to/agent.py \
--eval quickstart --sync --exit-code-mode ga \
--format junit --output-file khaos-results.xml \
--json-file khaos-results.json
- store_artifacts:
path: khaos-results.xml
- store_artifacts:
path: khaos-results.json
khaos-test:
docker:
- image: cimg/python:3.11
steps:
- checkout
- run:
name: Run @khaostest tests
command: |
pip install "khaos>=1.0.0,<2"
khaos test --format junit --output-file khaos-test-results.xml
- store_test_results:
path: khaos-test-results.xml
workflows:
khaos:
jobs:
- khaos-ci
- khaos-test
Other CI Systems
For any CI system (Jenkins, Buildkite, Azure Pipelines, etc.), install the CLI and use the appropriate output format:
#!/bin/bash
pip install "khaos>=1.0.0,<2"
# Run evaluation with JUnit output
khaos ci path/to/agent.py \
--eval quickstart \
--format junit --output-file results.xml \
--json-file results.json
# Run @khaostest tests separately
khaos test --format junit --output-file test-results.xml
# Exit code indicates pass/fail
exit $?
Preflight Validation
Use --preflight-only to validate credentials and connectivity before running a full evaluation. This is useful as a separate CI step to fail fast on configuration issues.
# Validate setup without running an evaluation
khaos ci my-agent --preflight-only --sync
# In CI: add as a separate step before the real run
# Step 1: Preflight
khaos ci my-agent --preflight-only --sync
# Step 2: Real evaluation
khaos ci my-agent --eval full-eval --sync
Choosing an Evaluation
Select the right evaluation for your pipeline stage:
| Evaluation | Use Case | Duration |
|---|---|---|
| quickstart | Fast smoke test for every PR | ~2 min |
| security-standard | Security-focused evaluation | ~5 min |
| full-eval | Comprehensive evaluation before release | ~10 min |
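The PR-fast / release-thorough split from the table can be encoded in a small branch check; a sketch, where the hard-coded `BRANCH` value stands in for your CI system's branch variable (e.g. `$GITHUB_REF_NAME` or `$CI_COMMIT_BRANCH`):

```shell
# Pick the evaluation pack based on the branch being built.
BRANCH="feature/example"    # stand-in; read from your CI's branch variable
if [ "$BRANCH" = "main" ]; then
  PACK="full-eval"          # comprehensive run before release
else
  PACK="quickstart"         # fast smoke test for PRs
fi
echo "$PACK"                # prints "quickstart"
```

The pack is then passed along as `khaos ci my-agent --eval "$PACK"`.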
Recommended: quickstart + khaos test on every PR for fast feedback, and full-eval on merges to main for comprehensive coverage.
Viewing Results
CI runs with --sync are automatically synced to the dashboard. After a run completes:
- Open your project in the Dashboard
- Navigate to Evaluations to see the run
- Click Compare to generate a 4-lens impact report against any previous run
- Share the comparison URL in your PR for team review
Deployment Gating
Use the gate API to block deployments when scores fall below your threshold. The endpoint returns HTTP 200 when all scores pass, or HTTP 422 with details when any score fails the gate.
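The 200/422 contract can be exercised from plain shell before wiring it into a workflow; a sketch with a canned status code standing in for a live `curl -w '%{http_code}'` call (the `check_gate` helper is illustrative, not part of the API):

```shell
# In a real pipeline the status would come from the gate endpoint, e.g.:
#   STATUS=$(curl -s -o gate.json -w '%{http_code}' \
#     "${DASHBOARD_URL}/api/runs/${RUN_ID}/gate?threshold=70" \
#     -H "x-webhook-secret: ${KHAOS_API_TOKEN}")
STATUS="422"    # canned value for illustration

check_gate() {
  # 200 = all scores passed the gate; 422 = at least one score failed
  if [ "$1" = "200" ]; then
    echo "deploy-allowed"
  else
    echo "deploy-blocked"
  fi
}

check_gate "$STATUS"   # prints "deploy-blocked"
```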
# Add to your GitHub Actions workflow
- name: Check Khaos Gate
env:
DASHBOARD_URL: ${{ vars.KHAOS_DASHBOARD_URL }}
RUN_ID: ${{ steps.khaos.outputs.run_id }}
KHAOS_API_TOKEN: ${{ secrets.KHAOS_API_TOKEN }}
run: |
GATE=$(curl -sf "${DASHBOARD_URL}/api/runs/${RUN_ID}/gate?threshold=70" \
-H "x-webhook-secret: ${KHAOS_API_TOKEN}")
echo "$GATE" | jq -e '.passed'
Use ?threshold=80 for stricter gating. The default threshold is 70 (the warning level). Scores checked: overall_score, security_score, and resilience_score.
PR Comments
Post a Khaos summary directly to your pull request. The summary endpoint returns a markdown report with scores, comparisons, and links to the full dashboard.
# Add to your GitHub Actions workflow
- name: Post Khaos Report
if: github.event_name == 'pull_request'
env:
DASHBOARD_URL: ${{ vars.KHAOS_DASHBOARD_URL }}
RUN_ID: ${{ steps.khaos.outputs.run_id }}
KHAOS_API_TOKEN: ${{ secrets.KHAOS_API_TOKEN }}
run: |
SUMMARY=$(curl -sf "${DASHBOARD_URL}/api/runs/${RUN_ID}/summary" \
-H "x-webhook-secret: ${KHAOS_API_TOKEN}")
gh pr comment ${{ github.event.pull_request.number }} --body "$SUMMARY"
Append ?format=json to get the summary wrapped in a JSON object ({ markdown: "..." }) for programmatic use.
Best Practices
- Run on every PR — Catch issues before merge
- Use quickstart + khaos test for PRs — Fast feedback (~2 min)
- Use full-eval for main — Comprehensive before release
- Use the --test flag — Include @khaostest results alongside evaluations
- Use JUnit output — Native integration with GitHub, GitLab, CircleCI, and Jenkins test reporters
- Save baselines on main — Use --save-baseline main after successful merges, --baseline main --fail-on-regression on PRs
- Compare runs in the dashboard — Generate impact reports to understand what changed
- Share comparison URLs in PRs — Give reviewers a direct link to the diff
- Validate setup with --preflight-only — Fail fast on credential issues
- Rotate tokens periodically — Generate new tokens and revoke old ones
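The baseline practice above reduces to a branch switch; a sketch that echoes the command it would run (BRANCH and the agent name are illustrative stand-ins):

```shell
# On main: record the run as the new baseline. On PRs: compare against it.
BRANCH="pr-branch"    # stand-in; read from your CI's branch variable
if [ "$BRANCH" = "main" ]; then
  CMD="khaos ci my-agent --eval full-eval --sync --save-baseline main"
else
  CMD="khaos ci my-agent --eval quickstart --baseline main --fail-on-regression"
fi
echo "$CMD"
```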