CI/CD Integration

Integrate Khaos into your CI/CD pipeline to catch security vulnerabilities and resilience issues before they reach production. Khaos provides two complementary CI commands:

  • khaos ci — Run evaluation packs with thresholds, reporting, and cloud sync
  • khaos test — Run @khaostest-decorated Python tests with JUnit/JSON/Markdown output

Both work in any CI environment — no custom actions or templates required.

Quick Start

Three environment variables and one command are all you need for khaos ci:

Terminal
# Set credentials (required for khaos ci with --sync)
export KHAOS_API_TOKEN=your-project-token
export KHAOS_PROJECT_SLUG=owner/project
export KHAOS_API_URL=https://api.khaos.exordex.com

# Install and run evaluation
python3 -m pip install khaos-agent
khaos ci my-agent --eval quickstart --sync

# Or run @khaostest tests (no credentials required)
khaos test --format junit -o results.xml

Project Tokens
Generate a project-scoped API token from your project's Tokens page in the Dashboard. Tokens are scoped with granular permissions (e.g. ingest:write, runs:read).

Environment Variables

| Variable | Required | Description |
|---|---|---|
| KHAOS_API_TOKEN | khaos ci --sync | Project-scoped API token for authentication |
| KHAOS_PROJECT_SLUG | khaos ci --sync | Project identifier (e.g. myteam/my-agent) |
| KHAOS_API_URL | khaos ci --sync | API endpoint (https://api.khaos.exordex.com) |
| KHAOS_STATE_DIR | No | Local artifact storage (default: ~/.khaos). Set to a workspace dir in CI for easy artifact collection. |
khaos test needs no credentials
khaos test runs locally against your @khaosagent handlers and does not require any API credentials. Only khaos ci --sync needs the token and project slug.

khaos ci — Evaluation Pipeline

The khaos ci command runs an evaluation pack against your agent, checks thresholds, and generates reports. It supports multiple output formats and integrates with GitHub Actions step summaries.

Terminal
# Basic CI run
khaos ci my-agent --eval quickstart

# With thresholds and output
khaos ci my-agent \
  --eval full-eval \
  --security-threshold 85 \
  --resilience-threshold 75 \
  --format junit --output-file results.xml \
  --json-file results.json

# GA mode: simplified exit codes (0=pass, 1=threshold fail, 2=infra error)
khaos ci my-agent --exit-code-mode ga --sync

# Validate credentials before running (dry run)
khaos ci my-agent --preflight-only

# Also run @khaostest tests in the same pipeline
khaos ci my-agent --eval quickstart --test
khaos ci my-agent --eval quickstart --test --test-path tests/integration/

# Baseline comparison and regression detection
khaos ci my-agent --baseline main --fail-on-regression
khaos ci my-agent --save-baseline main

Key Flags

| Flag | Description | Default |
|---|---|---|
| --eval, -e | Evaluation pack to run | quickstart |
| --security-threshold | Minimum security score (0-100) | 80 |
| --resilience-threshold | Minimum resilience score (0-100) | 70 |
| --format, -f | Output format: text, json, junit, markdown, all | text |
| --output-file, -o | Write primary output to file (format inferred from extension) | - |
| --json-file | Write JSON results to file (in addition to the primary format) | - |
| --sync / --no-sync | Upload results to dashboard | Auto in GitHub Actions |
| --exit-code-mode | ga (0/1/2) or detailed (multi-code) | ga in GHA |
| --test | Also run @khaostest tests and merge results into output | false |
| --test-path | Paths to search for @khaostest tests (used with --test) | tests/ |
| --baseline, -b | Compare against a named baseline | - |
| --save-baseline | Save this run as a named baseline | - |
| --fail-on-regression | Exit non-zero if a regression is detected | false |
| --preflight-only | Validate credentials without running an evaluation | false |
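The two score thresholds reduce to a simple pass/fail gate. As a rough sketch of that logic in Python (illustrative only — the `gate` function and its input field names are assumptions for this example, not the CLI's internals):

```python
def gate(scores, security_threshold=80, resilience_threshold=70):
    """Return a GA-style exit code: 0 = all gates passed, 1 = threshold failure."""
    security_ok = scores["security_score"] >= security_threshold
    resilience_ok = scores["resilience_score"] >= resilience_threshold
    return 0 if (security_ok and resilience_ok) else 1

# A run scoring 91/68 passes the security gate but fails the default resilience gate (70)
print(gate({"security_score": 91, "resilience_score": 68}))  # → 1
```

Raising `--security-threshold` or `--resilience-threshold` simply tightens these comparisons.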

khaos test — Agent Tests in CI

Run your @khaostest-decorated Python tests with machine-readable output. No cloud credentials required — tests run locally against your agent handlers. See Agent Testing for how to write tests.

Terminal
# JUnit XML for CI test reporters
khaos test --format junit -o khaos-test-results.xml

# JSON for scripting
khaos test --format json -o khaos-test-results.json

# Both at once
khaos test --format junit -o results.xml --json-file results.json

# All formats (writes .xml, .json, .md)
khaos test --format all -o khaos-tests
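The JSON output is handy for custom gating scripts. A minimal sketch of consuming it — the field names here mirror the counters khaos test exports to $GITHUB_OUTPUT (total, passed, failed, verdict), but the actual JSON schema may differ:

```python
import json

# Stand-in for reading khaos-test-results.json written by
# `khaos test --format json -o khaos-test-results.json`
sample = '{"total": 12, "passed": 11, "failed": 1, "verdict": "fail"}'
results = json.loads(sample)

if results["failed"] > 0:
    print(f"{results['failed']} of {results['total']} khaos tests failed")
```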

GitHub Actions Auto-Detection
When running in GitHub Actions, khaos test automatically writes a Markdown report to $GITHUB_STEP_SUMMARY and outputs total, passed, failed, and verdict to $GITHUB_OUTPUT.

Exit Codes

Both commands return meaningful exit codes for pipeline control:

GA Mode (--exit-code-mode ga, default in GitHub Actions)

| Code | Meaning | Action |
|---|---|---|
| 0 | All gates passed | Continue pipeline |
| 1 | Threshold or test failure | Fail build |
| 2 | Infrastructure / config error | Investigate setup |

Detailed Mode (--exit-code-mode detailed)

| Code | Meaning |
|---|---|
| 0 | All gates passed |
| 1 | Security threshold not met |
| 2 | Resilience threshold not met |
| 3 | Both security and resilience failed |
| 4 | Baseline tests failed |
| 5 | Regression detected vs baseline |
| 6 | @khaostest tests failed (when using --test) |
| 10 | Configuration error |
| 11 | Runtime error |

khaos test (standalone) uses simple exit codes: 0 = all passed, 1 = any failed.
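In a custom wrapper script, the GA exit codes map directly onto pipeline actions. A sketch using a stand-in command that exits with code 1 (substitute your real khaos ci invocation):

```python
import subprocess
import sys

# Stand-in for `khaos ci my-agent --exit-code-mode ga`; here we simulate
# a threshold failure (exit code 1) with a trivial Python subprocess.
proc = subprocess.run([sys.executable, "-c", "raise SystemExit(1)"])

actions = {0: "continue pipeline", 1: "fail build", 2: "investigate setup"}
print(f"exit {proc.returncode}: {actions.get(proc.returncode, 'unexpected code')}")
```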

GitHub Actions

A complete workflow running both evaluation and @khaostest tests:

.github/workflows/khaos.yml
name: Khaos Evaluation

on:
  push:
    branches: [main]
  pull_request:

jobs:
  # Job 1: Run evaluation packs
  evaluate:
    runs-on: ubuntu-latest
    env:
      KHAOS_API_TOKEN: ${{ secrets.KHAOS_API_TOKEN }}
      KHAOS_PROJECT_SLUG: ${{ secrets.KHAOS_PROJECT_SLUG }}
      KHAOS_API_URL: ${{ secrets.KHAOS_API_URL }}
      KHAOS_STATE_DIR: ${{ github.workspace }}/.khaos
      KHAOS_PACK: ${{ github.event_name == 'pull_request' && 'quickstart' || 'full-eval' }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run Khaos CI
        id: khaos
        run: |
          pip install "khaos>=1.0.0,<2"
          khaos ci path/to/agent.py \
            --eval "$KHAOS_PACK" \
            --sync \
            --exit-code-mode ga \
            --format junit --output-file khaos-results.xml \
            --json-file khaos-results.json

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: khaos-results
          path: |
            khaos-results.json
            khaos-results.xml

  # Job 2: Run @khaostest tests (no credentials needed)
  khaos-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - run: pip install "khaos>=1.0.0,<2"

      - name: Run @khaostest tests
        run: |
          khaos test \
            --format junit \
            --output-file khaos-test-results.xml \
            --json-file khaos-test-results.json

      - uses: mikepenz/action-junit-report@v4
        if: always()
        with:
          report_paths: 'khaos-test-results.xml'
          check_name: 'Khaos @khaostest Results'

Add KHAOS_API_TOKEN and KHAOS_PROJECT_SLUG to your repository secrets under Settings > Secrets and variables > Actions.

Reusable Action
Khaos also provides a reusable composite action at .github/actions/khaos-test/action.yml with inputs for agent path, pack, thresholds, baseline comparison, and run-khaostests to include @khaostest results.

GitLab CI

Add these jobs to your .gitlab-ci.yml:

.gitlab-ci.yml
stages: [khaos]

# Evaluation job
khaos:ci:
  stage: khaos
  image: python:3.11
  variables:
    KHAOS_API_TOKEN: $KHAOS_API_TOKEN
    KHAOS_PROJECT_SLUG: $CI_PROJECT_PATH
    KHAOS_API_URL: https://api.khaos.exordex.com
    KHAOS_CI: "1"
    KHAOS_PACK: "quickstart"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      variables: { KHAOS_PACK: "quickstart" }
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      variables: { KHAOS_PACK: "full-eval" }
  script:
    - pip install "khaos>=1.0.0,<2"
    - khaos ci path/to/agent.py --eval "$KHAOS_PACK" --sync --exit-code-mode ga
        --format junit --output-file khaos-results.xml
  artifacts:
    when: always
    expire_in: 30 days
    paths: [khaos-results.xml]

# @khaostest job
khaos:test:
  stage: khaos
  image: python:3.11
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
  script:
    - pip install "khaos>=1.0.0,<2"
    - khaos test --format junit --output-file khaos-test-results.xml
  artifacts:
    when: always
    reports:
      junit: khaos-test-results.xml

Add KHAOS_API_TOKEN as a CI/CD variable under Settings > CI/CD > Variables (masked and protected).

CircleCI

Add these jobs to your .circleci/config.yml:

.circleci/config.yml
version: 2.1

jobs:
  khaos-ci:
    docker:
      - image: cimg/python:3.11
    environment:
      KHAOS_CI: "1"
    steps:
      - checkout
      - run:
          name: Run Khaos evaluation
          command: |
            pip install "khaos>=1.0.0,<2"
            khaos ci path/to/agent.py \
              --eval quickstart --sync --exit-code-mode ga \
              --format junit --output-file khaos-results.xml \
              --json-file khaos-results.json
      - store_artifacts:
          path: khaos-results.xml
      - store_artifacts:
          path: khaos-results.json

  khaos-test:
    docker:
      - image: cimg/python:3.11
    steps:
      - checkout
      - run:
          name: Run @khaostest tests
          command: |
            pip install "khaos>=1.0.0,<2"
            khaos test --format junit --output-file khaos-test-results.xml
      - store_test_results:
          path: khaos-test-results.xml

workflows:
  khaos:
    jobs:
      - khaos-ci
      - khaos-test

Other CI Systems

For any CI system (Jenkins, Buildkite, Azure Pipelines, etc.), install the CLI and use the appropriate output format:

Terminal
#!/bin/bash
pip install "khaos>=1.0.0,<2"

# Run evaluation with JUnit output
khaos ci path/to/agent.py \
  --eval quickstart \
  --format junit --output-file results.xml \
  --json-file results.json

# Run @khaostest tests separately
khaos test --format junit --output-file test-results.xml

# Exit code indicates pass/fail
exit $?

Preflight Validation

Use --preflight-only to validate credentials and connectivity before running a full evaluation. This is useful as a separate CI step to fail fast on configuration issues.

Terminal
# Validate setup without running an evaluation
khaos ci my-agent --preflight-only --sync

# In CI: add as a separate step before the real run
# Step 1: Preflight
khaos ci my-agent --preflight-only --sync
# Step 2: Real evaluation
khaos ci my-agent --eval full-eval --sync

Choosing an Evaluation

Select the right evaluation for your pipeline stage:

| Evaluation | Use Case | Duration |
|---|---|---|
| quickstart | Fast smoke test for every PR | ~2 min |
| security-standard | Security-focused evaluation | ~5 min |
| full-eval | Comprehensive evaluation before release | ~10 min |
Pipeline Strategy
Run quickstart + khaos test on every PR for fast feedback, and full-eval on merges to main for comprehensive coverage.

Viewing Results

CI runs with --sync are automatically synced to the dashboard. After a run completes:

  • Open your project in the Dashboard
  • Navigate to Evaluations to see the run
  • Click Compare to generate a 4-lens impact report against any previous run
  • Share the comparison URL in your PR for team review

Deployment Gating

Use the gate API to block deployments when scores fall below your threshold. The endpoint returns HTTP 200 when all scores pass, or HTTP 422 with details when any score fails the gate.

YAML
# Add to your GitHub Actions workflow
- name: Check Khaos Gate
  env:
    DASHBOARD_URL: ${{ vars.KHAOS_DASHBOARD_URL }}
    RUN_ID: ${{ steps.khaos.outputs.run_id }}
    KHAOS_API_TOKEN: ${{ secrets.KHAOS_API_TOKEN }}
  run: |
    GATE=$(curl -sf "${DASHBOARD_URL}/api/runs/${RUN_ID}/gate?threshold=70" \
      -H "x-webhook-secret: ${KHAOS_API_TOKEN}")
    echo "$GATE" | jq -e '.passed'

Custom Thresholds
Pass ?threshold=80 for stricter gating. The default threshold is 70 (the warning level). Scores checked: overall_score, security_score, and resilience_score.
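The gate decision can be folded into a small helper in a deploy script. A sketch of interpreting the response as described above (the `failed_scores` field in the 422 body is an assumption for illustration; only the 200/422 split and the `passed` field come from this page):

```python
import json

def gate_passed(status_code, body):
    """HTTP 200 -> trust the `passed` field; HTTP 422 -> a score failed the gate."""
    if status_code == 422:
        return False
    return json.loads(body).get("passed", False)

print(gate_passed(200, '{"passed": true}'))                                      # → True
print(gate_passed(422, '{"passed": false, "failed_scores": ["resilience_score"]}'))  # → False
```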

PR Comments

Post a Khaos summary directly to your pull request. The summary endpoint returns a markdown report with scores, comparisons, and links to the full dashboard.

YAML
# Add to your GitHub Actions workflow
- name: Post Khaos Report
  if: github.event_name == 'pull_request'
  env:
    DASHBOARD_URL: ${{ vars.KHAOS_DASHBOARD_URL }}
    RUN_ID: ${{ steps.khaos.outputs.run_id }}
    KHAOS_API_TOKEN: ${{ secrets.KHAOS_API_TOKEN }}
  run: |
    SUMMARY=$(curl -sf "${DASHBOARD_URL}/api/runs/${RUN_ID}/summary" \
      -H "x-webhook-secret: ${KHAOS_API_TOKEN}")
    gh pr comment ${{ github.event.pull_request.number }} --body "$SUMMARY"

JSON Format
Add ?format=json to get the summary wrapped in a JSON object ({ markdown: "..." }) for programmatic use.
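With ?format=json, extracting the report body is a single field access. A minimal sketch with a stand-in payload in place of the real HTTP response:

```python
import json

# Stand-in for the response from GET .../api/runs/{run_id}/summary?format=json;
# the markdown content here is invented for the example.
payload = json.loads('{"markdown": "## Khaos Report\\n\\nAll gates passed."}')
report = payload["markdown"]
print(report.splitlines()[0])  # → ## Khaos Report
```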

Best Practices

  • Run on every PR — Catch issues before merge
  • Use quickstart + khaos test for PRs — Fast feedback (~2 min)
  • Use full-eval for main — Comprehensive before release
  • Use --test flag — Include @khaostest results alongside evaluations
  • Use JUnit output — Native integration with GitHub, GitLab, CircleCI, and Jenkins test reporters
  • Save baselines on main — Use --save-baseline main after successful merges, --baseline main --fail-on-regression on PRs
  • Compare runs in the dashboard — Generate impact reports to understand what changed
  • Share comparison URLs in PRs — Give reviewers a direct link to the diff
  • Validate setup with --preflight-only — Fail fast on credential issues
  • Rotate tokens periodically — Generate new tokens and revoke old ones