PII Detection

Khaos includes a built-in PII detection engine that scans agent responses for personally identifiable information. The detector covers seven categories of sensitive data with configurable risk levels and pattern matching.

Quick Start

Python

from khaos.pii import PIIDetector

detector = PIIDetector()

# Scan a string for PII
result = detector.scan("Contact me at john@example.com or 555-123-4567")
print(f"PII found: {result.has_pii}")
print(f"Matches: {len(result.matches)}")
for match in result.matches:
    print(f"  {match.pattern_name} ({match.category}): {match.matched_text}")

# Quick boolean check
has_pii = detector.quick_check("No sensitive data here")
print(has_pii)  # False

# Mask PII in output
masked = detector.mask_text("SSN: 123-45-6789")
print(masked)  # "SSN: ***-**-****"

PIICategory Enum

PII patterns are organized into seven categories covering the most common types of sensitive data encountered in agent outputs.

Category	Description	Example Patterns
`PERSONAL_ID`	Government-issued identifiers	SSN, passport numbers, driver's license
`FINANCIAL`	Financial account information	Credit card numbers, bank accounts, routing numbers
`CONTACT`	Contact information	Email addresses, phone numbers, physical addresses
`AUTHENTICATION`	Secrets and credentials	API keys, passwords, tokens, SSH keys
`NETWORK`	Network identifiers	IP addresses, MAC addresses, URLs with credentials
`MEDICAL`	Health-related information	Medical record numbers, health plan IDs
`CRYPTO`	Cryptocurrency identifiers	Wallet addresses, private keys

RiskLevel Enum

Each PII pattern has an assigned risk level that indicates the severity of exposure.

Level	Description	Examples
`CRITICAL`	Immediate risk of identity theft or financial loss	SSN, credit card numbers, API keys, private keys
`HIGH`	Significant privacy risk	Passport numbers, bank accounts, passwords
`MEDIUM`	Moderate privacy concern	Email addresses, phone numbers, IP addresses
`LOW`	Minor privacy concern	Names in specific contexts, general URLs

PIIDetector Class

The PIIDetector is the main interface for scanning text. Configure it to target specific categories or risk levels.

Parameter	Type	Default	Description
`categories`	`list[PIICategory]`	All	Categories to scan for
`min_risk_level`	`RiskLevel`	LOW	Minimum risk level to report
`include_context`	`bool`	True	Include surrounding text context in matches
`context_chars`	`int`	50	Number of context characters around each match
`custom_patterns`	`list[PIIPattern]`	None	Additional custom patterns to include

Methods

Method	Returns	Description
`scan(text)`	`PIIScanResult`	Scan a single string for PII
`scan_multiple(texts)`	`list[PIIScanResult]`	Scan multiple strings
`quick_check(text)`	`bool`	Fast boolean check (stops at first match)
`mask_text(text)`	`str`	Return text with PII replaced by mask characters

PIIPattern Dataclass

Each detection pattern is defined by a PIIPattern instance. You can create custom patterns using the same structure.

Field	Type	Description
`name`	`str`	Pattern identifier
`category`	`PIICategory`	Which category this pattern belongs to
`risk_level`	`RiskLevel`	Severity of this pattern's matches
`pattern`	`str`	Regular expression pattern
`mask_char`	`str`	Character used for masking (default "*")
`description`	`str`	Human-readable description

Python

from khaos.pii import PIIPattern, PIICategory, PIIDetector
from khaos.pii.patterns import RiskLevel

# Define a custom pattern
employee_id_pattern = PIIPattern(
    name="employee_id",
    category=PIICategory.PERSONAL_ID,
    risk_level=RiskLevel.HIGH,
    pattern=r"EMP-\d{6}",
    mask_char="X",
    description="Internal employee ID format",
)

# Use it in a detector
detector = PIIDetector(custom_patterns=[employee_id_pattern])
result = detector.scan("Employee EMP-123456 reported the issue")
print(result.has_pii)  # True

PIIMatch and PIIScanResult

When PII is detected, the scanner returns structured results with full match context.

PIIMatch

Field	Type	Description
`pattern_name`	`str`	Name of the matched pattern
`category`	`PIICategory`	Category of the match
`risk_level`	`RiskLevel`	Risk level of the match
`matched_text`	`str`	The actual matched text
`start`	`int`	Start position in the source text
`end`	`int`	End position in the source text
`line_number`	`int`	Line number of the match
`context`	`str`	Surrounding text context

PIIScanResult

Field	Type	Description
`matches`	`list[PIIMatch]`	All PII matches found
`text_length`	`int`	Length of the scanned text
`has_pii`	`bool`	Whether any PII was detected
`risk_summary`	`dict`	Count of matches by risk level
`category_summary`	`dict`	Count of matches by category
`critical_count`	`int`	Number of CRITICAL matches
`high_count`	`int`	Number of HIGH matches

Convenience Detectors

Khaos provides pre-configured detector instances for common use cases.

Detector	Categories	Min Risk	Use Case
`DEFAULT_DETECTOR`	All	LOW	General-purpose scanning
`AUTH_DETECTOR`	AUTHENTICATION	MEDIUM	Credential and secret detection
`FINANCIAL_DETECTOR`	FINANCIAL	HIGH	Financial data protection
`CRITICAL_DETECTOR`	All	CRITICAL	Only highest-severity matches

Python

from khaos.pii.detector import (
    DEFAULT_DETECTOR,
    AUTH_DETECTOR,
    FINANCIAL_DETECTOR,
    CRITICAL_DETECTOR,
)

# Use a pre-configured detector
result = AUTH_DETECTOR.scan(agent_response)
if result.has_pii:
    print(f"Credentials detected: {result.matches}")

# Financial-only scanning
result = FINANCIAL_DETECTOR.scan(agent_response)
if result.critical_count > 0:
    print("CRITICAL: Financial data exposed")

Integration with Testing

In the stable release, PII helpers are provided via khaos.pii. UsePIIDetector or mask_pii() directly in tests.

Python

from khaos.pii import PIIDetector, mask_pii

detector = PIIDetector()

# Redact PII from a single response string
clean_response = detector.mask_text(agent_response)

# Redact PII across transcript messages
clean_transcript = [
    {**msg, "content": mask_pii(msg["content"])}
    for msg in transcript
]

Automatic PII scanning

When security testing is enabled, Khaos automatically scans all agent responses for PII leakage. Any detected PII is flagged in the security report without additional configuration.

PII Detection

Quick Start

PIICategory Enum

RiskLevel Enum

PIIDetector Class

Methods

PIIPattern Dataclass

PIIMatch and PIIScanResult

PIIMatch

PIIScanResult

Convenience Detectors

Integration with Testing

Related Documentation