Security scan

Every LLM request and response body passes through Spanlens' scan pipeline. Two classes of concern are flagged automatically: PII leaks (users pasting social security numbers into a chatbot) and prompt injection (users trying to override your system prompt). Flagged traffic shows up in /security with masked samples, and you can optionally block injections at the proxy or receive instant email alerts.

Why it matters

PII in LLM calls is the #1 thing enterprise security teams ask about. If your chatbot receives a user's credit card number and that request body lands in OpenAI's training data (or your logs, or your support ticket queue), you have a GDPR/PCI incident on your hands. Catching it at the proxy layer, before it hits the provider, is the cheapest mitigation point.

Prompt injection is the other side: malicious users trying to hijack your assistant with “ignore previous instructions and...”. When blocking mode is on, Spanlens returns a 422 before the request ever reaches the LLM. When it's off, the flag is recorded so you can audit which traffic source needs hardening.

How it works

PII rules (7 patterns)

Regex-based, deliberately conservative (structural shape rather than keyword match) to minimize false positives on normal prose:

| Rule | Pattern | Example match |
| --- | --- | --- |
| ssn-kr | Korean resident registration number (6 digits, hyphen, 7 digits) | 900101-1234567 |
| iban | IBAN: EU 27 + UK, CH, NO and 30+ countries. Compact and spaced forms. mod-97 validated. | GB82WEST12345698765432 |
| ssn-us | US SSN (3-2-4) | 123-45-6789 |
| credit-card | 13–19 digit card number (Luhn-passing) | 4532 0151 1283 0366 |
| email | Email addresses | jane@example.com |
| phone | E.164 + common international formats | +1 (555) 123-4567 |
| passport | Generic letter+digit passport (6–9 chars) | M12345678 |
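
The credit-card and iban rules pair a structural regex with a checksum, which is what keeps random digit runs from being flagged. As a rough illustration of those two checks (the function names and the exact pre-filter regexes are assumptions for this sketch, not the Spanlens source):

```typescript
// Hypothetical helpers illustrating the two checksum validations.

/** Luhn checksum over a 13-19 digit candidate (spaces and hyphens ignored). */
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false; // the rightmost digit is never doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

/** IBAN mod-97 check: move the first four characters to the end,
 *  map letters to 10..35, and require the remainder to be 1. */
function passesIbanMod97(candidate: string): boolean {
  const iban = candidate.replace(/\s/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    if (code >= 65) {
      // Letter: contributes two decimal digits (A=10 ... Z=35).
      remainder = (remainder * 100 + (code - 55)) % 97;
    } else {
      remainder = (remainder * 10 + (code - 48)) % 97;
    }
  }
  return remainder === 1;
}
```

Both example matches from the table pass these checks, in compact and spaced form for the IBAN.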

Prompt injection rules (8 patterns)

Well-known social-engineering phrases used to override system prompts. English rules use case-insensitive word-boundary matches; Korean rules use Unicode-aware substring matching.

| Rule | What it catches |
| --- | --- |
| ignore-previous | “ignore/disregard/forget (all) previous/prior/above instructions/prompts/rules” |
| reveal-system-prompt | “what/show/reveal/print your system/initial/hidden prompt” |
| role-override | “you are now / from now on / act as / pretend to be...” |
| developer-mode | “developer mode / debug mode / jailbreak / DAN / do anything now” |
| token-smuggle | Control tokens pasted as text: `<\|system\|>`, `<\|im_start\|>`, etc. |
| ignore-previous-ko | Korean equivalents of “ignore/forget all previous instructions/commands/prompts” |
| reveal-system-ko | Korean equivalents of “tell/show me your system/initial/hidden prompt/instructions” |
| role-override-ko | Korean equivalents of “from now on you are ... / pretend to be / act as ...” |
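
To make the matching styles concrete, here is a minimal sketch of three such rules. The regexes are illustrative guesses at the shape described in the table, not the shipped patterns. The Korean rule uses plain substring-style matching because JavaScript's `\b` is defined in terms of ASCII word characters and does not fire around Hangul.

```typescript
// Hypothetical rule definitions; the real patterns may differ.
type InjectionRule = { name: string; pattern: RegExp };

const injectionRules: InjectionRule[] = [
  {
    // English: case-insensitive with word boundaries on both ends.
    name: "ignore-previous",
    pattern:
      /\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:previous|prior|above)\s+(?:instructions|prompts|rules)\b/i,
  },
  {
    // Control tokens pasted as literal text.
    name: "token-smuggle",
    pattern: /<\|(?:system|im_start|im_end)\|>/,
  },
  {
    // Korean: no \b. Hangul syllables are in the BMP, so no `u` flag is needed.
    name: "ignore-previous-ko",
    pattern: /(?:이전|앞의|위의)\s*(?:지시|명령|프롬프트)[^\n]{0,10}(?:무시|잊어)/,
  },
];

/** Returns the names of every rule that matches the text. */
function matchInjection(text: string): string[] {
  return injectionRules
    .filter((rule) => rule.pattern.test(text))
    .map((rule) => rule.name);
}
```

Requiring the full verb + qualifier + noun shape is what lets a sentence like "I could not ignore the previous season finale" pass unflagged.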

What gets stored

The scan runs on both the request body and the LLM response body inside logRequestAsync(). For every match, a compact flag is stored in JSONB:

```json
{
  "type": "pii",
  "pattern": "ssn-us",
  "sample": "12*****89"
}
```

Request flags → requests.flags. Response flags → requests.response_flags. The sample is a masked 6-character excerpt around the match, just enough for you to audit what was flagged without storing raw PII back into the database. The original match is never persisted in readable form.
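
A minimal sketch of how a match could be turned into such a flag. The masking rule here (strip separators, keep the first and last two characters) is an assumption chosen only because it reproduces the sample shown above. The actual masking logic and excerpt length are not specified in this document, and `maskSample` / `scanSsnUs` are hypothetical names.

```typescript
type FlagType = "pii" | "injection";

interface SecurityFlag {
  type: FlagType;
  pattern: string;
  sample: string; // masked; never the raw match
}

function stars(n: number): string {
  return new Array(n + 1).join("*");
}

/** Assumed masking rule: drop separators, keep 2 chars at each end. */
function maskSample(match: string): string {
  const compact = match.replace(/[^A-Za-z0-9]/g, "");
  if (compact.length <= 4) return stars(compact.length);
  return compact.slice(0, 2) + stars(compact.length - 4) + compact.slice(-2);
}

/** Scans a body for the ssn-us shape (3-2-4) and emits compact flags. */
function scanSsnUs(body: string): SecurityFlag[] {
  const matches: string[] = body.match(/\b\d{3}-\d{2}-\d{4}\b/g) || [];
  return matches.map(
    (m): SecurityFlag => ({ type: "pii", pattern: "ssn-us", sample: maskSample(m) })
  );
}
```

The same function can be run over the request body and the response body, with the results written to the two separate columns.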

Features

Blocking mode (per-project)

When blocking is enabled for a project, any request that contains an injection-type flag is rejected at the proxy with a 422 Unprocessable Entity before it ever reaches the LLM provider:

```json
{
  "error": "Request blocked by Spanlens security policy: prompt injection detected.",
  "code": "INJECTION_BLOCKED"
}
```

PII flags never trigger blocking, because PII may be legitimate user data (e.g. in a healthcare app). Only injection patterns trigger blocking. Toggle it in the /security dashboard under Per-project blocking.
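
On the caller's side, a blocked request looks like any other 422, so it helps to check the code field explicitly. A small sketch, assuming you already have the HTTP status and the parsed JSON body in hand (the helper name is hypothetical):

```typescript
interface InjectionBlockedBody {
  error: string;
  code: "INJECTION_BLOCKED";
}

/** True only for the blocking-mode rejection described above. */
function isInjectionBlocked(
  status: number,
  body: unknown
): body is InjectionBlockedBody {
  return (
    status === 422 &&
    typeof body === "object" &&
    body !== null &&
    (body as { code?: unknown }).code === "INJECTION_BLOCKED"
  );
}
```

An application might use this to show the end user a neutral "message could not be processed" notice instead of surfacing a generic upstream error.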

Alert emails (org-wide)

Enable alert emails to receive an immediate notification whenever a request or response is flagged. The email is sent to the workspace owner and includes:

  • Flag direction (Request / Response), type (pii / injection), pattern, and masked sample
  • A link to the /security dashboard

Alerts are rate-limited to once every 5 minutes per organization to prevent inbox flooding during high-volume attacks. Toggle it in the /security dashboard under Alert emails.
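
The rate limit amounts to "at most one email per organization per 5-minute window". A minimal in-memory sketch of that rule follows. The class name is hypothetical, and keeping state in process memory is an assumption made for brevity, so a real deployment would more likely persist the last-sent timestamp.

```typescript
const ALERT_WINDOW_MS = 5 * 60 * 1000;

class AlertThrottle {
  private lastSent: { [orgId: string]: number } = {};

  /** Returns true (and records the send) if this org may be emailed now. */
  shouldSend(orgId: string, nowMs: number): boolean {
    const last = this.lastSent[orgId];
    if (last !== undefined && nowMs - last < ALERT_WINDOW_MS) {
      return false; // still inside the window; the flag is stored but no email goes out
    }
    this.lastSent[orgId] = nowMs;
    return true;
  }
}
```

Passing the clock in as an argument keeps the rule easy to test without waiting five minutes.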

Response scanning

Spanlens scans both directions: the request body (user input) and the LLM response body (model output). This catches cases where the model itself leaks PII it was given in context, for example by echoing a credit card number back in a summary. Response flags are stored separately in requests.response_flags and shown in the dashboard with a prefix to distinguish them from request flags.

Using it

Dashboard

/security has two panes plus a settings section:

  • Settings: alert email toggle (org-wide) plus per-project blocking toggles
  • Summary: counts per rule over the selected window (24h / 7d / 30d)
  • Flagged: paginated list of flagged requests with masked samples and direction labels (request vs response), plus a direct link to the full /requests row for context

API

```bash
# Flagged requests (paginated)
GET /api/v1/security/flagged?limit=50&offset=0

# Flag counts by type/pattern over a time window
GET /api/v1/security/summary?hours=24

# Org alert + per-project block settings
GET /api/v1/security/settings

# Toggle org-level alert emails
PATCH /api/v1/security/alert
{ "enabled": true }

# Toggle per-project injection blocking
PATCH /api/v1/security/projects/{projectId}/block
{ "enabled": true }
```
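
If you call these endpoints from TypeScript, small request builders keep the paths and payloads in one place. The sketch below only constructs the method, path, and body. The base URL and the authentication header are not covered on this page, so both appear as placeholders in the usage comment, and all helper names are hypothetical.

```typescript
interface ApiRequest {
  method: "GET" | "PATCH";
  path: string;
  body?: string; // JSON-encoded payload for PATCH calls
}

function flaggedRequest(limit: number, offset: number): ApiRequest {
  return {
    method: "GET",
    path: `/api/v1/security/flagged?limit=${limit}&offset=${offset}`,
  };
}

function summaryRequest(hours: number): ApiRequest {
  return { method: "GET", path: `/api/v1/security/summary?hours=${hours}` };
}

function setAlertEmails(enabled: boolean): ApiRequest {
  return {
    method: "PATCH",
    path: "/api/v1/security/alert",
    body: JSON.stringify({ enabled }),
  };
}

function setProjectBlocking(projectId: string, enabled: boolean): ApiRequest {
  return {
    method: "PATCH",
    path: `/api/v1/security/projects/${encodeURIComponent(projectId)}/block`,
    body: JSON.stringify({ enabled }),
  };
}

// Usage (assumed base URL and auth scheme; substitute your own):
//   const req = setProjectBlocking("my-project", true);
//   await fetch(BASE_URL + req.path, {
//     method: req.method,
//     headers: { "content-type": "application/json", ...AUTH_HEADERS },
//     body: req.body,
//   });
```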

Stored-body sanitization (defense in depth)

Separately from the request-time scan above, every body that lands in ClickHouse passes through a pattern-based key scrubber first. The goal is narrow: catch API keys that accidentally end up in prompts, tool output, or error messages, so a compromised Spanlens row never leaks a customer's upstream credentials.

| Provider | Prefix matched | Replacement |
| --- | --- | --- |
| Spanlens | sl_live_* | sl_live_*** |
| Anthropic | sk-ant-* | sk-ant-*** |
| OpenAI project keys | sk-proj-* | sk-proj-*** |
| OpenAI (legacy) | sk-* | sk-*** |
| Google (Gemini) | AIza* | AIza*** |

Each pattern requires at least 12 characters after the prefix so short identifiers that share the prefix don't produce false positives. The masker runs against request_body, response_body, and error_message. Source: apps/server/src/lib/pii-mask.ts.
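
A rough sketch of what such a scrubber can look like, using the prefixes from the table and the 12-character minimum. The character class for the key body, the leading word boundary, and the function name are assumptions for this example; see the source file above for the actual patterns.

```typescript
// Longer prefixes come first so "sk-ant-" and "sk-proj-" win over the legacy "sk-".
const KEY_PATTERN =
  /\b(sl_live_|sk-ant-|sk-proj-|sk-|AIza)[A-Za-z0-9_-]{12,}/g;

/** Replaces the secret part of any recognized key with "***", keeping the prefix. */
function scrubKeys(text: string): string {
  return text.replace(KEY_PATTERN, "$1***");
}
```

Keeping the prefix in the replacement preserves enough context to tell which provider's key leaked, while the 12-character floor leaves short strings such as `sk-short` untouched.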

Body retention modes: logBody

Pattern masking covers structured secrets. It does not redact natural-language PII (names, emails, addresses, medical information) that the regex rules above also can't reliably catch. For PII-heavy workloads, the right answer is to not store the body at all:

| Mode | Bodies stored | Tokens / cost / latency / model | user_id / session_id |
| --- | --- | --- | --- |
| full (default) | Yes (with key masking above) | Yes | Yes |
| meta | No | Yes | Yes |
| none | No | Yes | No (null) |

Set per-call via the SDK helper withLogBody(mode) or the x-spanlens-log-body header. The server falls back to full on any unrecognized value, so a malformed header never silently disables logging.
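
The fallback rule and the mode table can be summarized in a few lines. This is a sketch of the described behavior, not the server code: the function names are hypothetical, and trimming plus lowercasing the header value is an assumption.

```typescript
type LogBodyMode = "full" | "meta" | "none";

/** Resolves the x-spanlens-log-body header; anything unrecognized means "full". */
function resolveLogBody(header: string | undefined): LogBodyMode {
  const value = (header || "").trim().toLowerCase();
  if (value === "meta") return "meta";
  if (value === "none") return "none";
  return "full";
}

interface StoredFields {
  bodies: boolean; // request/response bodies (key-masked)
  metrics: boolean; // tokens, cost, latency, model
  identity: boolean; // user_id / session_id
}

/** Mirrors the mode table above. */
function storedFields(mode: LogBodyMode): StoredFields {
  return {
    bodies: mode === "full",
    metrics: true,
    identity: mode !== "none",
  };
}
```

Failing open to full means a typo in the header costs you privacy rather than observability, so it is worth validating the value on the client before sending PII-heavy traffic.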

Spanlens does NOT ship automatic natural-language PII redaction. Pattern matching on free text produces too many false positives and false negatives to be the default. We'd rather give you a clean opt-out and let the prompts that need full bodies keep them. Enterprise customers who need in-place redaction (medical / financial) can reach out.

Limitations

  • No custom rules. Rule set is hard-coded today. Custom regex + custom webhook alerts are planned post-launch.
  • Blocking covers injection only, not PII. PII is detect-and-alert only. A policy engine for rewriting or blocking PII is on the roadmap.
  • National ID coverage is limited. Only US SSN and Korean RRN are currently recognized as national identifiers. Other country-specific ID formats (UK NI, German Personalausweis, etc.) aren't yet covered. PRs welcome.
  • No LLM-based secondary check. For high-stakes workloads you'll want a classifier on top. Integrations with Llama Guard / Prompt Guard are under consideration.
  • Regex is not ML. A sufficiently motivated attacker can rephrase injection phrases to slip through. What we catch is the long tail of accidentally bad inputs and low-effort attacks, which covers 90%+ of real incidents.
  • Natural-language PII is not auto-redacted. The key scrubber only catches structured patterns like API keys. Names, emails, and card numbers in free-form prompts pass through the scrubber unchanged. Use logBody: 'meta' to skip body storage entirely for those workloads.

Related: Anomalies (statistical spike detection), /security dashboard. Source: apps/server/src/lib/security-scan.ts, apps/server/src/api/security.ts.