Security scan
Every LLM request and response body passes through Spanlens' scan pipeline. Two classes of concern are flagged automatically: PII leaks (users pasting social security numbers into a chatbot) and prompt injection (users trying to override your system prompt). Flagged traffic shows up in /security with masked samples, and you can optionally block injections at the proxy or receive instant email alerts.
Why it matters
PII in LLM calls is the #1 thing enterprise security teams ask about. If your chatbot receives a user's credit card number and that request body lands in OpenAI's training data (or your logs, or your support ticket queue), you have a GDPR/PCI incident on your hands. Catching it at the proxy layer, before it hits the provider, is the cheapest mitigation point.
Prompt injection is the other side: malicious users trying to hijack your assistant with “ignore previous instructions and...”. When blocking mode is on, Spanlens returns a 422 before the request ever reaches the LLM. When it's off, the flag is recorded so you can audit which traffic source needs hardening.
How it works
PII rules (7 patterns)
Regex-based, deliberately conservative (structural shape rather than keyword match) to minimize false positives on normal prose:
| Rule | Pattern | Example match |
|---|---|---|
| ssn-kr | Korean resident registration number (6 digits, hyphen, 7 digits) | 900101-1234567 |
| iban | IBAN, EU 27 + UK, CH, NO and 30+ countries. Compact and spaced forms. mod-97 validated. | GB82WEST12345698765432 |
| ssn-us | US SSN (3-2-4) | 123-45-6789 |
| credit-card | 13–19 digit card number (Luhn-passing) | 4532 0151 1283 0366 |
| email | Email addresses | jane@example.com |
| phone | E.164 + common international formats | +1 (555) 123-4567 |
| passport | Generic letter+digit passport (6–9 chars) | M12345678 |
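The credit-card and iban rows mention two checksum validations, Luhn and mod-97. The sketch below shows how such checks are commonly written so a regex hit on digits alone is not flagged. The function names `passesLuhn` and `passesIbanMod97` are hypothetical and are not taken from Spanlens' source; the real rules may differ in detail (for example, in per-country IBAN length checks, which this sketch omits).

```typescript
// Hypothetical helpers for illustration; not Spanlens' actual source.

/** Returns true when a 13 to 19 digit candidate passes the Luhn checksum. */
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, ""); // accept spaced or dashed forms
  if (!/^\d{13,19}$/.test(digits)) return false;

  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" has char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

/** Returns true when the candidate satisfies the IBAN mod-97 check. */
function passesIbanMod97(candidate: string): boolean {
  const iban = candidate.replace(/\s/g, "").toUpperCase();
  // Generic shape only: country code, two check digits, 11 to 30 more characters.
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;

  // Move the first four characters to the end, then read letters as 10..35.
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48;
    // Letters contribute two decimal digits, digits contribute one.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}
```

Computing the remainder incrementally keeps every intermediate value small, so no big-integer arithmetic is needed for the 34-character maximum.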
Prompt injection rules (8 patterns)
Well-known social-engineering phrases used to override system prompts. English rules use case-insensitive word-boundary matches; Korean rules use Unicode-aware substring matching.
| Rule | What it catches |
|---|---|
| ignore-previous | “ignore/disregard/forget (all) previous/prior/above instructions/prompts/rules” |
| reveal-system-prompt | “what/show/reveal/print your system/initial/hidden prompt” |
| role-override | “you are now / from now on / act as / pretend to be...” |
| developer-mode | “developer mode / debug mode / jailbreak / DAN / do anything now” |
| token-smuggle | Control tokens pasted as text: `<\|system\|>`, `<\|im_start\|>`, etc. |
| ignore-previous-ko | Korean equivalents of “ignore/forget all previous instructions/commands/prompts” |
| reveal-system-ko | Korean equivalents of “tell/show me your system/initial/hidden prompt/instructions” |
| role-override-ko | Korean equivalents of “from now on you are ... / pretend to be / act as ...” |
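To make the matching styles concrete, here is a minimal sketch of three of the rules above. The regular expressions are assumptions reconstructed from the table descriptions, not the shipped patterns, and the Korean phrase list in particular is only a plausible guess at what "Korean equivalents" covers. `INJECTION_RULES` and `matchInjection` are hypothetical names.

```typescript
// Illustrative rules only; the real rule set in Spanlens may differ.
interface InjectionRule {
  name: string;
  pattern: RegExp;
}

const INJECTION_RULES: InjectionRule[] = [
  {
    // English: case-insensitive, anchored on word boundaries.
    name: "ignore-previous",
    pattern:
      /\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:previous|prior|above)\s+(?:instructions|prompts|rules)\b/i,
  },
  {
    // Control tokens pasted as literal text.
    name: "token-smuggle",
    pattern: /<\|(?:system|im_start)\|>/,
  },
  {
    // Korean: substring match, since \b does not apply to Hangul.
    // Assumed phrases: "previous" + instruction/command/prompt, then ignore/forget.
    name: "ignore-previous-ko",
    pattern: /이전\s*(?:지시|명령|프롬프트)[^\n]{0,10}(?:무시|잊어)/,
  },
];

/** Returns the names of every rule that matches the text. */
function matchInjection(text: string): string[] {
  return INJECTION_RULES
    .filter((rule) => rule.pattern.test(text))
    .map((rule) => rule.name);
}
```

None of the patterns use the global flag, so `RegExp.prototype.test` stays stateless across calls.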
What gets stored
The scan runs on both the request body and the LLM response body inside logRequestAsync(). For every match, a compact flag is stored in JSONB:

    {
      "type": "pii",
      "pattern": "ssn-us",
      "sample": "12*****89"
    }

Request flags → requests.flags. Response flags → requests.response_flags. The sample is a masked 6-character excerpt around the match, just enough for you to audit what was flagged without storing raw PII back into the database. The original match is never persisted in readable form.
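As a rough illustration of how a flag like the one above could be built, the sketch below masks a match by keeping two characters at each end, which reproduces the shape of the example sample. The exact excerpt length and masking rule Spanlens applies are not specified here, so `maskSample`, `makeFlag`, and the `keep` parameter are assumptions made for the example.

```typescript
// Hypothetical flag builder; field names follow the JSON example above.
interface SecurityFlag {
  type: "pii" | "injection";
  pattern: string;
  sample: string;
}

/** Masks everything except the first and last `keep` alphanumeric characters. */
function maskSample(match: string, keep: number = 2): string {
  const compact = match.replace(/[^A-Za-z0-9]/g, ""); // drop separators such as "-"
  if (compact.length <= keep * 2) {
    return "*".repeat(compact.length); // too short to reveal anything safely
  }
  return (
    compact.slice(0, keep) +
    "*".repeat(compact.length - keep * 2) +
    compact.slice(-keep)
  );
}

/** Builds the compact flag; the raw match never leaves this function. */
function makeFlag(
  type: SecurityFlag["type"],
  pattern: string,
  rawMatch: string,
): SecurityFlag {
  return { type, pattern, sample: maskSample(rawMatch) };
}
```

Under these assumptions, `makeFlag("pii", "ssn-us", "123-45-6789")` yields a flag whose sample is `12*****89`.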
Features
Blocking mode (per-project)
When blocking is enabled for a project, any request that contains an injection-type flag is rejected at the proxy with a 422 Unprocessable Entity before it ever reaches the LLM provider:

    {
      "error": "Request blocked by Spanlens security policy: prompt injection detected.",
      "code": "INJECTION_BLOCKED"
    }

PII flags are never blocked, because PII may be legitimate user data (e.g. a healthcare app). Only injection patterns trigger blocking. Toggle it in the /security dashboard under Per-project blocking.
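On the client side, it can help to tell a policy block apart from other 422 responses before retrying or surfacing an error. The following is a minimal sketch that relies only on the status code and the `code` field shown above; `isInjectionBlocked` is a hypothetical helper, not part of the Spanlens SDK.

```typescript
// Shape of the documented error body.
interface InjectionBlockedBody {
  error: string;
  code: "INJECTION_BLOCKED";
}

/** Type guard: true only for a 422 whose body carries the INJECTION_BLOCKED code. */
function isInjectionBlocked(
  httpStatus: number,
  body: unknown,
): body is InjectionBlockedBody {
  if (httpStatus !== 422 || typeof body !== "object" || body === null) {
    return false;
  }
  const record = body as Record<string, unknown>;
  return record["code"] === "INJECTION_BLOCKED" && typeof record["error"] === "string";
}
```

A caller would typically parse the response JSON, pass it through this guard, and show the user a neutral message instead of retrying, since a retry with the same prompt would be blocked again.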
Alert emails (org-wide)
Enable alert emails to receive an immediate notification whenever a request or response is flagged. The email is sent to the workspace owner and includes:
- Flag direction (Request / Response), type (pii / injection), pattern, and masked sample
- A link to the /security dashboard
Alerts are rate-limited to once every 5 minutes per organization to prevent inbox flooding during high-volume attacks. Toggle in the Security dashboard under Alert emails.
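The once-per-5-minutes limit can be pictured as a per-organization timestamp check. This is a single-process, in-memory sketch under that assumption; a production service would more likely keep the timestamp in a shared store, and `AlertThrottle` is a hypothetical name.

```typescript
const ALERT_WINDOW_MS = 5 * 60 * 1000; // 5 minutes, per the documented limit

/** Allows at most one alert per organization per window. */
class AlertThrottle {
  private readonly lastSentAt = new Map<string, number>();

  /** Returns true (and records the send) if an alert may go out now. */
  shouldSend(orgId: string, nowMs: number): boolean {
    const last = this.lastSentAt.get(orgId);
    if (last !== undefined && nowMs - last < ALERT_WINDOW_MS) {
      return false; // still inside the window, so suppress this alert
    }
    this.lastSentAt.set(orgId, nowMs);
    return true;
  }
}
```

Passing the clock in as `nowMs` rather than calling `Date.now()` inside keeps the throttle easy to test.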
Response scanning
Spanlens scans both directions: the request body (user input) and the LLM response body (model output). This catches cases where the model itself leaks PII it was given in context, for example, echoing a credit card number back in a summary. Response flags are stored separately in requests.response_flags and shown in the dashboard with a ↩ prefix to distinguish them from request flags.
Using it
Dashboard
/security has two panes plus a settings section:
- Settings: alert email toggle (org-wide) + per-project blocking toggles
- Summary: counts per rule over the selected window (24h / 7d / 30d)
- Flagged: paginated list of flagged requests with masked samples and direction labels (request vs response), plus a direct link to the full /requests row for context
API

    # Flagged requests (paginated)
    GET /api/v1/security/flagged?limit=50&offset=0

    # Flag counts by type/pattern over a time window
    GET /api/v1/security/summary?hours=24

    # Org alert + per-project block settings
    GET /api/v1/security/settings

    # Toggle org-level alert emails
    PATCH /api/v1/security/alert
    { "enabled": true }

    # Toggle per-project injection blocking
    PATCH /api/v1/security/projects/{projectId}/block
    { "enabled": true }

Stored-body sanitization (defense in depth)
Separately from the request-time scan above, every body that lands in ClickHouse passes through a pattern-based key scrubber first. The goal is narrow: catch API keys that accidentally end up in prompts, tool output, or error messages, so a compromised Spanlens row never leaks a customer's upstream credentials.
| Provider | Prefix matched | Replacement |
|---|---|---|
| Spanlens | sl_live_* | sl_live_*** |
| Anthropic | sk-ant-* | sk-ant-*** |
| OpenAI project keys | sk-proj-* | sk-proj-*** |
| OpenAI (legacy) | sk-* | sk-*** |
| Google (Gemini) | AIza* | AIza*** |
Each pattern requires at least 12 characters after the prefix so short identifiers that share the prefix don't produce false positives. The masker runs against request_body, response_body, and error_message. Source: apps/server/src/lib/pii-mask.ts.
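A minimal sketch of such a prefix-based scrubber is shown below. The prefixes, replacements, and 12-character minimum come from the table and paragraph above; the character class allowed after each prefix, the leading word boundary, and the name `maskKeys` are assumptions, and the actual code in pii-mask.ts may differ.

```typescript
// Assumed key alphabet: letters, digits, "_" and "-". At least 12 characters
// must follow the prefix, per the documented rule.
const KEY_PATTERNS: Array<[RegExp, string]> = [
  [/\bsl_live_[A-Za-z0-9_-]{12,}/g, "sl_live_***"],
  [/\bsk-ant-[A-Za-z0-9_-]{12,}/g, "sk-ant-***"],
  [/\bsk-proj-[A-Za-z0-9_-]{12,}/g, "sk-proj-***"],
  // Legacy OpenAI keys: skip the more specific prefixes handled above.
  [/\bsk-(?!ant-|proj-)[A-Za-z0-9_-]{12,}/g, "sk-***"],
  [/\bAIza[A-Za-z0-9_-]{12,}/g, "AIza***"],
];

/** Replaces anything that looks like a provider API key with its masked form. */
function maskKeys(text: string): string {
  let masked = text;
  for (const [pattern, replacement] of KEY_PATTERNS) {
    masked = masked.replace(pattern, replacement);
  }
  return masked;
}
```

The specific prefixes are listed before the generic `sk-` pattern so that an Anthropic or project key is labeled with its own replacement rather than the legacy one.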
Body retention modes (logBody)
Pattern masking covers structured secrets. It does not redact natural-language PII (names, emails, addresses, medical information) that the regex rules above also can't reliably catch. For PII-heavy workloads, the right answer is to not store the body at all:
| Mode | Bodies stored | Tokens / cost / latency / model | user_id / session_id |
|---|---|---|---|
| full (default) | Yes (with key masking above) | Yes | Yes |
| meta | No | Yes | Yes |
| none | No | Yes | No (null) |
Set per-call via the SDK helper withLogBody(mode) or the x-spanlens-log-body header. The server falls back to full on any unrecognized value, so a malformed header never silently disables logging.
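The fallback rule and the table above can be sketched as two small functions. Whether the server trims or lowercases the header value is an assumption here, and `parseLogBodyMode`, `applyLogBodyMode`, and the `LoggedRow` shape are hypothetical names used only for illustration.

```typescript
type LogBodyMode = "full" | "meta" | "none";

/** Parses the x-spanlens-log-body header; anything unrecognized means "full". */
function parseLogBodyMode(header: string | null | undefined): LogBodyMode {
  const value = (header ?? "").trim().toLowerCase(); // normalization is assumed
  return value === "meta" || value === "none" ? value : "full";
}

// Simplified row: only the fields the retention mode affects.
interface LoggedRow {
  requestBody: string | null;
  userId: string | null;
  sessionId: string | null;
}

/** Applies the retention table: meta drops bodies, none also nulls the ids. */
function applyLogBodyMode(row: LoggedRow, mode: LogBodyMode): LoggedRow {
  return {
    requestBody: mode === "full" ? row.requestBody : null,
    userId: mode === "none" ? null : row.userId,
    sessionId: mode === "none" ? null : row.sessionId,
  };
}
```

Defaulting to `full` on bad input means a typo in the header keeps bodies rather than dropping them, so PII-heavy callers should verify the header value they send.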
Spanlens does NOT ship automatic natural-language PII redaction. Pattern matching on free text produces too many false positives and false negatives to be the default. We'd rather give you a clean opt-out and let the prompts that need full bodies keep them. Enterprise customers who need in-place redaction (medical / financial): reach out.
Limitations
- No custom rules. Rule set is hard-coded today. Custom regex + custom webhook alerts are planned post-launch.
- Blocking covers injection only, not PII. PII is detect-and-alert only. A policy engine for rewriting or blocking PII is on the roadmap.
- National ID coverage is limited. Only US SSN and Korean RRN are currently recognized as national identifiers. Other country-specific ID formats (UK NI, German Personalausweis, etc.) aren't yet covered. PRs welcome.
- No LLM-based secondary check. For high-stakes workloads you'll want a classifier on top. Integrations with Llama Guard / Prompt Guard are under consideration.
- Regex is not ML. A sufficiently motivated attacker can rephrase injection phrases to slip through. What we catch is the long tail of accidentally bad inputs and low-effort attacks, which covers 90%+ of real incidents.
- Natural-language PII is not auto-redacted. The key scrubber only catches structured patterns like API keys. Names, emails, card numbers in free-form prompts pass through. Use logBody: 'meta' to skip body storage entirely for those workloads.
Related: Anomalies (statistical spike detection), /security dashboard. Source: apps/server/src/lib/security-scan.ts, apps/server/src/api/security.ts.