Security scan
Every LLM request body passes through Spanlens' scan pipeline before it's logged. Two classes of concern are flagged automatically: PII leaks (users pasting social security numbers into a chatbot) and prompt injection (users trying to override your system prompt). Flagged requests show up in /security with masked samples and rule names.
Why it matters
PII in LLM calls is the #1 thing enterprise security teams ask about. If your chatbot receives a user's credit card number and that request body lands in OpenAI's training data (or your logs, or your support ticket queue), you have a GDPR/PCI incident on your hands. Catching it at the proxy layer — before it hits the provider — is the cheapest mitigation point.
Prompt injection is the other side: malicious users trying to hijack your assistant with “ignore previous instructions and...”. Spanlens can't stop the attack, but it can surface patterns so you know which traffic source needs hardening.
How it works
PII rules (6 patterns)
Regex-based, deliberately conservative (structural shape rather than keyword match) to minimize false positives on normal prose:
| Rule | Pattern | Example match |
|---|---|---|
| ssn-kr | Korean resident registration number (6-7 digits) | 900101-1234567 |
| ssn-us | US SSN (3-2-4) | 123-45-6789 |
| credit-card | 13–19 digit card number (Luhn-passing) | 4532 0151 1283 0366 |
| email | Email addresses | jane@example.com |
| phone | E.164 + common international formats | +1 (555) 123-4567 |
| passport | Generic letter+digit passport (6–9 chars) | M12345678 |
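The credit-card rule is the only one that goes beyond pure shape matching: a digit run only counts if it passes the Luhn checksum, which filters out most random long numbers. A minimal sketch of that kind of check, with illustrative function names that are not Spanlens' actual implementation:

```typescript
// Hypothetical sketch of a Luhn-gated credit-card rule; the names here are
// illustrative, not the shipped Spanlens code.
function luhnCheck(digits: string): boolean {
  let sum = 0;
  let double = false;
  // Walk right-to-left, doubling every second digit (subtract 9 if > 9).
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Find 13-19 digit runs (spaces/dashes allowed) and keep only Luhn-valid
// ones, so ordinary long numbers in prose are not flagged.
function findCardNumbers(text: string): string[] {
  const candidates = text.match(/(?:\d[ -]?){12,18}\d/g) ?? [];
  return candidates
    .map((c) => c.replace(/[ -]/g, ""))
    .filter((c) => c.length >= 13 && c.length <= 19 && luhnCheck(c));
}
```

The Luhn gate is what makes the rule "structural shape rather than keyword match": an 11-digit phone number never reaches 13 digits, and a random 16-digit string fails the checksum nine times out of ten.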
Prompt injection rules (5 patterns)
Well-known social-engineering phrases used to override system prompts. Case-insensitive, word-boundary matches only.
| Rule | What it catches |
|---|---|
| ignore-previous | “ignore/disregard/forget (all) previous/prior/above instructions/prompts/rules” |
| reveal-system-prompt | “what/show/reveal/print your system/initial/hidden prompt” |
| role-override | “you are now / from now on / act as / pretend to be...” |
| developer-mode | “developer mode / debug mode / jailbreak / DAN / do anything now” |
| token-smuggle | Control tokens pasted as text: <\|system\|>, <\|im_start\|>, etc. |
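As an illustration of the "case-insensitive, word-boundary matches only" rule shape, an ignore-previous style pattern could look like the following. The exact regex is an assumption for illustration, not the shipped rule:

```typescript
// Hypothetical version of the ignore-previous rule: case-insensitive,
// word-boundary anchored, with an optional "all" qualifier. The shipped
// Spanlens pattern may differ.
const IGNORE_PREVIOUS =
  /\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:previous|prior|above)\s+(?:instructions|prompts|rules)\b/i;

function flagsIgnorePrevious(text: string): boolean {
  return IGNORE_PREVIOUS.test(text);
}
```

Requiring the verb to be followed directly by the qualifier is what keeps benign prose like "ignore the previous owner" from matching.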
What gets stored
The scan runs on the serialized request body inside logRequestAsync(). For every match, a compact flag is appended to requests.flags (JSONB):
```json
{
  "type": "pii",
  "pattern": "ssn-us",
  "sample": "12*****89"
}
```

The sample is a masked 6-character excerpt around the match — just enough for you to audit what was flagged without storing raw PII back into the database. The original match is never persisted in readable form.
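The flag-building step can be sketched roughly as below. The keep-the-edges masking strategy is inferred from the "12*****89" example above, and the helper names are hypothetical, not Spanlens' actual code:

```typescript
// Hypothetical sketch of flag construction. The keep-two-characters masking
// is inferred from the "12*****89" example in the docs; the real
// implementation may mask differently.
interface SecurityFlag {
  type: "pii" | "injection";
  pattern: string;
  sample: string;
}

function maskSample(match: string, keep = 2): string {
  // Fully mask very short matches; otherwise keep the edges for auditing.
  if (match.length <= keep * 2) return "*".repeat(match.length);
  return (
    match.slice(0, keep) +
    "*".repeat(match.length - keep * 2) +
    match.slice(-keep)
  );
}

function buildFlag(
  type: SecurityFlag["type"],
  pattern: string,
  match: string,
): SecurityFlag {
  // Assumes the match has already been normalized to digits only.
  return { type, pattern, sample: maskSample(match) };
}
```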
Using it
Dashboard
/security has two panes:
- Summary — counts per rule over the selected window (24h / 7d / 30d)
- Flagged — paginated list of flagged requests with masked samples, direct link to the full /requests row for context
API
```bash
GET /api/v1/security/summary?sinceHours=168
# → { pii: { email: 42, "ssn-us": 3, ... }, injection: { "ignore-previous": 12, ... } }

GET /api/v1/security/flagged?limit=50&offset=0&type=pii
# → paginated list of flagged requests
```

Zero setup
There's nothing to configure. The scan runs on every request that flows through the Spanlens proxy. No CPU budget to tune, no rules to enable, no accuracy knobs.
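The security endpoints above can be scripted against. A hypothetical client sketch — the base URL and bearer-token auth are assumptions about your deployment, not documented API:

```typescript
// Hypothetical client for the /api/v1/security endpoints. Base URL and
// Authorization header are deployment assumptions.
function securityUrl(
  base: string,
  path: "summary" | "flagged",
  params: Record<string, string>,
): string {
  const url = new URL(`/api/v1/security/${path}`, base);
  for (const [k, v] of Object.entries(params)) url.searchParams.set(k, v);
  return url.toString();
}

async function fetchSummary(
  base: string,
  token: string,
  sinceHours = 168,
): Promise<unknown> {
  const res = await fetch(
    securityUrl(base, "summary", { sinceHours: String(sinceHours) }),
    { headers: { Authorization: `Bearer ${token}` } },
  );
  if (!res.ok) throw new Error(`security summary failed: ${res.status}`);
  return res.json();
}
```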
What this is not
Honest disclaimer: this is a detection layer, not a prevention layer.
- Flagged requests still reach the LLM provider. Spanlens doesn't block them — it reports them. Blocking would require a latency tradeoff and user-configurable policy, both of which we want to do carefully rather than ship half-baked.
- Regex is not ML. A sufficiently motivated attacker can always rephrase “ignore previous instructions” in a way that slips through. What we catch is the long tail of accidentally bad inputs and low-effort attacks — which covers 90%+ of real incidents.
- No hashing or tokenization is applied pre-storage. If your threat model requires encrypted request bodies at rest, self-host with additional disk encryption.
Limitations & roadmap
- No custom rules. Rule set is hard-coded today. Custom regex + custom webhook alerts planned post-launch.
- No blocking mode. Currently detect-only. A policy engine to block/rewrite/alert on match is on the roadmap.
- English + Korean optimized. Patterns work on other languages, but country-specific PII shapes (SSN-like structures elsewhere) aren't yet covered. PRs welcome.
- No LLM-based secondary check. For high-stakes workloads you'll want a classifier on top. Integrations with Llama Guard / Prompt Guard are under consideration.
Related: Anomalies (statistical spike detection), /security dashboard. Source: apps/server/src/lib/security-scan.ts.