Security scan

Every LLM request body passes through Spanlens' scan pipeline before it's logged. Two classes of concern are flagged automatically: PII leaks (users pasting social security numbers into a chatbot) and prompt injection (users trying to override your system prompt). Flagged requests show up in /security with masked samples and rule names.

Why it matters

PII in LLM calls is the #1 thing enterprise security teams ask about. If your chatbot receives a user's credit card number and that request body lands in OpenAI's training data (or your logs, or your support ticket queue), you have a GDPR/PCI incident on your hands. Catching it at the proxy layer — before it hits the provider — is the cheapest mitigation point.

Prompt injection is the other side: malicious users trying to hijack your assistant with “ignore previous instructions and...”. Spanlens can't stop the attack, but it can surface patterns so you know which traffic source needs hardening.

How it works

PII rules (6 patterns)

Regex-based, deliberately conservative (structural shape rather than keyword match) to minimize false positives on normal prose:

| Rule | Pattern | Example match |
| --- | --- | --- |
| ssn-kr | Korean resident registration number (6-7 digits) | 900101-1234567 |
| ssn-us | US SSN (3-2-4) | 123-45-6789 |
| credit-card | 13–19 digit card number (Luhn-passing) | 4532 0151 1283 0366 |
| email | Email addresses | jane@example.com |
| phone | E.164 + common international formats | +1 (555) 123-4567 |
| passport | Generic letter+digit passport (6–9 chars) | M12345678 |
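To make the "structural shape rather than keyword match" idea concrete, here is a sketch of how the ssn-us and credit-card rules could be implemented. The regexes and the `scanPii` helper are illustrative, not the actual patterns in security-scan.ts; the Luhn checksum is what lets the credit-card rule reject arbitrary 13–19 digit runs:

```typescript
type PiiRule = {
  name: string;
  pattern: RegExp;
  // Optional structural validator applied on top of the regex match.
  validate?: (match: string) => boolean;
};

// Luhn checksum: doubles every second digit from the right and checks mod 10.
function luhn(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) { d *= 2; if (d > 9) d -= 9; }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Hypothetical rules — shapes only, not the shipped patterns.
const piiRules: PiiRule[] = [
  { name: "ssn-us", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  {
    name: "credit-card",
    pattern: /\b(?:\d[ -]?){13,19}\b/g,
    validate: (m) => luhn(m.replace(/[ -]/g, "")),
  },
];

function scanPii(body: string): { pattern: string; match: string }[] {
  const hits: { pattern: string; match: string }[] = [];
  for (const rule of piiRules) {
    for (const m of body.matchAll(rule.pattern)) {
      if (!rule.validate || rule.validate(m[0])) {
        hits.push({ pattern: rule.name, match: m[0] });
      }
    }
  }
  return hits;
}
```

A Luhn-failing digit run like "1234 5678 9012 3456" passes the regex but is dropped by the validator, which is the kind of structural check that keeps false positives down on normal prose.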

Prompt injection rules (5 patterns)

Well-known social-engineering phrases used to override system prompts. Case-insensitive, word-boundary matches only.

| Rule | What it catches |
| --- | --- |
| ignore-previous | “ignore/disregard/forget (all) previous/prior/above instructions/prompts/rules” |
| reveal-system-prompt | “what/show/reveal/print your system/initial/hidden prompt” |
| role-override | “you are now / from now on / act as / pretend to be...” |
| developer-mode | “developer mode / debug mode / jailbreak / DAN / do anything now” |
| token-smuggle | Control tokens pasted as text: `<\|system\|>`, `<\|im_start\|>`, etc. |
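A few of these rules might be sketched as follows — the rule names come from the table above, but the regexes are illustrative placeholders, not the shipped patterns. Note the `i` flag for case-insensitivity and `\b` anchors for word-boundary matching:

```typescript
// Hypothetical injection rules — names from the docs, patterns illustrative.
const injectionRules: { name: string; pattern: RegExp }[] = [
  {
    name: "ignore-previous",
    pattern:
      /\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:previous|prior|above)\s+(?:instructions|prompts|rules)\b/i,
  },
  {
    name: "role-override",
    pattern: /\b(?:you are now|from now on|act as|pretend to be)\b/i,
  },
  {
    // Control tokens pasted as literal text; these have no word
    // boundaries, so the token delimiters themselves are matched.
    name: "token-smuggle",
    pattern: /<\|(?:system|im_start|im_end)\|>/i,
  },
];

// Returns the names of every rule the body trips.
function scanInjection(body: string): string[] {
  return injectionRules.filter((r) => r.pattern.test(body)).map((r) => r.name);
}
```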

What gets stored

The scan runs on the serialized request body inside logRequestAsync(). For every match, a compact flag is appended to requests.flags (JSONB):

```json
{
  "type": "pii",
  "pattern": "ssn-us",
  "sample": "12*****89"
}
```

The sample is a masked 6-character excerpt around the match — just enough for you to audit what was flagged without storing raw PII back into the database. The original match is never persisted in readable form.
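One plausible masking helper, assuming the scheme keeps the first and last two characters and replaces everything between with a fixed-width run of asterisks (so the original match length doesn't leak either). This is a sketch; the real implementation lives in security-scan.ts and may differ:

```typescript
// Hypothetical masking helper — reproduces the "12*****89" sample shape
// from the flag example above for an ssn-us match like "123-45-6789".
function maskSample(match: string): string {
  // Very short matches are masked entirely rather than partially revealed.
  if (match.length <= 4) return "*".repeat(match.length);
  return match.slice(0, 2) + "*****" + match.slice(-2);
}
```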

Using it

Dashboard

/security has two panes:

  • Summary — counts per rule over the selected window (24h / 7d / 30d)
  • Flagged — paginated list of flagged requests with masked samples, direct link to the full /requests row for context

API

```bash
GET /api/v1/security/summary?sinceHours=168
# → { pii: { email: 42, "ssn-us": 3, ... }, injection: { "ignore-previous": 12, ... } }

GET /api/v1/security/flagged?limit=50&offset=0&type=pii
# → paginated list of flagged requests
```
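If you consume the summary endpoint from code, a typed wrapper matching the response shape shown above can be handy. The interface below is inferred from the example output, not an official client type:

```typescript
// Shape inferred from the example summary response above.
interface SecuritySummary {
  pii: Record<string, number>;       // e.g. { email: 42, "ssn-us": 3 }
  injection: Record<string, number>; // e.g. { "ignore-previous": 12 }
}

// Total flagged matches across both categories in the window.
function totalFlags(summary: SecuritySummary): number {
  const sum = (m: Record<string, number>) =>
    Object.values(m).reduce((a, b) => a + b, 0);
  return sum(summary.pii) + sum(summary.injection);
}
```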

Zero setup

There's nothing to configure. The scan runs on every request that flows through the Spanlens proxy. No CPU budget to tune, no rules to enable, no accuracy knobs.

What this is not

Honest disclaimer: this is a detection layer, not a prevention layer.

  • Flagged requests still reach the LLM provider. Spanlens doesn't block them — it reports them. Blocking would require a latency tradeoff and user-configurable policy, both of which we want to do carefully rather than ship half-baked.
  • Regex is not ML. A sufficiently motivated attacker can always rephrase “ignore previous instructions” in a way that slips through. What we catch is the long tail of accidentally bad inputs and low-effort attacks — which covers 90%+ of real incidents.
  • No hashing or tokenization is applied pre-storage. If your threat model requires encrypted request bodies at rest, self-host with additional disk encryption.

Limitations & roadmap

  • No custom rules. Rule set is hard-coded today. Custom regex + custom webhook alerts planned post-launch.
  • No blocking mode. Currently detect-only. Policy engine to block / rewrite / alert on match is on the roadmap.
  • English + Korean optimized. Patterns work on other languages but PII shapes (SSN-like structures in other countries) aren't yet covered. PRs welcome.
  • No LLM-based secondary check. For high-stakes workloads you'll want a classifier on top. Integrations with Llama Guard / Prompt Guard are under consideration.

Related: Anomalies (statistical spike detection), /security dashboard. Source: apps/server/src/lib/security-scan.ts.