Requests
Every LLM call that flows through the Spanlens proxy produces one row in the requests table, backed by ClickHouse for fast analytical reads. /requests is the viewer: filter, sort, drill down, and read the actual request and response bodies. This is the raw substrate every other feature (Traces, Anomalies, Savings, etc.) aggregates from.
Why it matters
Aggregate views summarize, they smooth over individual outliers. When something goes wrong , a user reports a wrong answer, a cost spike is unaccounted for, a prompt injection slips through, you need to see the actual bytes that went out and came back. Requests gives you that exact record.
What gets logged
For every proxied call, Spanlens stores:
| Field | Description |
|---|---|
provider | openai / anthropic / gemini / azure |
model | The dated variant returned by the provider (e.g. gpt-4o-mini-2024-07-18), not the alias you requested |
prompt_tokens | Gross input tokens parsed from the provider response (or streamed deltas). Includes any cached portion. |
completion_tokens | Output tokens generated by the model. |
total_tokens | Sum of prompt + completion. Convenience for billing queries. |
cache_read_tokens | Subset of prompt_tokensserved from the provider's prompt cache (Anthropic / OpenAI). Charged at reduced rate by cost tracking. 0 for rows before 2026-05-14. |
cache_write_tokens | Portion of prompt_tokens written to the cache (Anthropic). Billed at the cache-write premium. |
cost_usd | Computed via cost tracking |
latency_ms | Time from our proxy receiving the request to last byte sent |
status_code | HTTP status from the provider (200, 429, 500, etc.) |
request_body | Outgoing payload sent to the provider, up to 64KB. Authorization headers stripped before storage. |
response_body | Incoming payload from the provider, up to 64KB. Reconstructed into non-streaming shape for SSE responses. |
project_id | Scoped to the API key used (or X-Spanlens-Project header) |
provider_key_id | Which provider key was used to make the call (name shown in the drawer) |
trace_id | Set when the call ran inside an SDK observe() wrapper. Groups related calls into a Trace. |
span_id | Identifies this specific call within the trace tree. |
prompt_version_id | Set when the call carried x-spanlens-prompt-version header. Links to a Prompts version row. |
user_id | Set from x-spanlens-user header. Customer-supplied end-user ID for attribution (Spanlens does not interpret the value). |
session_id | Set from x-spanlens-session header. Groups requests from one conversation or workflow. |
flags | PII / injection flags (JSONB array) |
created_at | When the request arrived at the proxy |
Dashboard
Stat strip
Above the list, a five-cell strip shows real-time 24-hour metrics: total requests, average latency, spend, error rate, and active anomaly count. Each cell includes a mini spark chart. Cells turn accent-colored when a metric exceeds a threshold (latency > 1 s, error rate > 1%, any anomaly present).
Traffic bars
A 30-day bar chart sits below the stat strip. Bar height corresponds to request volume; bars with at least one error flip to the error color. Hover a bar to see the date label.
List view & filters
The list auto-refreshes every 10 seconds so new requests appear without a page reload. A manual ↻ button in the toolbar forces an immediate refetch.
The main table is paginated (up to 100 rows/page) with these filters. Filter state is synced to the URL, copy and share the URL to hand off a pre-filtered view, or use the browser's back button to restore a previous filter state.
- Provider, exact match (openai / anthropic / gemini / azure)
- Model, partial, case-insensitive match (e.g. searching “mini” matches
gpt-4o-mini-2024-07-18) - Provider key, dropdown of your registered keys, to isolate traffic by key
- Status, All / OK (2xx) / 4xx / 5xx
- Date range, from / to
URL-only filters
These filters are applied when navigating here via a drilldown from another page. An active filter banner appears at the top of the page and can be cleared with Clear ×.
| URL param | Meaning | Primary entry point |
|---|---|---|
?promptVersionId=<uuid> | Only calls that used a specific prompt version | Prompts → Calls tab row click |
?userId=<str> | Only calls from a specific end-user | Request detail User field click |
?sessionId=<str> | Only calls from a specific session | Request detail Session field click |
The user_id / session_id columns are only populated when the caller sends the x-spanlens-user / x-spanlens-session headers. See SDK helpers withUser() / withSession().
Column headers for Latency, Cost, Tokens, and Age are clickable to sort ascending or descending. The default sort is newest-first by created_at.
Hovering the Age cell shows a tooltip with the full timestamp.
Replay
Every request detail page has a Replay button. It opens a modal where you can re-run the original call against a different model and compare the result inline , without touching your application code.
- Model selector. A dropdown pre-populated with models for the same provider. The original model is always available as the first option. Changing the model resets any previous result.
- Run. Executes the replay server-side via
POST /api/v1/requests/:id/replay/run. Spanlens decrypts your provider key, strips anystream: trueflag, forwards the original request body with the new model, and returns a result card showing latency, token counts, and cost. The replayed call is also logged as a new row in /requests. - Copy curl. Fetches a ready-to-run
curlsnippet fromPOST /api/v1/requests/:id/replayand copies it to the clipboard. The snippet is also displayed in the modal so you can inspect or edit it before running.
Detail drawer
Clicking any row opens a 480 px right-side drawer, no page navigation. The drawer shows:
- Request ID, timestamp, and error badge (if applicable)
- Metadata grid: Model, Provider, Status code, Provider key name, Prompt tokens, Completion tokens
- Trace / Span IDs with inline links and copy buttons. Trace ID links directly to the Traces waterfall view.
- Metrics row: Latency, Cost, Total tokens (with prompt / completion breakdown)
- Prev / Next navigation buttons, step through the current result set one row at a time. When you reach the end of a page the drawer automatically loads the next page and jumps to the first (or last) row. An Open → link opens the standalone detail page
/requests/[id]if you need a shareable URL.
Drawer tabs
| Tab | Content |
|---|---|
| Request | Formatted message view. OpenAI and Anthropic messages[] are rendered as a conversation. Anthropic system strings/arrays are shown in a separate block above the messages. Gemini contents[].parts[] are normalized into the same layout. A copy button exports the raw JSON. |
| Response | Response body JSON when captured. Streaming responses are not buffered server-side (they pass through directly to your app), so this tab shows a note in that case. |
| Trace | Mini span list from the parent trace (up to 8 spans with type badges and durations) + a link to open the full waterfall. Shows a help note when the request has no associated trace. |
| Raw | Full request_body and response_body as pretty-printed JSON, each with a copy button. |
| Error | Conditionally shown when error_message is set. Displays the raw error string from the provider. |
API
# List requests, paginated, sortable, filterable
GET /api/v1/requests
?projectId=<uuid> # filter by project
&provider=openai # exact match
&model=mini # partial match (case-insensitive)
&providerKeyId=<uuid> # filter by provider key
&promptVersionId=<uuid> # filter by prompt version
&userId=<str> # filter by x-spanlens-user header value
&sessionId=<str> # filter by x-spanlens-session header value
&status=ok # ok | 4xx | 5xx
&from=2024-01-01T00:00:00Z
&to=2024-01-31T23:59:59Z
&sortBy=latency_ms # created_at | latency_ms | cost_usd | total_tokens
&sortDir=desc # asc | desc
&page=1
&limit=50 # max 100
# One request by id (includes full request_body + response_body)
GET /api/v1/requests/:id
# Replay, curl snippet (proxy-ready payload)
POST /api/v1/requests/:id/replay
Body: { "model": "gpt-4o-mini" } # optional model override
# Replay, execute server-side and return result (latency / tokens / cost)
POST /api/v1/requests/:id/replay/run
Body: { "model": "gpt-4o-mini" } # optional model overridebashThe list endpoint returns { success, data, meta: { total, page, limit } }. Each row includes a flattened provider_key_name field (the human-readable key label) so the dashboard can render it without a second round-trip.
Privacy & retention
- Authorization headers are stripped from
request_bodybefore it's stored, your OpenAI/Anthropic/Gemini key never appears in logs. - API key patterns are auto-masked in stored bodies. Anything matching
sk-*,sk-proj-*,sk-ant-*,AIza*, orsl_live_*(≥12-char body) is replaced with<prefix>***before insert. Defense-in-depth for keys that slip into prompts/tool output/error text. See Security for details. - 64KB body cap. Large prompts (e.g. 40-page PDF extraction) are truncated at 64KB with a visible marker. Full bodies would blow up storage and cost.
- Body retention opt-out. Pass
logBody: 'meta'in the SDK (orX-Spanlens-Log-Body: metaheader) to skip body storage entirely while keeping tokens / cost / latency / identifiers. Set'none'to additionally dropuser_idandsession_id. See SDK. - Retention policy.Free plan: 14 days. Pro: 90 days. Team and Enterprise: 365 days (Enterprise is extendable by contract). Enforced by the table's TTL plus a per-plan query-time clip, older rows are dropped by ClickHouse's background merge.
- Tenant isolation. ClickHouse has no row-level security; every read path goes through the
requestsScope()helper which injects anorganization_id = ?filter on every query. Direct ClickHouse access is server-only. The dashboard cannot bypass the filter.
Limitations
- 64KB body cap is fixed.A “full-body archive to S3” opt-in for Enterprise customers is on the roadmap.
- No full-text body search in the UI yet.The model filter uses case-insensitive substring match; there is no free-text search over request/response body content. ClickHouse can do it efficiently, the dashboard hasn't exposed it yet.
- Streaming response bodies are reconstructed, not original.SSE chunks are tee'd while pass-through to the client. After the stream closes, the assistant text is reassembled and written to
response_bodyin the upstream's standard non-streaming shape (so the dashboard renders it identically to a non-stream response). Tool calls / images / non-text content blocks are not preserved, only the assistant-visible text portion. Aborted streams keep whatever was received up to the break.
Related: Traces (grouped view), Cost tracking, Security flags, /requests dashboard.