Data model
Spanlens stores everything in eight core entities. Most of what you do in the dashboard is querying one or two of them. This page lays out each entity, what it stores, how it relates to the others, and which UI surface reads from it, so you can pick the right primitive when you write your own integration or run an ad-hoc query.
The eight entities at a glance
| Entity | Storage | Purpose |
|---|---|---|
Request | ClickHouse | One LLM call. Cost, latency, tokens, full body. |
Trace | Supabase | One logical user interaction. Aggregates spans. |
Span | Supabase | One step inside a trace. Forms the tree. |
Prompt Version | Supabase | Immutable snapshot of a prompt template. |
Evaluator | Supabase | How to score outputs (LLM-as-judge config). |
Eval Run + Eval Result | Supabase | One execution of an evaluator + per-sample scores. |
Dataset + Dataset Item | Supabase | Reusable test inputs for offline evaluation. |
User / Session | columns on Request | Header-driven grouping. No separate table. |
Storage split: why two databases?
High-cardinality append-only data (Requests) lives in ClickHouse for columnar compression and fast time-range aggregation. Relational data with frequent updates (Traces, Prompts, Evals, billing, RLS) lives in Supabase Postgres. They are joined at the application layer via shared UUIDs.
- ClickHouse:
requests(the only table) - Supabase: everything else, including the
organization_idtenant boundary enforced by Row Level Security
Self-hosting? Both stores are part of the bundled docker-compose. See Self-hosting.
Request
A Request is one LLM call. Created by the proxy automatically; you never create one from the SDK directly.
Table: requests (ClickHouse)
Order key: (organization_id, project_id, created_at, id)
Partition: monthly by created_at
TTL: 365 days, plan-based filtering at query timetext| Field | Type | Notes |
|---|---|---|
id | UUID | Stable id, surfaced in /requests URL. |
organization_id / project_id | UUID | Tenant scope. Always filter on these. |
provider / model | LowCardinality(String) | Compressed for fast group-by queries. |
prompt_tokens / completion_tokens / total_tokens | UInt32 | Parsed from provider response. |
cache_read_tokens / cache_write_tokens | UInt32 | From OpenAI prompt cache, Anthropic cache control. |
cost_usd | Decimal(18, 8) | Computed from model_prices at log time. Null if model is unseeded. |
latency_ms / proxy_overhead_ms | UInt32 | End-to-end and our share. |
status_code | UInt16 | HTTP status from upstream. |
request_body / response_body | String, ZSTD(3) | Full bodies. Can be empty if x-spanlens-log-body=meta|none. |
trace_id / span_id | Nullable(UUID) | Set when the call ran inside a Spanlens trace. |
prompt_version_id | Nullable(UUID) | Set via x-spanlens-prompt-version. |
user_id / session_id | Nullable(String) | Set via x-spanlens-user / x-spanlens-session. |
flags / response_flags | String (JSON) | Security findings (PII, jailbreak). See Security. |
truncated | Bool | True if the stream was cut at the 290s deadline. |
Where it appears in the UI
- /requests: every Request, filterable
- /users: Requests grouped by
user_id - /savings: cost analysis joined with
model_prices - /anomalies: spikes detected by aggregating Requests over time windows
Trace
A Trace groups Spans for one logical interaction (one user question, one cron tick, one webhook). Created explicitly via the SDK or implicitly by the LangChain / Vercel AI callback handlers.
Table: traces (Supabase)
Status: running | completed | error
Aggregates (refreshed by DB trigger on span change):
span_count, total_tokens, total_cost_usd, duration_mstextThe trigger means dashboards only have to query traces to render summary rows; they do not have to re-aggregate Spans on every read.
UI: /traces (list + detail with waterfall).
Span
A Span is one step inside a Trace: an LLM call, a tool call, a retrieval, an embedding, or arbitrary custom work. Spans form a tree via parent_span_id.
Table: spans (Supabase)
span_type: llm | tool | retrieval | embedding | custom
parent_span_id: UUID, NO FK constraint (intentional)textThe lack of a foreign key on parent_span_id is deliberate. Real agent code (LangGraph parallel fan-out, Promise.all([...])) closes spans in non-deterministic order. The lack of FK lets the database accept the spans in whatever order they arrive without rejecting late writes.
LLM spans optionally link to the underlying Request via request_id when the call went through the Spanlens proxy. This is how the trace waterfall shows token counts and cost on LLM nodes without re-querying ClickHouse.
Prompt and Prompt Version
A Prompt is just a name. A Prompt Version is an immutable snapshot of that prompt's content. Creating a new version never touches the old one; old logged requests keep their link to the version they actually used.
Table: prompt_versions (Supabase)
Unique: (organization_id, name, version)
content: text (the template)
variables: jsonb [{ name, description, required }]textRequests link to a version via requests.prompt_version_id, set when the call carries the x-spanlens-prompt-version header (SDK helper: withPromptVersion('name@version')).
UI: /prompts (list + version tree), Prompt A/B view (compare two versions on production traffic with significance tests).
Evaluator
Reusable definition of how to score outputs. Independent of any specific run.
Table: evaluators (Supabase)
type: 'llm_judge' (only type today)
config: jsonb {
criterion,
judge_provider, judge_model,
scale_min, scale_max (normalized to 0..1 on save)
}textUI: /evals (evaluator list).
Eval Run and Eval Result
An Eval Run is one execution of an Evaluator over N samples. Each Eval Result is one score for one sample (either a Request from production, or a Dataset Item).
Table: eval_runs (Supabase)
source: 'production' | 'dataset'
sample_size: 1..1000
status: pending | running | completed | failed
avg_score: numeric (set on completion)
Table: eval_results (Supabase)
request_id or dataset_item_id (exactly one)
score: 0..1
reasoning: judge's explanation
judge_cost_usd, judge_tokenstextUI: /evals (run history per evaluator, drill into score distribution, list 5 lowest-scoring samples).
Dataset and Dataset Item
A Dataset is a named collection of test cases. Each Dataset Item is one input, optionally with an expected_output for accuracy-style scoring.
Table: datasets (Supabase)
Unique: (organization_id, name)
Table: dataset_items (Supabase)
input: jsonb, two accepted shapes:
{ "variables": { "name": "Alice", ... } } ← for variable-based prompts
{ "messages": [ {role, content}, ... ] } ← for raw chat input
expected_output: text (optional)
source_request_id: links back to the production request it was imported fromtextDatasets can be populated three ways: manual entry, CSV import, or one-click "import this request" from the request detail page.
UI: /datasets.
User and Session
Spanlens does not have a Users table or a Sessions table. Both are columns on the Request row, populated from request headers:
| Concept | Header | SDK helper | Column on requests |
|---|---|---|---|
| End user | x-spanlens-user | withUser(id) | user_id |
| Session / conversation | x-spanlens-session | withSession(id) | session_id |
UI: /users aggregates by user_id, showing cost, request count, and last-seen timestamp per end user.
How they fit together
Organization (tenant boundary)
└── Project
├── API Key (sl_live_...)
│ └── Provider Key (encrypted OpenAI / Anthropic / Gemini key)
│
├── Prompt
│ └── Prompt Version (immutable)
│
├── Evaluator
│ └── Eval Run
│ └── Eval Result ──┐
│ │ (one per scored sample)
├── Dataset │
│ └── Dataset Item ─────┤
│ │
├── Trace │
│ └── Span │
│ └── (optional) request_id ──┐
│ │
└── Request (ClickHouse) ───────────┤
├── user_id (header column) │
├── session_id (header column) │
└── prompt_version_id ──────────┘textTenant boundary
Every table carries organization_id. On Supabase, Row Level Security forces every read and write to match the caller's organization. On ClickHouse, we enforce the same via the requestsScope helper in apps/server/src/lib/requests-query.ts; bypassing it is treated as a security bug.
Plan retention (Free 14 days, Pro 90 days, Team 365 days) is layered on top of the 365-day ClickHouse TTL: queries auto-filter by created_at >= now() - plan_retention unless the call is a billing read with ignoreRetention: true.
Next: LangGraph integration for graph topology specifics, or direct proxy for non-Node clients.