Data model

Spanlens stores everything in eight core entities. Most of what you do in the dashboard is querying one or two of them. This page lays out each entity, what it stores, how it relates to the others, and which UI surface reads from it, so you can pick the right primitive when you write your own integration or run an ad-hoc query.

The eight entities at a glance

EntityStoragePurpose
RequestClickHouseOne LLM call. Cost, latency, tokens, full body.
TraceSupabaseOne logical user interaction. Aggregates spans.
SpanSupabaseOne step inside a trace. Forms the tree.
Prompt VersionSupabaseImmutable snapshot of a prompt template.
EvaluatorSupabaseHow to score outputs (LLM-as-judge config).
Eval Run + Eval ResultSupabaseOne execution of an evaluator + per-sample scores.
Dataset + Dataset ItemSupabaseReusable test inputs for offline evaluation.
User / Sessioncolumns on RequestHeader-driven grouping. No separate table.

Storage split: why two databases?

High-cardinality append-only data (Requests) lives in ClickHouse for columnar compression and fast time-range aggregation. Relational data with frequent updates (Traces, Prompts, Evals, billing, RLS) lives in Supabase Postgres. They are joined at the application layer via shared UUIDs.

  • ClickHouse: requests (the only table)
  • Supabase: everything else, including the organization_id tenant boundary enforced by Row Level Security

Self-hosting? Both stores are part of the bundled docker-compose. See Self-hosting.

Request

A Request is one LLM call. Created by the proxy automatically; you never create one from the SDK directly.

Table: requests (ClickHouse)
Order key: (organization_id, project_id, created_at, id)
Partition: monthly by created_at
TTL: 365 days, plan-based filtering at query time
text
FieldTypeNotes
idUUIDStable id, surfaced in /requests URL.
organization_id / project_idUUIDTenant scope. Always filter on these.
provider / modelLowCardinality(String)Compressed for fast group-by queries.
prompt_tokens / completion_tokens / total_tokensUInt32Parsed from provider response.
cache_read_tokens / cache_write_tokensUInt32From OpenAI prompt cache, Anthropic cache control.
cost_usdDecimal(18, 8)Computed from model_prices at log time. Null if model is unseeded.
latency_ms / proxy_overhead_msUInt32End-to-end and our share.
status_codeUInt16HTTP status from upstream.
request_body / response_bodyString, ZSTD(3)Full bodies. Can be empty if x-spanlens-log-body=meta|none.
trace_id / span_idNullable(UUID)Set when the call ran inside a Spanlens trace.
prompt_version_idNullable(UUID)Set via x-spanlens-prompt-version.
user_id / session_idNullable(String)Set via x-spanlens-user / x-spanlens-session.
flags / response_flagsString (JSON)Security findings (PII, jailbreak). See Security.
truncatedBoolTrue if the stream was cut at the 290s deadline.

Where it appears in the UI

  • /requests: every Request, filterable
  • /users: Requests grouped by user_id
  • /savings: cost analysis joined with model_prices
  • /anomalies: spikes detected by aggregating Requests over time windows

Trace

A Trace groups Spans for one logical interaction (one user question, one cron tick, one webhook). Created explicitly via the SDK or implicitly by the LangChain / Vercel AI callback handlers.

Table: traces (Supabase)
Status: running | completed | error
Aggregates (refreshed by DB trigger on span change):
  span_count, total_tokens, total_cost_usd, duration_ms
text

The trigger means dashboards only have to query traces to render summary rows; they do not have to re-aggregate Spans on every read.

UI: /traces (list + detail with waterfall).

Span

A Span is one step inside a Trace: an LLM call, a tool call, a retrieval, an embedding, or arbitrary custom work. Spans form a tree via parent_span_id.

Table: spans (Supabase)
span_type: llm | tool | retrieval | embedding | custom
parent_span_id: UUID, NO FK constraint (intentional)
text

The lack of a foreign key on parent_span_id is deliberate. Real agent code (LangGraph parallel fan-out, Promise.all([...])) closes spans in non-deterministic order. The lack of FK lets the database accept the spans in whatever order they arrive without rejecting late writes.

LLM spans optionally link to the underlying Request via request_id when the call went through the Spanlens proxy. This is how the trace waterfall shows token counts and cost on LLM nodes without re-querying ClickHouse.

Prompt and Prompt Version

A Prompt is just a name. A Prompt Version is an immutable snapshot of that prompt's content. Creating a new version never touches the old one; old logged requests keep their link to the version they actually used.

Table: prompt_versions (Supabase)
Unique: (organization_id, name, version)
content: text (the template)
variables: jsonb [{ name, description, required }]
text

Requests link to a version via requests.prompt_version_id, set when the call carries the x-spanlens-prompt-version header (SDK helper: withPromptVersion('name@version')).

UI: /prompts (list + version tree), Prompt A/B view (compare two versions on production traffic with significance tests).

Evaluator

Reusable definition of how to score outputs. Independent of any specific run.

Table: evaluators (Supabase)
type: 'llm_judge' (only type today)
config: jsonb {
  criterion,
  judge_provider, judge_model,
  scale_min, scale_max (normalized to 0..1 on save)
}
text

UI: /evals (evaluator list).

Eval Run and Eval Result

An Eval Run is one execution of an Evaluator over N samples. Each Eval Result is one score for one sample (either a Request from production, or a Dataset Item).

Table: eval_runs (Supabase)
source: 'production' | 'dataset'
sample_size: 1..1000
status: pending | running | completed | failed
avg_score: numeric (set on completion)

Table: eval_results (Supabase)
request_id or dataset_item_id (exactly one)
score: 0..1
reasoning: judge's explanation
judge_cost_usd, judge_tokens
text

UI: /evals (run history per evaluator, drill into score distribution, list 5 lowest-scoring samples).

Dataset and Dataset Item

A Dataset is a named collection of test cases. Each Dataset Item is one input, optionally with an expected_output for accuracy-style scoring.

Table: datasets (Supabase)
Unique: (organization_id, name)

Table: dataset_items (Supabase)
input: jsonb, two accepted shapes:
  { "variables": { "name": "Alice", ... } }   ← for variable-based prompts
  { "messages": [ {role, content}, ... ] }    ← for raw chat input
expected_output: text (optional)
source_request_id: links back to the production request it was imported from
text

Datasets can be populated three ways: manual entry, CSV import, or one-click "import this request" from the request detail page.

UI: /datasets.

User and Session

Spanlens does not have a Users table or a Sessions table. Both are columns on the Request row, populated from request headers:

ConceptHeaderSDK helperColumn on requests
End userx-spanlens-userwithUser(id)user_id
Session / conversationx-spanlens-sessionwithSession(id)session_id

UI: /users aggregates by user_id, showing cost, request count, and last-seen timestamp per end user.

How they fit together

Organization (tenant boundary)
└── Project
    ├── API Key (sl_live_...)
    │   └── Provider Key (encrypted OpenAI / Anthropic / Gemini key)
    │
    ├── Prompt
    │   └── Prompt Version (immutable)
    │
    ├── Evaluator
    │   └── Eval Run
    │       └── Eval Result ──┐
    │                         │ (one per scored sample)
    ├── Dataset               │
    │   └── Dataset Item ─────┤
    │                         │
    ├── Trace                 │
    │   └── Span              │
    │       └── (optional) request_id ──┐
    │                                   │
    └── Request (ClickHouse) ───────────┤
        ├── user_id (header column)     │
        ├── session_id (header column)  │
        └── prompt_version_id ──────────┘
text

Tenant boundary

Every table carries organization_id. On Supabase, Row Level Security forces every read and write to match the caller's organization. On ClickHouse, we enforce the same via the requestsScope helper in apps/server/src/lib/requests-query.ts; bypassing it is treated as a security bug.

Plan retention (Free 14 days, Pro 90 days, Team 365 days) is layered on top of the 365-day ClickHouse TTL: queries auto-filter by created_at >= now() - plan_retention unless the call is a billing read with ignoreRetention: true.


Next: LangGraph integration for graph topology specifics, or direct proxy for non-Node clients.