Changelog
What is new in Spanlens. Updated when something ships, not on a calendar.
- Feature#
LangGraph topology view in /traces
Trace detail pages now have a Timeline / Graph toggle. The Graph view renders LangGraph (and any LangChain callback) traces as a node-and-edge diagram, with each `chain.*` span as a node and edges inferred from sibling execution order. Parallel fan-out, sequential transitions, and the critical path are all visible at a glance.
Critical-path nodes and edges are drawn in accent color so the slowest dependency chain stands out without reading numbers. Click any node to open the existing span drawer.
The Graph tab is enabled automatically when a trace contains enough `chain.*` spans to be worth the view (currently at least 20% of total spans). Simple two-call RAG traces continue to default to the Timeline (Gantt) view. See the LangGraph integration docs for instrumentation details.
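As a rough illustration of the enablement rule above, the check can be sketched as a ratio test. The function name, the constant, and the choice to treat the 20% threshold as inclusive are assumptions for illustration, not the actual Spanlens implementation.

```python
from typing import Iterable

# Assumed threshold, taken from the "20% of total spans" figure above.
GRAPH_TAB_MIN_CHAIN_RATIO = 0.20


def graph_tab_enabled(
    span_names: Iterable[str],
    min_ratio: float = GRAPH_TAB_MIN_CHAIN_RATIO,
) -> bool:
    """Return True when chain.* spans make up at least min_ratio of the trace."""
    names = list(span_names)
    if not names:
        # An empty trace has nothing to draw.
        return False
    chain_count = sum(1 for name in names if name.startswith("chain."))
    return chain_count / len(names) >= min_ratio
```

Under these assumptions, a trace with one `chain.*` span out of five enables the tab, while a two-call RAG trace with no `chain.*` spans stays on the default view.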
- Reliability#
Public status page at status.spanlens.io
Independent monitoring of the proxy (liveness + deep health) and the dashboard, posted at status.spanlens.io.
Subscribe by email or RSS directly on the page. The page runs on Better Stack and is monitored from four global regions every 3 minutes.
- Docs#
Docs: migration guides, data model reference, tutorials, production guides
Ten new doc pages: drop-in migration guides for Langfuse, Helicone, and LangSmith, a single-page data model reference, a dedicated LangGraph integration page, three tutorials (RAG chatbot, agent tracing, nightly evals), and two production guides (reliability, scaling).
Also: /doc (missing-s typo) now permanently redirects to /docs.
- Feature#
In-app feedback button
Floating feedback button on every dashboard page. Sends thoughts straight to the team without leaving your workflow.
- Improvement, Infrastructure#
Full OpenAI and Anthropic model price catalog
Cost calculations now cover every current OpenAI and Anthropic model, including dated variants (e.g. gpt-4o-mini-2024-07-18 maps to gpt-4o-mini pricing). Tiered pricing and prompt cache discounts are honored.
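The dated-variant mapping can be pictured as stripping a trailing date suffix before the price lookup. This is a minimal sketch: the price table, its placeholder numbers, and the function names are hypothetical, and tiered pricing and prompt cache discounts are deliberately left out.

```python
import re
from typing import Optional

# Hypothetical price table in USD per 1M tokens. The numbers are placeholders,
# not real prices.
PRICES = {
    "gpt-4o-mini": {"input": 1.0, "output": 2.0},
}

# Trailing date suffix: "-2024-07-18" (OpenAI style) or "-20241022" (Anthropic style).
_DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def resolve_model(model: str) -> Optional[str]:
    """Map a model name, possibly a dated variant, to a key in PRICES."""
    if model in PRICES:
        return model
    base = _DATE_SUFFIX.sub("", model)
    return base if base in PRICES else None


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Flat per-token cost; returns None for models missing from the table."""
    key = resolve_model(model)
    if key is None:
        return None
    price = PRICES[key]
    return (
        input_tokens / 1_000_000 * price["input"]
        + output_tokens / 1_000_000 * price["output"]
    )
```

Returning `None` for unknown models, rather than a cost of zero, keeps missing catalog entries visible instead of silently under-reporting spend.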
- Reliability#
Zero log loss during ClickHouse outages
When the ClickHouse insert fails, the request row is queued in a Supabase fallback table instead of being dropped. A cron job drains the queue every 5 minutes once ClickHouse recovers.
You can monitor the queue depth from GET /health/deep as `fallback.queue`.
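The queue-then-drain pattern described above can be sketched as follows. This is a simplified model, not the Spanlens code: an in-memory deque stands in for the Supabase fallback table, `primary_insert` stands in for the ClickHouse insert, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class FallbackWriter:
    """Write rows to a primary store, queueing them when the primary fails."""

    def __init__(self, primary_insert: Callable[[dict], None]) -> None:
        self._primary_insert = primary_insert
        # In-memory stand-in for the fallback table.
        self._queue: Deque[dict] = deque()

    def write(self, row: dict) -> None:
        try:
            self._primary_insert(row)
        except Exception:
            # Primary store is unavailable: keep the row instead of dropping it.
            self._queue.append(row)

    def drain(self) -> int:
        """Retry queued rows in order; stop at the first failure. Returns rows moved."""
        drained = 0
        while self._queue:
            try:
                self._primary_insert(self._queue[0])
            except Exception:
                break  # Still down; leave the remaining rows queued.
            self._queue.popleft()
            drained += 1
        return drained

    @property
    def queue_depth(self) -> int:
        return len(self._queue)
```

In this sketch a scheduled job would call `drain()` periodically, and `queue_depth` plays the role of the `fallback.queue` value reported by the health endpoint.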
- Improvement#
Streaming deadline with graceful close at 290s
Long-running streams that approach the Vercel 300s ceiling now close gracefully at 290s, with the partial response body logged and a `truncated` badge in /requests. Previously, these streams disappeared silently when the platform killed the function.
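One way to picture the graceful close is a relay loop that checks elapsed time before forwarding each chunk. The sketch below is illustrative only: `StreamLog`, `relay`, and the injectable `clock` are hypothetical names, and the deadline is checked only between chunks, so a real proxy would also need a timer to interrupt an upstream that stalls mid-chunk.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

DEADLINE_SECONDS = 290.0  # from the entry above


@dataclass
class StreamLog:
    """What gets recorded for the request: the forwarded body and a truncation flag."""
    body: List[str] = field(default_factory=list)
    truncated: bool = False


def relay(
    chunks: Iterable[str],
    log: StreamLog,
    deadline_s: float = DEADLINE_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Forward chunks until the deadline passes, then stop and mark the log truncated."""
    start = clock()
    for chunk in chunks:
        if clock() - start >= deadline_s:
            log.truncated = True
            break
        log.body.append(chunk)
        yield chunk
```

Passing the clock as a parameter is a testing convenience: the cutoff can be exercised with a fake clock instead of waiting 290 seconds.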
- Infrastructure#
Requests moved to ClickHouse columnar storage
The `requests` table now lives in ClickHouse with monthly partitioning and ZSTD body compression. Time-range queries on /requests are 5-20x faster, and storage cost is ~3x lower for the same body data.
- Feature#
Evals: LLM-as-judge scoring of production responses
Define a reusable evaluator (criterion + judge model), run it against a sample of production traffic for a specific prompt version, and get a score from 0 to 1 per sample, with reasoning. See the Evals docs or the nightly evals tutorial.
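The shape of an evaluator run can be sketched with a few data classes. Everything here is an assumption made for illustration: the class and field names are hypothetical, and `stub_judge` is a keyword check standing in for a real judge-model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Evaluator:
    name: str
    criterion: str
    judge_model: str


@dataclass(frozen=True)
class EvalResult:
    sample_id: str
    score: float  # always within 0..1
    reasoning: str


# A judge takes (evaluator, response_text) and returns (score, reasoning).
Judge = Callable[[Evaluator, str], Tuple[float, str]]


def run_evaluator(
    evaluator: Evaluator, samples: Dict[str, str], judge: Judge
) -> List[EvalResult]:
    """Score every sampled response, clamping scores into the 0..1 range."""
    results = []
    for sample_id, response in samples.items():
        score, reasoning = judge(evaluator, response)
        score = min(1.0, max(0.0, score))
        results.append(EvalResult(sample_id, score, reasoning))
    return results


def stub_judge(evaluator: Evaluator, response: str) -> Tuple[float, str]:
    """Stand-in for an LLM call: passes when the response mentions a refund."""
    if "refund" in response.lower():
        return 1.0, "mentions the refund policy"
    return 0.0, "does not mention the refund policy"
```

Keeping the judge as a plain callable means the same loop can score sampled production responses or items from a stored dataset.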
- Feature#
Datasets: reusable test inputs for offline evaluation
Create named datasets of (input, optional expected_output) pairs and run evaluators against them instead of sampling production. One-click "import this request as a dataset item" from the request detail view.
- Feature#
Human annotation queue
Sample N requests, score them in a queue UI, capture human ratings alongside LLM-judge scores. See the Annotation docs.
- Improvement#
Unified API keys (one key per project, provider-agnostic)
Spanlens keys (`sl_live_*`) are now provider-agnostic. One key authenticates calls to OpenAI, Anthropic, Gemini, and Azure OpenAI; the provider is inferred from the request URL path.
Provider keys are registered separately per project and stay server-side, AES-256-GCM encrypted at rest.
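Inferring the provider from the request URL path amounts to a prefix lookup. The path prefixes and the host in the sketch below are hypothetical, chosen only to show the idea. The real Spanlens routes may differ.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical path prefixes, for illustration only.
PROVIDER_PREFIXES = {
    "/proxy/openai/": "openai",
    "/proxy/anthropic/": "anthropic",
    "/proxy/gemini/": "gemini",
    "/proxy/azure/": "azure-openai",
}


def infer_provider(url: str) -> Optional[str]:
    """Return the provider implied by the URL path, or None if no prefix matches."""
    path = urlsplit(url).path
    for prefix, provider in PROVIDER_PREFIXES.items():
        if path.startswith(prefix):
            return provider
    return None
```

Because the provider comes from the path rather than from the key, the same `sl_live_*` key can be sent with every request and only the base URL changes per provider.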
- Feature#
Prompt A/B comparison with statistical significance
Compare two prompt versions on production traffic. Cost, latency, and quality (when an evaluator is attached) come with confidence intervals and significance tests so you stop shipping changes based on 10 samples.
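To make the confidence interval and significance test concrete, here is a minimal sketch of one common approach: a Welch-style comparison of two means under a normal approximation. The entry does not say which test Spanlens uses, so treat the method and the function name as assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Dict, Sequence


def compare_means(
    a: Sequence[float], b: Sequence[float], confidence: float = 0.95
) -> Dict[str, object]:
    """Compare mean(b) against mean(a). Each sample needs at least two values."""
    diff = mean(b) - mean(a)
    # Standard error of the difference, without assuming equal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance; the test is undefined")
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + confidence / 2)
    z = diff / se
    p_value = 2 * (1 - normal.cdf(abs(z)))  # two-sided
    return {
        "diff": diff,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "p_value": p_value,
        "significant": p_value < 1 - confidence,
    }
```

The normal approximation is only reasonable for fairly large samples. With around 10 samples per version the interval is wide and the result is rarely significant, which is the situation the entry warns about.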
- Feature#
Agent tracing with parallel span fan-out
Group related LLM calls, tool calls, and retrievals into one trace tree. Spans intentionally have no foreign-key constraint on `parent_span_id`, so a child span from a parallel branch can be stored before its parent arrives without breaking the tree. The critical path is highlighted in the waterfall.
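The idea of resolving parents at read time, rather than enforcing them at insert time, can be sketched like this. The `Span` fields, the function names, and the critical-path rule (follow the child that finishes last at each level) are simplifying assumptions, not the Spanlens data model or algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_span_id: Optional[str]
    name: str
    start_ms: int
    end_ms: int


def build_children(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    """Index spans by parent. Arrival order is irrelevant because parents are
    resolved only after all spans have been collected."""
    known = {s.span_id for s in spans}
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for s in spans:
        # A span whose parent is missing is treated as a root (key None).
        parent = s.parent_span_id if s.parent_span_id in known else None
        children[parent].append(s)
    for siblings in children.values():
        siblings.sort(key=lambda s: s.start_ms)
    return children


def critical_path(children: Dict[Optional[str], List[Span]]) -> List[str]:
    """Walk down from the roots, following the latest-finishing child each time."""
    path: List[str] = []
    current: Optional[str] = None
    while children.get(current):
        last = max(children[current], key=lambda s: s.end_ms)
        path.append(last.span_id)
        current = last.span_id
    return path
```

A span that references a parent that never arrives is shown as a root here instead of being rejected, which is the kind of tolerance a missing foreign-key constraint allows.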