NEW · SDK v0.6.1 with Ollama (local LLMs) and LangGraph tracing

One line.
Every LLM call, observed.

Record every OpenAI, Anthropic, and Gemini call: cost, latency, tokens, full request and response. Then score quality, run experiments, catch anomalies and PII, and ship cheaper models with proof.

$ npx @spanlens/cli init
$ pip install spanlens
TypeScript · Python · Next.js, Node, Edge · MIT · self-hostable
Try live demo →
spanlens.io / requests · LIVE
All models · Last 24h · Status: all · + filter · 12,481 events / 1h

Model              Endpoint    Latency  Tokens  Cost     Status  Age
claude-sonnet-4.5  /chat       1240ms   2,104   $0.0312  200     2s
gpt-4o-mini        /extract    410ms    612     $0.0009  200     5s
gpt-4o             /summarize  3440ms   3,218   $0.0482  200     7s
gemini-2.0-flash   /rerank     180ms    240     $0.0001  200     9s
claude-haiku-4.5   /chat       680ms    984     $0.0018  200     11s
gpt-4o             /classify   2120ms   1,840   $0.0276  429     14s
What you get

The lens. Not the weight.

Spanlens sits in front of your provider. No agents to run. No SDK to rewrite. One baseURL and you're done.
01 · Request log ($0.0021)
Every call with model, tokens, cost, latency, and full body. Filter, group, export.
02 · Cost tracking (−38%)
Per-request breakdown, daily rollups, budget alerts before you blow the month.
03 · Agent tracing (12 spans)
Multi-step workflows as waterfall span trees. Find the one step that took 18s.
04 · Anomaly detection (3.1σ)
3σ deviations in latency or cost vs. your 7-day baseline, flagged on arrival.
05 · PII + injection scan (SSN · email)
Regex detection on request bodies at log time. API keys auto-masked before storage; PII patterns flagged for review.
06 · Model recommender (−$412/mo)
"Your gpt-4o calls look like classification, try gpt-4o-mini." With numbers.
07 · Evals (0.82 avg)
LLM-as-judge scores every response 0 to 1. Know if v8 is actually better than v7, not just cheaper.
08 · Experiments & datasets (v7 vs v8)
Replay a fixed dataset across prompt versions and models. Quality, cost, and latency side by side.
09 · User analytics (1,204 users)
Per end-user and per-session cost, volume, and errors. Find the customer burning your budget.
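
The anomaly detection card above describes flagging 3σ deviations against a 7-day baseline. The sketch below shows one way such a check could work, assuming the baseline is a plain list of recent latency samples and using the sample standard deviation. The function name, the sample values, and the handling of a flat baseline are illustrative assumptions, not Spanlens internals.

```python
from statistics import mean, stdev

def is_anomalous(baseline, value, threshold=3.0):
    """Return (flagged, z) for `value` against the baseline samples.

    `baseline` stands in for a 7-day window of observations (hypothetical data).
    """
    if len(baseline) < 2:
        return False, 0.0              # not enough history to estimate spread
    mu = mean(baseline)
    sigma = stdev(baseline)            # sample standard deviation
    if sigma == 0:
        return False, 0.0              # flat baseline: no spread to compare against
    z = (value - mu) / sigma
    return abs(z) > threshold, z

# Hypothetical latency samples in milliseconds.
latency_baseline = [400, 420, 410, 390, 405, 415, 395]
flag_slow, z_slow = is_anomalous(latency_baseline, 440)
flag_ok, z_ok = is_anomalous(latency_baseline, 420)
print(flag_slow, round(z_slow, 2))
print(flag_ok, round(z_ok, 2))
```

The same function works for cost if the baseline holds per-request cost instead of latency.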
Cost visibility

See the bill before it arrives.

Per-team, per-model, per-route cost. Daily rollups. Budget alerts by Slack or webhook. One place to answer “why did our OpenAI bill jump?”

gpt-4o             $421.80   $182.40   −57%
claude-sonnet-4    $189.40   $192.20   +1.5%
gemini-2.0-flash   $21.40    $24.10    +13%
This month · projected
$2,481 · −$1,218
vs. last month. 3 model-swap suggestions pending.
APR 01 · APR 10 · APR 23 ← today
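
The percentages in the per-model panel above can be reproduced with a plain percent-change calculation. This is a worked sketch that assumes the first dollar figure is the earlier period and the second is the later one; the page does not label the columns, so that reading is an assumption.

```python
def pct_change(previous, current):
    """Percent change from `previous` to `current`."""
    return (current - previous) / previous * 100.0

# (earlier, later) spend per model, taken from the panel above.
# Which column is which is an assumption.
costs = {
    "gpt-4o": (421.80, 182.40),
    "claude-sonnet-4": (189.40, 192.20),
    "gemini-2.0-flash": (21.40, 24.10),
}

changes = {model: pct_change(prev, cur) for model, (prev, cur) in costs.items()}
for model, change in changes.items():
    print(f"{model:18s} {change:+.1f}%")
```

Rounded, the three results line up with the −57%, +1.5%, and +13% shown in the panel.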
Agent tracing

Find the one span
that cost you 18 seconds.

Multi-step agents as waterfall trees. Critical path, cost attribution, and latency outliers, highlighted automatically.

critical path · 78% of wall-clock in 1 span
cost attribution · per LLM, per tool
retry & error spans as first-class
trace_8812 · support agent · 8.24s · critical: summarize_tickets v7

agent.run                8.24s
classify_intent          520ms
kb_search                680ms
summarize_tickets · v7   5.8s · critical
llm.sonnet-4.5           5.4s
format_reply             480ms
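
To make "critical path" concrete, here is a minimal sketch that walks a span tree and follows the slowest child at each level. It uses the durations from trace_8812 above. The nesting (llm.sonnet-4.5 under summarize_tickets, everything under agent.run) is an assumption, and the approach assumes sibling spans run one after another; a production tracer would work from start and end timestamps instead.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    name: str
    duration_ms: int
    children: List["Span"] = field(default_factory=list)

def critical_path(root):
    """Follow the longest-running child at each level, starting from the root."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda s: s.duration_ms)
        path.append(node)
    return path

# Durations from the trace above; the parent/child layout is assumed.
trace = Span("agent.run", 8240, [
    Span("classify_intent", 520),
    Span("kb_search", 680),
    Span("summarize_tickets · v7", 5800, [Span("llm.sonnet-4.5", 5400)]),
    Span("format_reply", 480),
])

path = critical_path(trace)
names = [s.name for s in path]
# Share of the root's wall-clock time spent in the slowest top-level step.
share = path[1].duration_ms / trace.duration_ms
print(" > ".join(names), f"({share:.0%} of wall-clock)")
```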
The improvement loop

Don't just watch.
Improve.

Cost and latency tell you what happened. Spanlens tells you whether it got better. Capture real traffic into datasets, score it with an LLM judge or your own team, then run the next prompt version against it before you ship.

Evals · LLM-as-judge scores, 0 to 1, per version
Experiments · replay a dataset across versions and models
Annotation · human review to build golden sets
Playground · iterate on real inputs, compare side by side
experiment_241 · support-reply · 320 cases · winner: v8

Version            Quality   Cost / 1k   p50
gpt-4o · v7        0.71      $4.82       1240ms
gpt-4o-mini · v8   0.82      $0.31       410ms
gpt-4o-mini · v6   0.64      $0.30       430ms
v8 · +0.11 quality · −94% cost · same dataset
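
The summary line above follows directly from the experiment table. The sketch below picks a winner and computes the deltas against v7, assuming a simple ranking rule: highest quality first, then lowest cost, then lowest p50. That rule and the names used here are illustrative assumptions; the page does not say how the winner is chosen.

```python
from typing import NamedTuple

class Result(NamedTuple):
    version: str
    quality: float       # LLM-as-judge score, 0 to 1
    cost_per_1k: float   # dollars per 1k requests
    p50_ms: int

# Rows from experiment_241 above.
results = [
    Result("gpt-4o · v7", 0.71, 4.82, 1240),
    Result("gpt-4o-mini · v8", 0.82, 0.31, 410),
    Result("gpt-4o-mini · v6", 0.64, 0.30, 430),
]

def pick_winner(rows):
    """Assumed rule: best quality, ties broken by lower cost, then lower p50."""
    return min(rows, key=lambda r: (-r.quality, r.cost_per_1k, r.p50_ms))

baseline = results[0]  # v7 is treated as the version currently in production
winner = pick_winner(results)
quality_delta = winner.quality - baseline.quality
cost_delta_pct = (winner.cost_per_1k - baseline.cost_per_1k) / baseline.cost_per_1k * 100

print(winner.version, f"{quality_delta:+.2f} quality", f"{cost_delta_pct:+.0f}% cost")
```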
The product

One platform. One source of truth.

Every screen reads the same span store. Move from a cost chart to the exact failing request in two clicks.
Requests: 12,481 / 1h
Full body, headers, cost. Filter, group, replay.
Traces: 842 / day
Waterfall with critical path & retry spans.
Prompts: 24 · v7
Versioned library, diff, A/B, gradual rollout.
Anomalies: 3 open · high
7-day rolling baseline, z-score triggers.
Security: 48 masked
PII · secrets · injection · jailbreak detectors.
Savings: $7.2k / mo
Swap, cache, trim. Ranked by evidence.
Users: 1,204
Per end-user and session cost, volume, and error rates.
Playground: 4 models
Test a prompt across models and versions on real inputs.
Evals: 0.82 avg
LLM-as-judge scoring per prompt version. Quality, quantified.
Experiments: v7 vs v8
Run a dataset through versions and models. Compare on evidence.
Datasets: 320 cases
Golden test cases built from real production traffic.
Annotation: 58 reviewed
Human-in-the-loop review to build labels and golden sets.
Works with
OpenAI: sdk · azure
Anthropic: sdk · bedrock
Google: gemini · vertex
Mistral: sdk · api
TypeScript SDK: @spanlens/sdk
Python SDK: pip · 3.9+
LangChain: js · py
LlamaIndex: js · py
Vercel AI SDK: js
Self-hostable

Your data, your VPC.

Run Spanlens in your cluster with Docker Compose or a single binary. Prompts and completions never leave your network.

Self-host docs → · docker-compose.yml · Single binary
# one-liner · docker
docker run -d --name spanlens \
  -p 3001:3001 \
  -e SUPABASE_URL="https://..." \
  -e ENCRYPTION_KEY="$(openssl rand -base64 32)" \
  ghcr.io/spanlens/spanlens-server:latest
# health check once the container is up:
# → curl http://localhost:3001/health
Built for teams

Ship together. Stay audited.

Projects isolate workloads, roles and invitations manage the whole team, and an audit log records every change. Wire Spanlens into your stack with webhooks and alerts.

Projects · Roles & invitations · Audit log · Webhooks · Alerts · Saved filters
Pricing

Simple. Flat monthly.

Free while you're small. Flat monthly fee, not per seat. Self-host is free forever.

Free
$0/mo
· 50K req / mo
· 14 day retention
· 1 seat
· All core features
· Community support
Start free

Pro (most popular)
$29/mo
· 100K req / mo
· 90 day retention
· 3 seats
· 5 alerts · email notify
· +$8 / 100K extra
Start Pro

Team
$149/mo
· 1M req / mo
· 365 day retention
· 10 seats
· Slack · webhooks · unlimited alerts
· +$5 / 100K extra
Start Team

Enterprise
Custom
· Custom volume & rate limits
· 365 day retention (extendable)
· SSO (SAML / Okta)
· Unlimited seats
· Dedicated support + SLA
Contact us
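
For a rough idea of what a month costs past the included volume, the Pro and Team numbers above can be turned into a small calculator. This is a sketch under one loud assumption: overage is billed per started block of 100K requests. The page lists the overage price but not how partial blocks are rounded, and Free and Enterprise are left out because no overage terms are given for them.

```python
import math

# Figures from the pricing cards above.
PLANS = {
    "pro":  {"base": 29,  "included": 100_000,   "overage_per_100k": 8},
    "team": {"base": 149, "included": 1_000_000, "overage_per_100k": 5},
}

def monthly_cost(plan, requests):
    """Estimated monthly bill in dollars for `requests` on the given plan."""
    p = PLANS[plan]
    extra = max(0, requests - p["included"])
    # Assumption: each started block of 100K extra requests is billed in full.
    blocks = math.ceil(extra / 100_000)
    return p["base"] + blocks * p["overage_per_100k"]

for volume in (100_000, 350_000, 1_200_000):
    print(volume, monthly_cost("pro", volume), monthly_cost("team", volume))
```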
FAQ

Reasonable questions.

How does instrumentation work?
Swap the provider SDK for our drop-in. Same surface, same types. We record the full request and response on the wire, with no extra round-trip and no sampling by default.
What about latency overhead?
p99 overhead is under 3ms. Ingestion happens async in a worker. If we ever fail, your request completes anyway. Spanlens never sits on the critical path.
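
The answer above describes async ingestion that never fails the caller. A generic way to get that behavior is a bounded queue drained by a background worker, sketched below. This illustrates the pattern only; it is not the Spanlens SDK's implementation, and the class name, `sink` callback, and queue size are all hypothetical.

```python
import queue
import threading

class FailOpenLogger:
    """Hands events to a background worker; the request path never blocks or raises."""

    def __init__(self, sink, max_pending=1000):
        self._sink = sink                          # callable that ships one event
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0                           # events discarded under backpressure
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def record(self, event):
        """Called on the request path: enqueue and return immediately."""
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1                      # drop the event instead of waiting

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass                               # ingestion errors stay in the worker
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued event has been attempted (useful in tests)."""
        self._queue.join()

received = []

def flaky_sink(event):
    if event.get("fail"):
        raise RuntimeError("ingest unavailable")
    received.append(event)

logger = FailOpenLogger(flaky_sink)
logger.record({"model": "gpt-4o", "latency_ms": 410})
logger.record({"fail": True})                      # sink raises, caller is unaffected
logger.record({"model": "gpt-4o-mini", "latency_ms": 180})
logger.flush()
print(len(received), "events ingested,", logger.dropped, "dropped")
```

Dropping on a full queue trades completeness for latency: a log line is lost, but the user's request is never held up.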
How do you handle PII?
PII detectors (SSN, credit card, email, IBAN, passport, etc.) run at log time and flag matches for review in the Security dashboard, without blocking the request. API keys that slip into prompts are auto-masked before the row lands on disk. For workloads where prompt bodies must not be stored at all, opt out per-call with X-Spanlens-Log-Body: meta.
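
As an illustration of log-time scanning, the sketch below masks anything that looks like an API key and flags two PII patterns without blocking. The regular expressions are deliberately simplified assumptions (US-style SSNs, basic email addresses, `sk-` prefixed keys) and are not the detectors Spanlens ships.

```python
import re

# Simplified, illustrative patterns. Real detectors cover more formats.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
API_KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b")

def scan(body):
    """Mask API keys, then return (masked_body, sorted list of PII flags)."""
    masked = API_KEY_PATTERN.sub("sk-***", body)   # masked before anything is stored
    flags = sorted(name for name, rx in PII_PATTERNS.items() if rx.search(masked))
    return masked, flags                           # flags are for review, not blocking

sample = "Email jane@example.com, SSN 123-45-6789, key sk-abcdefghijklmnopqrstuvwx"
masked_body, pii_flags = scan(sample)
print(masked_body)
print(pii_flags)
```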
Do you support OpenTelemetry?
Yes. OTLP/HTTP ingest and export. Your existing OTel tracing flows into the same span store; LLM spans get LLM-specific attributes on top.
What's the data retention?
Free is 14 days. Pro is 90 days, Team is 365 days. Enterprise & self-hosted are configurable, including unlimited.
Can I export my data?
Anytime. JSON, CSV, Parquet. Or pipe the raw stream to S3, BigQuery, or your warehouse via our sink connectors.
Can Spanlens tell me if a prompt actually got better?
Yes. Evals scores responses with an LLM-as-judge on a 0 to 1 scale, per prompt version. Pair it with Experiments to replay a dataset across versions and models, so you compare quality, cost, and latency on the same inputs before you roll out.
Does Spanlens work for a whole team?
Projects isolate workloads, roles and invitations manage access, and audit logs record every change. Team and Enterprise add Slack, webhooks, unlimited alerts, and SSO.

See what your app is saying.

30-second setup. Your first 50,000 requests are on us. Cancel anytime. There's nothing to cancel.

Install

It's genuinely one line.

app/api/chat/route.ts
- import OpenAI from 'openai'
- const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
+ import { createOpenAI } from '@spanlens/sdk/openai'
+ const openai = createOpenAI()

  const res = await openai.chat.completions.create({ ... })
app/main.py
- from openai import OpenAI
- client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
+ from spanlens.integrations.openai import create_openai
+ client = create_openai()

  res = client.chat.completions.create(...)