Same proxy-first architecture, with more observability depth: Critical Path tracing, statistical A/B testing, and a fallback-replay durability layer.
Helicone proved the proxy-based LLM observability model and ships a polished, focused product, though it entered maintenance mode after its 2026 Mintlify acquisition. Spanlens uses the same architecture, is actively developed, and adds Critical Path tracing on agent runs, a Welch t-test on A/B prompt experiments, judge-to-human correlation tracking, and a ClickHouse fallback-replay queue so that logs are not silently dropped during infrastructure hiccups.
| Feature | Spanlens | Helicone |
|---|---|---|
| Proxy-based instrumentation | Yes | Yes |
| 1-line baseURL swap | Yes | Yes |
| OpenTelemetry (OTLP) ingest | Yes | Partial |
| Streaming response support | Yes | Yes |
| Major provider proxies | Yes | Yes |
| Local LLMs (Ollama) via SDK | Yes | Partial |
| Multi-step span trees | Yes | Yes |
| Critical Path highlighting | Yes | No |
| Retry span annotation | Yes | Partial |
| Versioned prompt library | Yes | Yes |
| Prompt A/B traffic split | Yes | Partial |
| Built-in Welch t-test on A/B results | Yes | No |
| Prompt playground | Yes | Yes |
| LLM-as-judge scoring | Yes | Partial |
| Human annotation queue | Yes | Partial |
| Judge-to-human correlation tracking | Yes | Partial |
| Datasets / golden test sets | Yes | Partial |
| Security scanning (API keys, PII, prompt injection) | Yes | Partial |
| Per-call log-body opt-out header | Yes | Partial |
| ClickHouse fallback-replay queue | Yes | No |
| Stream deadline with truncation flag | Yes | Partial |
| Fully open source | Yes | Yes |
| Docker Compose self-host | Yes | Yes |
| Managed cloud option | Yes | Yes |
Updated 2026-06-03.
Helicone was acquired by Mintlify in 2026 and is now in maintenance mode: security patches, bug fixes, and new-model support continue, but active feature development has ended and the founders moved on. Spanlens is actively building. If you want a tool that keeps shipping new capabilities, that gap matters.
Multi-step agents show as waterfalls in both tools. Only Spanlens highlights the longest dependency chain, the actual bottleneck. Helicone shows you spans, and you find the slow one yourself.
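To make "longest dependency chain" concrete, here is a minimal sketch of how a critical path can be computed over a set of spans. It is not Spanlens's implementation: the `Span` shape, the `depends_on` field, and the span names are hypothetical, and it assumes the dependencies form a DAG with known per-span durations.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: a name, a duration, and upstream dependencies."""
    name: str
    duration_ms: float
    depends_on: tuple = ()


def critical_path(spans):
    """Return (total_ms, [span names]) for the longest dependency chain."""
    by_name = {s.name: s for s in spans}

    @lru_cache(maxsize=None)
    def longest_to(name):
        # Longest chain that ends at `name`: the slowest chain among its
        # dependencies, plus this span's own duration.
        span = by_name[name]
        best_cost, best_path = 0.0, ()
        for dep in span.depends_on:
            cost, path = longest_to(dep)
            if cost > best_cost:
                best_cost, best_path = cost, path
        return best_cost + span.duration_ms, best_path + (name,)

    total, path = max(longest_to(s.name) for s in spans)
    return total, list(path)


# Illustrative agent run: two parallel steps feed a final summarize step.
example = [
    Span("plan", 120.0),
    Span("search", 800.0, ("plan",)),
    Span("fetch_docs", 300.0, ("plan",)),
    Span("summarize", 450.0, ("search", "fetch_docs")),
]
total_ms, chain = critical_path(example)
```

In this example `fetch_docs` runs in parallel with `search` but finishes sooner, so speeding it up would not shorten the run. Only the spans on the returned chain are worth optimizing first.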
Spanlens lets you split traffic between prompt variants and reports statistical significance (Welch t-test on latency and cost, plus a z-test on error rate). Helicone supports prompt versioning, but A/B comparison and significance testing are bring-your-own.
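For readers who want to see what these two tests compute, the sketch below uses only the Python standard library. It is an illustration under assumptions, not Spanlens's code: the function names are hypothetical, and the pooled two-proportion form of the z-test is an assumed variant. Turning the Welch statistic into a p-value additionally needs a Student's t distribution, which the standard library does not provide.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike the pooled t-test, this does not assume both variants have the
    same variance, which suits latency and cost samples.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_proportion_z(errors_a, n_a, errors_b, n_b):
    """Pooled two-proportion z-test on error rates; returns (z, two-sided p)."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative latency samples (ms) for two prompt variants.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Illustrative error counts: 30 of 1000 calls vs 10 of 1000 calls.
z_stat, p_val = two_proportion_z(30, 1000, 10, 1000)
```

A large absolute `t_stat` (judged against a t distribution with `dof` degrees of freedom) or a small `p_val` suggests the difference between variants is unlikely to be noise.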
Spanlens lets you annotate by hand and measures how well your LLM judge tracks human raters. If your judge drifts, you see it as a metric. Helicone supports custom scores but does not name this correlation as a first-class feature.
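One common way to quantify how well a judge tracks human raters is a correlation coefficient over paired scores. The sketch below uses Pearson correlation as an assumption; the page does not say which metric Spanlens reports, and the function name and sample scores are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between paired judge and human scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Illustrative scores for the same five responses.
judge_scores = [0.9, 0.7, 0.4, 0.8, 0.2]
human_scores = [1.0, 0.6, 0.5, 0.9, 0.1]
agreement = pearson(judge_scores, human_scores)
```

A value near 1 means the judge ranks responses much as humans do. Tracking it over time is what turns judge drift into a visible metric.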
Spanlens proactively flags routes where a smaller model would match quality and quotes the monthly savings. Helicone has cost dashboards, and the swap recommendation is left as a manual exercise.
Spanlens writes to ClickHouse for analytics. If ClickHouse hiccups, requests fall back to a Postgres queue and replay automatically when it recovers, so logs are not silently dropped. This durability layer is a Spanlens-specific design.
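The fallback-and-replay pattern described above can be sketched in a few lines. This is an in-memory illustration under assumptions, not Spanlens's code: `FlakyStore` stands in for the analytics store, a `deque` stands in for the durable Postgres queue, and every class and method name is hypothetical.

```python
from collections import deque


class FlakyStore:
    """Stand-in for the analytics store; `up` simulates availability."""

    def __init__(self):
        self.up = True
        self.rows = []

    def insert(self, row):
        if not self.up:
            raise ConnectionError("analytics store unavailable")
        self.rows.append(row)


class DurableLogger:
    """Write to the primary store; queue rows on failure and replay later."""

    def __init__(self, primary):
        self.primary = primary
        self.fallback = deque()  # a real system would use a durable table

    def log(self, row):
        try:
            self.primary.insert(row)
        except ConnectionError:
            self.fallback.append(row)  # keep the row instead of dropping it

    def replay(self):
        """Drain the queue in order; stop at the first failure. Returns count."""
        replayed = 0
        while self.fallback:
            try:
                self.primary.insert(self.fallback[0])
            except ConnectionError:
                break  # still down: leave the remaining rows queued
            self.fallback.popleft()  # remove only after a successful write
            replayed += 1
        return replayed


store = FlakyStore()
logger = DurableLogger(store)
store.up = False
logger.log({"request_id": "r1"})  # queued, not lost
store.up = True
logger.replay()                   # r1 now lands in the primary store
```

Removing a row from the queue only after the primary write succeeds means a crash mid-replay can duplicate a row but not lose one, so a real implementation would likely want idempotent inserts.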
Spanlens layers 3σ anomaly detection on top of agent trace data, so a slow critical-path span is also flagged when latency drifts off its 7-day baseline. The two surfaces reinforce each other inside one product.
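As a rough illustration of a 3σ rule, the sketch below flags a latency sample that sits more than three standard deviations from the mean of a baseline window. The function name, the two-sided check, and the sample numbers are assumptions for illustration; how Spanlens builds its 7-day baseline is not described here.

```python
from statistics import mean, stdev


def is_anomalous(baseline_ms, value_ms, sigmas=3.0):
    """True if value_ms is more than `sigmas` standard deviations from the
    baseline mean. `baseline_ms` needs at least two samples."""
    mu = mean(baseline_ms)
    sd = stdev(baseline_ms)
    if sd == 0:
        # A perfectly flat baseline: any deviation at all counts as drift.
        return value_ms != mu
    return abs(value_ms - mu) > sigmas * sd


# Illustrative baseline: one latency reading per day for a week.
week = [100, 102, 98, 101, 99, 100, 100]
is_anomalous(week, 110)  # far outside the usual spread
is_anomalous(week, 102)  # within normal variation
```

Combining this check with a critical-path computation is what lets a tool say not just that a span is slow, but that it is unusually slow and on the chain that determines total latency.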
We don't think every team should pick us. Here's where Helicone legitimately wins.
Helicone has been public longer and has extensive docs and case studies. If proven adoption is your top criterion, Helicone is ahead. Spanlens shipped in 2026 with Critical Path tracing, Welch t-test A/B, and the ClickHouse fallback queue already in v1.
Helicone supports a broad set of SDKs and frameworks out of the box. If you're using a less-common provider or SDK, check both lists before committing.
Helicone is a more focused product. If you want logging and cost dashboards and nothing else, Helicone's narrower scope is easier to onboard. Spanlens covers the same surface in its default dashboard and only adds depth when you opt into it.
Helicone leans into proxy-gateway features like custom rate limiting, retries, and caching at the edge. Spanlens currently focuses on observability and leaves gateway concerns to upstream tools.
If you want a battle-tested proxy with a focused feature set, Helicone is a strong choice. If you want the same proxy ergonomics plus deeper agent analytics, statistical A/B, and log durability, try Spanlens.
Free tier · No credit card · Self-host with Docker