Troubleshooting

Start from the symptom you are seeing. Each section gives the likely cause and the fix. This page is organized by what you observe; the error-codes reference is organized by the stable error.code value, so branch your client logic there and use this page to diagnose.

HTTP status quick reference

Every 4xx / 5xx response carries the standard envelope (error.code, error.message, error.details, error.requestId). Match the status you got, then jump to the section below for the full walkthrough.

| Status & code | Likely cause | Fix |
| --- | --- | --- |
| 401 UNAUTHORIZED | Missing, malformed, or wrong auth header. The runtime has no SPANLENS_API_KEY, or the upstream client is still pointed at the provider with the provider key. | Send your Spanlens key (sl_live_…), not the provider key. OpenAI/Azure clients use Authorization: Bearer sl_live_…; Anthropic uses x-api-key: sl_live_…; Gemini uses x-goog-api-key: sl_live_…. Confirm SPANLENS_API_KEY is set in the runtime. |
| 403 PUBLIC_KEY_WRITE_FORBIDDEN | A public-scope key (sl_live_pub_…) was used on a write endpoint. Public keys are read-only by design and are rejected on /proxy/*, /ingest/*, and OTLP /v1/traces. | Use a full-scope key (sl_live_… without the pub_ segment) for proxying and ingest. Keep public keys for MCP servers, BI tools, and read embeds only. |
| 429 RATE_LIMIT | A rate limit fired. Two distinct kinds: (1) a limit you configured on a key/project/end-user, or (2) your Spanlens plan monthly request quota. | Read details.source. "customer_limit" is one you set, so back off using Retry-After, or raise it on the Projects page. A plan/quota 429 includes an upgrade link; upgrade or wait for the window to reset. |
| 400 NO_PROVIDER_KEY | The Spanlens key has no active provider key registered for the provider you called (inferred from the URL path). | Open /projects, expand the Spanlens key, click Add provider key, pick the matching provider (OpenAI / Anthropic / Gemini / …), and paste your real AI key. |
| 502 UPSTREAM_FAILED | The upstream provider returned an error or the network to it failed. details.provider names which one. | Usually a provider-side incident or a bad request the provider rejected. Check the provider status page, inspect the row in /requests for the upstream error body, then retry. |
| 503 DECRYPT_FAILED | The stored provider key could not be decrypted. Operator-side configuration drift: ENCRYPTION_KEY no longer matches the key used when the provider key was saved. | Self-host only. The operator must restore the original ENCRYPTION_KEY, or re-enter every provider key under the new one. See the empty-decryption section below. |
| 504 UPSTREAM_TIMEOUT | A non-streaming upstream call did not return headers within the timeout (UPSTREAM_TIMEOUT_MS, 35s). The provider was slow or the generation was very long. | Safe to retry. For long generations, switch to streaming (stream: true) so the first byte arrives in ~200 ms and you use the 290s stream budget instead of the 35s header timeout. |
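Because every error response shares the same envelope, a client can parse it once and branch on error.code. The following is a minimal sketch, not part of the Spanlens SDK; the names SpanlensError and parseErrorEnvelope are hypothetical, and only the four field names come from this page.

```typescript
// Hypothetical type for the standard envelope fields named on this page.
interface SpanlensError {
  code: string;
  message: string;
  details: Record<string, unknown> | null;
  requestId: string | null;
}

// Returns the error object from a parsed JSON body, or null when the body
// does not look like the standard envelope.
function parseErrorEnvelope(body: unknown): SpanlensError | null {
  if (typeof body !== "object" || body === null) return null;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return null;
  const e = err as Record<string, unknown>;
  const code = e["code"];
  const message = e["message"];
  const details = e["details"];
  const requestId = e["requestId"];
  if (typeof code !== "string" || typeof message !== "string") return null;
  return {
    code,
    message,
    details:
      typeof details === "object" && details !== null
        ? (details as Record<string, unknown>)
        : null,
    requestId: typeof requestId === "string" ? requestId : null,
  };
}
```

Keeping requestId on the parsed result makes it easy to include in logs and bug reports.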

Authentication and scope (401 / 403)

A 401 UNAUTHORIZED means the server could not authenticate the request at all; a 403 means it authenticated you but the key is not allowed to do what you asked.

Send the Spanlens key on the header your provider SDK uses

You authenticate with your Spanlens key (sl_live_…), never the raw provider key. Each provider SDK puts the key on a different header, and the proxy accepts whichever one arrives:

OpenAI / Azure   Authorization: Bearer sl_live_...
Anthropic        x-api-key: sl_live_...
Gemini           x-goog-api-key: sl_live_...

If you use an official provider SDK, you do not set these by hand: pass the Spanlens key as the SDK's api_key and point base_url at the matching proxy URL. A stray 401 "Incorrect API key" almost always means one of three things: SPANLENS_API_KEY is missing in the runtime, the value has a typo or trailing whitespace, or the client is still constructed with the upstream baseURL so the request never reaches Spanlens.
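If you call the proxy with a raw HTTP client rather than a provider SDK, the header mapping and the common key mistakes listed above can be sketched as follows. The helper names authHeader and keyProblem are hypothetical; the header names and the sl_live_ prefix come from this page.

```typescript
type Provider = "openai" | "azure" | "anthropic" | "gemini";

// Builds the auth header each provider's clients use, carrying the Spanlens key.
function authHeader(provider: Provider, spanlensKey: string): Record<string, string> {
  switch (provider) {
    case "openai":
    case "azure":
      return { Authorization: `Bearer ${spanlensKey}` };
    case "anthropic":
      return { "x-api-key": spanlensKey };
    case "gemini":
      return { "x-goog-api-key": spanlensKey };
  }
}

// Returns a description of the first obvious problem with the key, or null.
// This only checks shape; it cannot tell whether the key is actually valid.
function keyProblem(value: string | undefined): string | null {
  if (value === undefined || value === "") return "SPANLENS_API_KEY is not set";
  if (value !== value.trim()) return "key has leading or trailing whitespace";
  if (!value.startsWith("sl_live_")) return "value does not look like a Spanlens key";
  return null;
}
```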

403 PUBLIC_KEY_WRITE_FORBIDDEN (wrong key scope)

Spanlens keys come in two scopes. A full key (sl_live_…) can proxy, ingest, and read. A public key (sl_live_pub_…) is read-only and is deliberately rejected on every write path (/proxy/*, /ingest/*, and OTLP /v1/traces) with:

HTTP/1.1 403 Forbidden

{
  "error": {
    "code": "PUBLIC_KEY_WRITE_FORBIDDEN",
    "message": "Public scope keys cannot use proxy, ingest, or OTLP endpoints",
    "details": { "scope": "public" }
  }
}

Fix: use a full-scope key for proxying and ingest. Public keys are meant for places where the key is exposed in plaintext (MCP servers, BI dashboards, read-only embeds), so they can only call the read APIs. You issue full keys per project and public keys per workspace on the Projects page.
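A client can catch this mistake before sending the request by checking the key prefix against the path. This is a rough sketch based only on the prefixes and paths named above; the function names are hypothetical, and the server remains the authority on scope.

```typescript
type KeyScope = "full" | "public" | "unknown";

// Infers the scope from the key prefix. The public prefix must be tested
// first because it also starts with "sl_live_".
function keyScope(key: string): KeyScope {
  if (key.startsWith("sl_live_pub_")) return "public";
  if (key.startsWith("sl_live_")) return "full";
  return "unknown";
}

// Write paths named on this page: /proxy/*, /ingest/*, and OTLP /v1/traces.
function isWritePath(path: string): boolean {
  return path === "/v1/traces" || path.startsWith("/proxy/") || path.startsWith("/ingest/");
}

// True when the request would be rejected with PUBLIC_KEY_WRITE_FORBIDDEN.
function wouldBeForbidden(key: string, path: string): boolean {
  return keyScope(key) === "public" && isWritePath(path);
}
```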

Which rate limit fired (429)

A 429 RATE_LIMIT can come from two very different places. Read error.details.source to tell them apart before you change anything.

Customer-configured limit (details.source = "customer_limit")

This is a limit you set on a Spanlens key, a project, or an individual end-user from the Projects page. It exists precisely to throttle that traffic, so exceeding it does reject the call. The body tells you which limit fired:

{
  "error": {
    "code": "RATE_LIMIT",
    "message": "Customer-configured rate limit exceeded (end_user): 60 requests per 60s.",
    "details": {
      "source": "customer_limit",
      "scope": "end_user",
      "limit": 60,
      "window_seconds": 60,
      "end_user_id": "user_123"
    }
  }
}

The response also carries Retry-After (window length in seconds) and X-Spanlens-RateLimit-Scope (api_key, project, or end_user). A customer_limit 429 never includes an upgrade link, which is how you distinguish it from a plan limit. Per-end-user limits bucket on the X-Spanlens-User header, so you must send that header for those limits to apply. Fix: wait for the number of seconds given in Retry-After before retrying, or raise the limit on the Projects page.

Spanlens plan quota

Separate from your own limits, your plan has a monthly request quota. When that is the limit that fired, the error includes plan and upgrade context. Fix: upgrade the plan or wait for the quota window to reset. See billing & quotas.
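The distinction between the two kinds of 429 can be sketched as a small classifier over error.details and the Retry-After header. This page does not show the plan-quota body, so the sketch assumes that any 429 whose details.source is not "customer_limit" is a plan quota; classify429 is a hypothetical name.

```typescript
type RateLimitKind =
  | { kind: "customer_limit"; scope: string; retryAfterSeconds: number | null }
  | { kind: "plan_quota" };

// details: the parsed error.details object (or null if absent).
// retryAfter: the raw Retry-After header value (or null if absent).
function classify429(
  details: Record<string, unknown> | null,
  retryAfter: string | null,
): RateLimitKind {
  if (details !== null && details["source"] === "customer_limit") {
    const scope = details["scope"];
    const seconds =
      retryAfter === null || retryAfter.trim() === "" ? NaN : Number(retryAfter);
    return {
      kind: "customer_limit",
      scope: typeof scope === "string" ? scope : "unknown",
      retryAfterSeconds: Number.isFinite(seconds) ? seconds : null,
    };
  }
  // Assumption: everything else is the plan quota.
  return { kind: "plan_quota" };
}
```

A caller would sleep for retryAfterSeconds on a customer limit, and surface the upgrade context to a human on a plan quota instead of retrying in a loop.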

There is also a high platform ceiling on /proxy/* that exists only to stop a runaway loop. Going over it does not reject the request: the call still passes through and the response carries X-Spanlens-RateLimit-Overage: true so you can spot the spike. Every proxy response also carries X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix epoch seconds), and X-RateLimit-Window; read X-RateLimit-Reset directly instead of deriving the reset time from your own clock, which avoids clock skew. Details in the proxy docs.
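Reading these headers into one structure makes the overage flag and the reset time easy to log. The sketch below assumes the headers have already been copied into a plain object; readRateLimitHeaders is a hypothetical helper, and X-RateLimit-Window is kept as a raw string because its format is not described here.

```typescript
interface RateLimitInfo {
  limit: number | null;
  remaining: number | null;
  resetEpochSeconds: number | null;
  window: string | null;
  overage: boolean;
}

function readRateLimitHeaders(headers: Record<string, string>): RateLimitInfo {
  // HTTP header names are case-insensitive, so normalize before lookup.
  const lower: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    lower[name.toLowerCase()] = headers[name] as string;
  }
  const num = (name: string): number | null => {
    const raw: string | undefined = lower[name];
    if (raw === undefined || raw.trim() === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  const windowValue: string | undefined = lower["x-ratelimit-window"];
  return {
    limit: num("x-ratelimit-limit"),
    remaining: num("x-ratelimit-remaining"),
    resetEpochSeconds: num("x-ratelimit-reset"),
    window: windowValue === undefined ? null : windowValue,
    overage: lower["x-spanlens-ratelimit-overage"] === "true",
  };
}
```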

Upstream and provider-key errors (502 / 503 / 504)

These are the failures that happen after your request is authenticated but before (or during) the call to the real provider.

  • 502 UPSTREAM_FAILED means the provider returned an error or the network to it failed. details.provider names which provider. Check the provider status page and the row in /requests for the upstream error body, then retry.
  • 504 UPSTREAM_TIMEOUT means a non-streaming call did not return headers within the ~35s header timeout. The provider was slow or the generation was long. Safe to retry; better yet switch to streaming (see truncated responses below).
  • 503 DECRYPT_FAILED means the stored provider key could not be decrypted. This is operator-side configuration drift, not a client bug. See the provider key decryption section below.
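The three cases above call for different client reactions, which can be sketched as a small decision function. The action labels and the backoff numbers (500 ms base, 8 s cap) are illustrative assumptions, not values defined by Spanlens.

```typescript
type UpstreamAction =
  | "retry"
  | "retry_after_checking_provider"
  | "escalate_to_operator"
  | "do_not_retry";

// Maps the status and error.code pairs from the list above to an action.
function upstreamAction(status: number, code: string): UpstreamAction {
  if (status === 504 && code === "UPSTREAM_TIMEOUT") return "retry";
  if (status === 502 && code === "UPSTREAM_FAILED") return "retry_after_checking_provider";
  // Retrying cannot fix a key mismatch; the operator has to act.
  if (status === 503 && code === "DECRYPT_FAILED") return "escalate_to_operator";
  return "do_not_retry";
}

// Exponential backoff with a cap. attempt starts at 0.
function backoffMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```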

Requests are not appearing in the dashboard

You are making LLM calls but /requests stays empty. Work through these in order; each is a real cause we have seen in the field.

  1. The Spanlens key is not set in the deployed environment. Setting SPANLENS_API_KEY in .env.local covers local dev only. Add it to your production environment (Vercel / Railway / Fly) as well, then redeploy, because new env values do not apply to existing deployments.
  2. The request is still going to the provider directly. In your Network tab the call should hit server.spanlens.io/proxy/*, not api.openai.com (or the equivalent). If it is not, the client is constructed with the upstream baseURL; use the SDK helper or set base_url to the proxy.
  3. No provider key is registered. The call then returns 400 NO_PROVIDER_KEY instead of being logged. Add one under the Spanlens key on /projects.
  4. Your app is silently in mock mode. Some apps return a canned 200 "mock" response when an API key env var is missing, so the AI looks like it works but no real call is ever made and nothing reaches Spanlens. After adding env vars, confirm a real row lands in /requests rather than trusting the app response.
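The first two causes in the list above can be checked at application startup. This is a hedged sketch: preflight is a hypothetical helper, the environment is passed in as a plain object, and the default origin is the hosted proxy named on this page (a self-hosted instance would pass its own).

```typescript
// Returns a list of configuration problems; an empty list means both checks pass.
function preflight(
  env: Record<string, string | undefined>,
  baseUrl: string,
  proxyOrigin: string = "https://server.spanlens.io",
): string[] {
  const problems: string[] = [];
  const key = env["SPANLENS_API_KEY"];
  if (key === undefined || key.trim() === "") {
    problems.push("SPANLENS_API_KEY is not set in this environment");
  }
  // Requests must go to <origin>/proxy/*, not straight to the provider.
  if (!baseUrl.startsWith(proxyOrigin + "/proxy/")) {
    problems.push("base URL " + baseUrl + " does not point at the Spanlens proxy");
  }
  return problems;
}
```

Failing loudly on a non-empty result at startup also guards against the silent mock-mode case, since the app never reaches the point of returning a canned response.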

Verify with one curl call

The fastest way to isolate app config from Spanlens config is to bypass your app entirely and hit the proxy directly. A row should appear in /requests within a few seconds:

curl https://server.spanlens.io/proxy/openai/v1/chat/completions \
  -H "Authorization: Bearer $SPANLENS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}]
  }'

  • Get a 200 and a row appears → Spanlens is configured correctly; the gap is in your application (env not loaded, wrong base URL, or mock mode).
  • Get a 401 → the key is wrong or not exported in this shell.
  • Get a 400 NO_PROVIDER_KEY → register a provider key (step 3 above).

Traces show Spans: 0 / Tokens: 0

A trace exists in /traces but shows zero spans and zero tokens. This is the signature of an outdated SDK. Older SDK versions fired the trace and its child-span ingest POSTs concurrently, so a span could arrive before the trace row was committed and be silently dropped; a late end() patch then found no row to update. Short traces happened to pass, which is why it slips through in dev.

Fix: upgrade the SDK. The ingest ordering race was fixed by chaining child POSTs after the parent creation promise; make sure you are on a current release:

pnpm add @spanlens/sdk@latest
# or: npm install @spanlens/sdk@latest

If you built a custom ingest client instead of using the SDK, replicate the same rule: do not POST a child span until the parent trace create has resolved, and do not send an end() patch until the span create has resolved. See the SDK reference.
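For a custom ingest client, the ordering rule can be sketched by chaining each request on the promise of the one it depends on. This is not the SDK's implementation; OrderedIngest and the Send transport type are hypothetical, and the real ingest endpoints and payloads are left out.

```typescript
// Hypothetical transport: resolves once the server has accepted the request.
type Send = (kind: "trace" | "span" | "span_end", id: string) => Promise<void>;

class OrderedIngest {
  private send: Send;
  private traceReady = new Map<string, Promise<void>>();
  private spanReady = new Map<string, Promise<void>>();

  constructor(send: Send) {
    this.send = send;
  }

  createTrace(traceId: string): Promise<void> {
    const created = this.send("trace", traceId);
    this.traceReady.set(traceId, created);
    return created;
  }

  // The span POST is not issued until the parent trace create has resolved.
  createSpan(traceId: string, spanId: string): Promise<void> {
    const parent = this.traceReady.get(traceId);
    if (parent === undefined) {
      return Promise.reject(new Error("unknown trace " + traceId));
    }
    const created = parent.then(() => this.send("span", spanId));
    this.spanReady.set(spanId, created);
    return created;
  }

  // The end() patch is not issued until the span create has resolved.
  endSpan(spanId: string): Promise<void> {
    const created = this.spanReady.get(spanId);
    if (created === undefined) {
      return Promise.reject(new Error("unknown span " + spanId));
    }
    return created.then(() => this.send("span_end", spanId));
  }
}
```

Callers can fire createTrace, createSpan, and endSpan back to back without awaiting; the chaining keeps the requests in order even when the trace create is slow.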

Response cut off or shows a "truncated" badge

A streaming response stops early and the row in /requests carries a truncated badge. The proxy runs on Vercel Pro with a 300-second hard ceiling and gracefully closes streams at 290 seconds to leave room to flush the log. A generation that runs past that budget is cut off, the partial output is logged with truncated: true, and your client sees the stream end before [DONE] / message_stop.

Options, in order of preference:

  • Lower the work per call. Reduce max_tokens or split the task so a single generation finishes well inside 290 seconds.
  • Use streaming with chunked accumulation. For large max_tokens or JSON mode with big outputs, set stream: true and accumulate chunks. The first byte arrives in ~200 ms, and you keep the partial output even if the deadline is hit.
  • Self-hosting? The deadline is tunable with the STREAM_DEADLINE_MS env var (default 290000). On a Vercel Hobby plan, whose function ceiling is 60s, set it to about 50000.

Note: a non-streaming call that exceeds the header timeout returns 504 UPSTREAM_TIMEOUT instead of a truncated row. It is the same underlying "too slow" problem, which is why streaming is the fix for both. More detail in the proxy docs.
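Chunked accumulation with truncation detection can be sketched as follows for an OpenAI-style stream, where each event line is "data: {json}" and the stream ends with "data: [DONE]". The function name accumulateSse is hypothetical, and an Anthropic stream would need to look for message_stop instead.

```typescript
interface Accumulated {
  text: string;
  // false means the stream ended before the terminal marker (likely truncated).
  complete: boolean;
}

function accumulateSse(lines: string[]): Accumulated {
  let text = "";
  let complete = false;
  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      complete = true;
      break;
    }
    try {
      const chunk = JSON.parse(payload) as {
        choices?: { delta?: { content?: string } }[];
      };
      const first = chunk.choices === undefined ? undefined : chunk.choices[0];
      const piece = first === undefined || first.delta === undefined ? undefined : first.delta.content;
      if (typeof piece === "string") text += piece;
    } catch (_err) {
      // Skip a partial or non-JSON line and keep what was accumulated so far.
    }
  }
  return { text, complete };
}
```

When complete is false, the accumulated text is still usable as partial output, which is the advantage over a non-streaming call that fails outright.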

Provider key decryption returns empty (self-host)

On a self-hosted instance, provider keys are stored encrypted with AES-256-GCM under your ENCRYPTION_KEY. If that key does not match the one in use when a provider key was saved, decryption can fail quietly and yield an empty string rather than an error, so the upstream call goes out with no credential and the provider rejects it. When the failure surfaces as a typed error it is 503 DECRYPT_FAILED.

Checklist for the operator:

  • ENCRYPTION_KEY must be a base64-encoded 32-byte value (the key size AES-256 requires) and must be identical across every instance and deploy that reads the same database.
  • If the key was rotated, the old provider-key rows can no longer be decrypted. Restore the original ENCRYPTION_KEY, or re-enter every provider key on the Projects page so they are re-encrypted under the new value.
  • Watch for calls that reach the provider with no credential (upstream 401): an empty decryption result is the usual cause. See self-hosting for environment setup.
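The first checklist item can be verified before deploy with a shape check on the key. The sketch below assumes standard padded base64 (not the URL-safe alphabet) and that "32-byte" refers to the decoded length; isValidEncryptionKey is a hypothetical helper, and it cannot tell whether the value matches the key that encrypted the existing rows.

```typescript
// Returns the decoded byte length of a padded base64 string, or null if the
// string is not well-formed base64.
function base64DecodedLength(value: string): number | null {
  if (value.length === 0 || value.length % 4 !== 0) return null;
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(value)) return null;
  const padding = value.endsWith("==") ? 2 : value.endsWith("=") ? 1 : 0;
  return (value.length / 4) * 3 - padding;
}

// A 32-byte key encodes to 44 base64 characters ending in a single "=".
function isValidEncryptionKey(value: string): boolean {
  return base64DecodedLength(value) === 32;
}
```

Running the same check on every instance, and comparing a hash of the value across deploys, would catch both a malformed key and a mismatched one.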

Still stuck? Look up the exact code in the error-codes reference, review the proxy docs for headers and the stream deadline, or open an issue at github.com/spanlens/Spanlens/issues with the error.requestId from the response so the operator can pull the matching logs.