OpenAI Assistants API integration

The Assistants API is organized around threads, runs, and run steps. Spanlens captures the full hierarchy: each thread becomes a trace, each run becomes an agent_step span, and each step (message creation, tool call, code interpreter invocation) becomes a child span of its run. Spans carry the OpenAI thread_id, run_id, and step_id as tags, so you can pivot or filter on any of them.
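To make the tagging concrete, here is a minimal sketch of filtering spans by one of those IDs. The `SpanRecord` shape, the `filterByTag` helper, and the sample IDs are hypothetical and exist only for illustration; they are not part of the Spanlens SDK. Only the tag names (thread_id, run_id, step_id) and span kinds come from the description above.

```typescript
// Hypothetical span shape for illustration; not the Spanlens SDK's actual type.
interface SpanRecord {
  name: string
  kind: 'agent_step' | 'llm' | 'tool'
  // Run-level spans have no step_id; step-level spans carry all three IDs.
  tags: { thread_id: string; run_id: string; step_id?: string }
}

type TagKey = keyof SpanRecord['tags']

// Return every span whose tag `key` equals `value`.
function filterByTag(spans: SpanRecord[], key: TagKey, value: string): SpanRecord[] {
  return spans.filter((span) => span.tags[key] === value)
}

// Made-up data: one thread with two runs; the first run has two steps.
const exampleSpans: SpanRecord[] = [
  { name: 'run', kind: 'agent_step', tags: { thread_id: 'thread_1', run_id: 'run_a' } },
  { name: 'message', kind: 'llm', tags: { thread_id: 'thread_1', run_id: 'run_a', step_id: 'step_1' } },
  { name: 'code', kind: 'tool', tags: { thread_id: 'thread_1', run_id: 'run_a', step_id: 'step_2' } },
  { name: 'run', kind: 'agent_step', tags: { thread_id: 'thread_1', run_id: 'run_b' } },
]

// Pivot on run_id: the run span plus its two step spans.
const runASpans = filterByTag(exampleSpans, 'run_id', 'run_a')
console.log(runASpans.map((span) => span.name))
```

Filtering on thread_id instead would return all four spans, which corresponds to viewing the whole trace.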

Setup

The Spanlens OpenAI drop-in instruments the underlying HTTP layer, so all Assistants API endpoints are captured automatically. No per-thread or per-run wiring is needed.

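import { createOpenAI } from '@spanlens/sdk/openai'

// Drop-in client: requests made through it are captured by Spanlens.
const openai = createOpenAI()

// Create an assistant with the code interpreter tool enabled.
const assistant = await openai.beta.assistants.create({
  name: 'Support agent',
  model: 'gpt-4o-mini',
  tools: [{ type: 'code_interpreter' }],
})

// Start a thread and add the user's message to it.
const thread = await openai.beta.threads.create()
await openai.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'Plot the last week of revenue from this CSV.',
})

// Run the assistant on the thread and poll until the run finishes.
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
})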

What gets captured

| Assistants API resource | Spanlens entity | Notes |
| --- | --- | --- |
| Thread | trace | thread_id is the trace_id; one thread, one trace. |
| Run | span (kind="agent_step") | One span per run. Includes model, usage, status. |
| Message creation step | span (kind="llm") | Full input/output captured. |
| Tool call step (function) | span (kind="tool") | Tool name, arguments, output captured. |
| Code interpreter step | span (kind="tool") with subtype="code" | Generated code and stdout captured. |
| File search / retrieval step | span (kind="tool") with subtype="retrieval" | Vector store ID and result count captured. |
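The step-to-span mapping above can be sketched as a small lookup function. This is an illustration of the table, not Spanlens's implementation: `StepLike`, `SpanShape`, and `spanShapeForStep` are hypothetical names, and `StepLike` is a simplified stand-in for the Assistants API run step object, which carries more fields.

```typescript
// Simplified, hypothetical stand-in for a run step; the real API object has more fields.
type StepLike =
  | { type: 'message_creation' }
  | { type: 'tool_call'; tool: 'function' | 'code_interpreter' | 'file_search' }

// Hypothetical description of the span a step produces, following the table above.
interface SpanShape {
  kind: 'llm' | 'tool'
  subtype?: 'code' | 'retrieval'
}

function spanShapeForStep(step: StepLike): SpanShape {
  // Message creation steps become llm spans.
  if (step.type === 'message_creation') {
    return { kind: 'llm' }
  }
  // Code interpreter and file search steps are tool spans with a subtype.
  if (step.tool === 'code_interpreter') {
    return { kind: 'tool', subtype: 'code' }
  }
  if (step.tool === 'file_search') {
    return { kind: 'tool', subtype: 'retrieval' }
  }
  // Plain function tool calls are tool spans without a subtype.
  return { kind: 'tool' }
}

console.log(spanShapeForStep({ type: 'tool_call', tool: 'code_interpreter' }))
```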

Multi-run thread cost

For a long-lived thread that runs the assistant multiple times, Spanlens aggregates cost across runs at the thread level. The trace view also breaks the cumulative cost down per user message, which is the unit most product teams want to bill on.
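As a rough sketch of that aggregation, the snippet below sums run costs for one thread and groups them by the user message that triggered each run. The `RunCost` shape, the `costByUserMessage` helper, and the sample figures are all hypothetical; in particular, how a run is attributed to a user message is an assumption made for this example, not documented Spanlens behavior.

```typescript
// Hypothetical per-run cost record; field names are illustrative only.
interface RunCost {
  threadId: string
  runId: string
  userMessageId: string // assumption: each run is attributed to one user message
  costUsd: number
}

interface ThreadCost {
  totalUsd: number
  perUserMessageUsd: Record<string, number>
}

// Sum cost across all runs of a thread, and split it per user message.
function costByUserMessage(runs: RunCost[], threadId: string): ThreadCost {
  const perUserMessageUsd: Record<string, number> = {}
  let totalUsd = 0
  for (const run of runs) {
    if (run.threadId !== threadId) continue // ignore runs from other threads
    totalUsd += run.costUsd
    perUserMessageUsd[run.userMessageId] = (perUserMessageUsd[run.userMessageId] ?? 0) + run.costUsd
  }
  return { totalUsd, perUserMessageUsd }
}

// Made-up figures: msg_1 needed two runs, msg_2 needed one.
const sampleRuns: RunCost[] = [
  { threadId: 'thread_1', runId: 'run_a', userMessageId: 'msg_1', costUsd: 0.25 },
  { threadId: 'thread_1', runId: 'run_b', userMessageId: 'msg_1', costUsd: 0.125 },
  { threadId: 'thread_1', runId: 'run_c', userMessageId: 'msg_2', costUsd: 0.5 },
  { threadId: 'thread_2', runId: 'run_d', userMessageId: 'msg_9', costUsd: 1 },
]

const threadCost = costByUserMessage(sampleRuns, 'thread_1')
console.log(threadCost)
```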

Streaming runs

Streaming runs work without extra setup. The drop-in handles the server-sent events stream and emits one span per message delta plus a final aggregating span for the full run.
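The relationship between the per-delta spans and the final aggregating span can be pictured with the sketch below. It is a simplified model, not the drop-in's actual code: `DeltaSpan`, `RunSummarySpan`, and `aggregateDeltas` are hypothetical names, and the sketch assumes each delta carries an ordering index and a text fragment.

```typescript
// Hypothetical span emitted for one streamed message delta.
interface DeltaSpan {
  runId: string
  index: number // assumption: position of the delta within the stream
  text: string
}

// Hypothetical final span that summarizes the whole streamed run.
interface RunSummarySpan {
  runId: string
  deltaCount: number
  output: string
}

// Fold the delta spans of one run into a single aggregating span.
function aggregateDeltas(runId: string, deltas: DeltaSpan[]): RunSummarySpan {
  // filter() returns a new array, so sorting does not mutate the caller's input.
  const ordered = deltas
    .filter((delta) => delta.runId === runId)
    .sort((a, b) => a.index - b.index)
  return {
    runId,
    deltaCount: ordered.length,
    output: ordered.map((delta) => delta.text).join(''),
  }
}

// Made-up deltas, deliberately out of order, with one from another run.
const sampleDeltas: DeltaSpan[] = [
  { runId: 'run_a', index: 1, text: 'revenue ' },
  { runId: 'run_a', index: 0, text: 'Weekly ' },
  { runId: 'run_b', index: 0, text: 'ignored' },
  { runId: 'run_a', index: 2, text: 'chart' },
]

const runSummary = aggregateDeltas('run_a', sampleDeltas)
console.log(runSummary.output)
```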

Where to go next