Tutorial: trace a multi-step agent
Thirty minutes. We trace an agent that classifies a user message, dispatches to two parallel tools, and composes a final response. By the end you can answer which step ate the wall-clock and which call ate the cost, even when steps run concurrently.
What you will end up with
- One trace per agent invocation in /traces.
- Parallel branches stacked vertically with their real start / end times.
- Critical path highlighted (the longest chain that gates the final answer).
- Per-step cost rolled up to the trace total.
The agent we are tracing
A customer-support agent. Given a user message it:
- Classifies intent (LLM call)
- Based on intent, fans out two lookups in parallel:
- Order lookup (Shopify API tool + summarizer LLM)
- Knowledge base lookup (Pinecone retrieval + answerer LLM)
- Composes a final response from both branches (LLM call)
We will write this without LangChain to keep the example portable. The same pattern works for LangGraph; see LangGraph integration for the callback handler version.
Step 1. Set up
pnpm add @spanlens/sdkbashSPANLENS_API_KEY=sl_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxenvRegister your OpenAI provider key in /projects as usual.
Step 2. Bootstrap the trace
One trace per agent invocation. All work hangs off this trace.
import { SpanlensClient, observe } from '@spanlens/sdk'
import { createOpenAI } from '@spanlens/sdk/openai'
const client = new SpanlensClient()
const openai = createOpenAI()
export async function handleSupportMessage(message: string, userId: string) {
const trace = client.startTrace({
name: 'support-agent',
metadata: { user_id: userId },
})
try {
// ... steps go here ...
} finally {
await trace.end()
}
}tsStep 3. Classify the intent
Wrap the classification LLM call so it shows up as a child span of the trace. The proxy already emits an LLM span automatically; observe() here gives us a meaningful name (classify_intent) and groups the call under the trace explicitly.
type Intent = 'order_status' | 'general_question'
const intent = await observe(
trace,
{ name: 'classify_intent', spanType: 'llm', input: { message } },
async () => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: 'Classify the user message. Reply with exactly: order_status OR general_question.' },
{ role: 'user', content: message },
],
})
return res.choices[0].message.content?.trim() as Intent
},
)tsStep 4. Fan out two parallel branches
Parallel branches in Spanlens are first-class. Each branch is its own observe() call; Promise.all runs them concurrently, and the waterfall shows them stacked with their real start / end times.
const [orderInfo, kbAnswer] = await Promise.all([
observe(trace, { name: 'lookup_order', spanType: 'custom' }, async (span) => {
// child span: shopify tool call
const orderRow = await observe(
span,
{ name: 'shopify.query', spanType: 'tool', input: { userId } },
async () => shopify.getRecentOrder(userId),
)
// child span: summarizer LLM
const summary = await observe(
span,
{ name: 'summarize_order', spanType: 'llm' },
async () => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: 'Summarize the order in one sentence.' },
{ role: 'user', content: JSON.stringify(orderRow) },
],
})
return res.choices[0].message.content
},
)
return summary
}),
observe(trace, { name: 'lookup_kb', spanType: 'custom' }, async (span) => {
// child span: pinecone retrieval
const docs = await observe(
span,
{ name: 'pinecone.query', spanType: 'retrieval', input: { topK: 5 } },
async () => pinecone.index('kb').query({ vector: await embed(message), topK: 5 }),
)
// child span: answerer LLM
const answer = await observe(
span,
{ name: 'answer_from_kb', spanType: 'llm' },
async () => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: 'Answer using the context.' },
{ role: 'user', content: `Context: ${docs.matches.map(m => m.metadata?.text).join('\n\n')}\n\nQuestion: ${message}` },
],
})
return res.choices[0].message.content
},
)
return answer
}),
])tsTwo important details:
- The second argument to
observe()is the parent. Passingtracecreates a top-level span; passingspaninside a callback creates a child of that span. This is how the tree is built. - Spanlens does not enforce a foreign key on
parent_span_id. If one branch finishes before the other, the late branch's spans still attach correctly when they eventually close. You do not need to await branches in order.
Step 5. Compose the final response
const final = await observe(
trace,
{ name: 'compose_final', spanType: 'llm' },
async () => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: 'Write a friendly reply combining the order info and the answer.' },
{ role: 'user', content: `Order info: ${orderInfo}\n\nAnswer: ${kbAnswer}` },
],
})
return res.choices[0].message.content
},
)
return finaltsStep 6. What you see in /traces
Trace: support-agent (3.2s, $0.0061, 5 LLM calls + 1 tool + 1 retrieval)
├── classify_intent (450ms, $0.0008)
├── lookup_order (parallel) (1.1s, $0.0011)
│ ├── shopify.query (840ms)
│ └── summarize_order (240ms, $0.0005)
├── lookup_kb (parallel) (2.7s, $0.0035) ← critical path
│ ├── pinecone.query (300ms, 5 docs)
│ └── answer_from_kb (2.3s, $0.0023)
└── compose_final (640ms, $0.0007)textThe critical path annotation tells you the wall-clock-blocking chain. Even though lookup_order took 1.1 s, it ran in parallel with the slower lookup_kb branch, so trimming order lookup buys you nothing. Optimizing answer_from_kb would.
Common variations
Conditional branches
If the intent is general_question, you might skip the order lookup entirely. Just wrap the conditional in plain TypeScript:
const branches = intent === 'order_status'
? [orderBranch(), kbBranch()]
: [kbBranch()]
const results = await Promise.all(branches)tsSkipped branches simply have no spans. The trace is whatever you actually executed.
Tagging the trace with user / session for cross-trace analysis
const trace = client.startTrace({
name: 'support-agent',
metadata: {
user_id: userId,
session_id: sessionId,
intent, // populated after classification, set via trace.update(...) if you prefer post-hoc
},
})tsStreaming the final response
Replace chat.completions.create with stream: true. The proxy buffers and logs the full response on stream end; the span closes when the stream finishes. First byte still arrives in ~200 ms.
What you skipped that you might want later
- Per-step evals. Score the
answer_from_kbstep on helpfulness. See Nightly evals tutorial. - LangGraph version. If your agent grows past 5+ nodes, LangGraph + the callback handler is less ceremony than threading
observe()everywhere. LangGraph integration covers it. - Error surfaces.
observe()catches throws and setsspan.status='error'automatically; failed traces show in red on the /traces list.
Next tutorial: scheduled evals on production prompts.