Eval Emission
Eval traces and runs.
Eval emission is the entry point of the evaluation pipeline. When an agent session reaches a terminal status (completed, failed, or stopped), the emission hook fires, checks whether the session's AgentCard has opted into evals, and writes two rows: an eval_traces row capturing I/O telemetry and an eval_runs row tying the trace to the agent card and organization.
How emission works
Emission is wrapped in Next.js after(), which keeps the lambda alive past the HTTP response boundary on serverless runtimes. The session status route's latency budget is never consumed.
Opting in
Emission is off by default. Enable it per AgentCard via the evalConfig field:
# AgentCard YAML (agent-cards/my-agent.yaml)
name: my-sdlc-agent
workType: development
evalConfig:
enabled: true
graders:
- structural/zod-v1
- model-grader/llm-judge-v1
regressionThreshold: 0.15 # optional - default 0.15
driftWindowDays: 7 # optional - default 7The graders list is an ordered list of grader IDs. Grader job enqueueing runs after emission (configured in the eval runs pipeline). The enabled flag is the only required field - the rest have defaults.
EvalConfig schema
Prop
Type
Row shapes
eval_traces
One row per session emission. The trace captures the I/O snapshot.
Prop
Type
eval_runs
One row per emission event, tied to the AgentCard and org.
Prop
Type
WorkareaSnapshotRef
When evalConfig.enabled=true, the emission hook marks the workarea snapshot as retain='eval-permanent', preventing garbage collection:
interface WorkareaSnapshotRef {
provider: string // workarea provider ('e2b', 'vercel', 'local', ...)
snapshotId: string // opaque provider-internal ID
retain: 'default' | 'eval-permanent' | 'evicted'
capturedAt?: string // ISO-8601
}This ensures replay can re-run graders against the exact workarea state.
The withEvalEmissionHook decorator
The hook wraps the session status route handler. Stack it alongside other Layer-6 hooks:
// app/api/sessions/[id]/status/route.ts
export const PUT = withEvalEmissionHook(
withLlmInferenceEmissionHook(
withSessionStatusHook(baseHandler)
)
)Order does not matter - all deferred work runs in after() post-response.
The hook only fires when:
- The handler returns a 2xx response.
- The request body contains
statusinTERMINAL_STATUSES = { 'completed', 'failed', 'stopped' }. - The session's AgentCard has
evalConfig.enabled = true.
Known limitations
- Input/output payload capture is infrastructure. The
inputPayloadandoutputPayloadfields oneval_tracesare currentlynull. Full payload capture requires the session state protocol to expose I/O blobs. TheinputHash/outputHashfields oneval_runsare populated via SHA-256 of a sentinel string in the interim. - Grader job enqueueing is deferred. The
gradeResultsarray starts empty; graders are run via the grader queue after emission.
Related pages
- Structural Grader - deterministic Zod-schema grader
- Model Grader - LLM-as-judge grader
- Human Grader - operator review queue grader
- Eval Runs Dashboard - operator view of all eval runs with regression detection
- Eval Replay - re-run graders against a frozen trace
- BFSI Eval Mode - strict mode requirements for regulated environments