Eval Emission

Eval traces and runs.

Eval emission is the entry point of the evaluation pipeline. When an agent session reaches a terminal status (completed, failed, or stopped), the emission hook fires, checks whether the session's AgentCard has opted into evals, and writes two rows: an eval_traces row capturing I/O telemetry and an eval_runs row tying the trace to the agent card and organization.

How emission works

Emission is wrapped in Next.js after(), which schedules the work to run once the response has finished and, on serverless runtimes, keeps the function alive until that work completes. Emission therefore adds no latency to the session status route's response.
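
As a rough illustration of the deferral described above, the sketch below uses a small in-memory stand-in for after() so it can run outside Next.js. The names createDeferQueue, handleStatusUpdate, and flush are hypothetical; a real route would import after from next/server and let the runtime drain the work.

```typescript
// Stand-in for Next.js after(): collects callbacks and runs them later.
// Assumption: a real route would use `after` from "next/server" instead.
type Task = () => void;

function createDeferQueue() {
  const tasks: Task[] = [];
  return {
    // Register work to run once the response has been produced.
    after(task: Task): void {
      tasks.push(task);
    },
    // Simulates the runtime draining deferred work post-response.
    flush(): void {
      for (const task of tasks.splice(0)) {
        task();
      }
    },
  };
}

const events: string[] = [];
const queue = createDeferQueue();

// Hypothetical status handler: it registers emission, then responds.
function handleStatusUpdate(status: string): { status: number } {
  queue.after(() => {
    events.push(`emit eval rows for status=${status}`);
  });
  events.push("response sent");
  return { status: 200 };
}

const response = handleStatusUpdate("completed");
const eventsBeforeFlush = [...events]; // only the response has happened so far
queue.flush();                         // deferred emission runs afterwards
```

The point of the pattern is that the handler's return path never waits on the emission callback, so the database writes cannot slow down the response.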

Opting in

Emission is off by default. Enable it per AgentCard via the evalConfig field:

# AgentCard YAML (agent-cards/my-agent.yaml)
name: my-sdlc-agent
workType: development
evalConfig:
  enabled: true
  graders:
    - structural/zod-v1
    - model-grader/llm-judge-v1
  regressionThreshold: 0.15   # optional - default 0.15
  driftWindowDays: 7           # optional - default 7

The graders field is an ordered list of grader IDs. Grader jobs are enqueued after emission (configured in the eval runs pipeline). The enabled flag is the only required field; the rest have defaults.
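
The defaults above could be applied with a helper along these lines. This is a minimal sketch: the EvalConfig interface is inferred from the YAML example, resolveEvalConfig is a hypothetical name, and the empty-list default for graders is an assumption, since the source only states defaults for regressionThreshold and driftWindowDays.

```typescript
// Assumed shape of evalConfig, inferred from the YAML example.
interface EvalConfig {
  enabled: boolean;             // the only required field
  graders?: string[];           // ordered grader IDs
  regressionThreshold?: number; // documented default: 0.15
  driftWindowDays?: number;     // documented default: 7
}

type ResolvedEvalConfig = Required<EvalConfig>;

// Hypothetical helper: fills in defaults for omitted fields.
function resolveEvalConfig(config: EvalConfig): ResolvedEvalConfig {
  return {
    enabled: config.enabled,
    graders: config.graders ?? [], // assumption: no graders when omitted
    regressionThreshold: config.regressionThreshold ?? 0.15,
    driftWindowDays: config.driftWindowDays ?? 7,
  };
}

const resolved = resolveEvalConfig({
  enabled: true,
  graders: ["structural/zod-v1"],
});
```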

EvalConfig schema

Row shapes

eval_traces

One row per session emission. The trace captures the I/O snapshot.

eval_runs

One row per emission event, tied to the AgentCard and org.

WorkareaSnapshotRef

When evalConfig.enabled=true, the emission hook marks the workarea snapshot as retain='eval-permanent', preventing garbage collection:

interface WorkareaSnapshotRef {
  provider: string        // workarea provider ('e2b', 'vercel', 'local', ...)
  snapshotId: string      // opaque provider-internal ID
  retain: 'default' | 'eval-permanent' | 'evicted'
  capturedAt?: string     // ISO-8601
}

This ensures replay can re-run graders against the exact workarea state.
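
To make the retention rule concrete, here is a small sketch of how a snapshot reference might be pinned and how a garbage collector might skip it. The helpers pinForEval and isCollectable are hypothetical, as is the rule that only 'default' snapshots are collectable; the interface mirrors the one shown above.

```typescript
type Retain = "default" | "eval-permanent" | "evicted";

interface WorkareaSnapshotRef {
  provider: string;
  snapshotId: string;
  retain: Retain;
  capturedAt?: string;
}

// Hypothetical helper: returns a copy of the ref pinned for eval replay.
function pinForEval(ref: WorkareaSnapshotRef): WorkareaSnapshotRef {
  return { ...ref, retain: "eval-permanent" };
}

// Hypothetical GC predicate. Assumption: only 'default' snapshots
// are eligible for garbage collection.
function isCollectable(ref: WorkareaSnapshotRef): boolean {
  return ref.retain === "default";
}

const snapshot: WorkareaSnapshotRef = {
  provider: "e2b",
  snapshotId: "snap-123", // illustrative ID, not a real snapshot
  retain: "default",
  capturedAt: "2025-01-01T00:00:00Z",
};

const pinned = pinForEval(snapshot);
```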

The withEvalEmissionHook decorator

The hook wraps the session status route handler. Stack it alongside other Layer-6 hooks:

// app/api/sessions/[id]/status/route.ts
export const PUT = withEvalEmissionHook(
  withLlmInferenceEmissionHook(
    withSessionStatusHook(baseHandler)
  )
)

Stacking order does not matter, because all deferred work runs in after(), once the response has been sent.

The hook fires only when all of the following hold:

  1. The handler returns a 2xx response.
  2. The request body contains status in TERMINAL_STATUSES = { 'completed', 'failed', 'stopped' }.
  3. The session's AgentCard has evalConfig.enabled = true.
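
The three conditions above can be expressed as a single predicate. The sketch below is illustrative only: shouldEmit and the EmissionCheck shape are hypothetical names, and the real hook may read these values differently.

```typescript
const TERMINAL_STATUSES = new Set(["completed", "failed", "stopped"]);

// Hypothetical input shape gathering the three facts the hook checks.
interface EmissionCheck {
  responseStatus: number;   // HTTP status returned by the wrapped handler
  sessionStatus: unknown;   // `status` field from the request body
  evalEnabled?: boolean;    // evalConfig.enabled on the session's AgentCard
}

function shouldEmit(check: EmissionCheck): boolean {
  // 1. The handler returned a 2xx response.
  const is2xx = check.responseStatus >= 200 && check.responseStatus < 300;
  // 2. The request body carries a terminal status.
  const isTerminal =
    typeof check.sessionStatus === "string" &&
    TERMINAL_STATUSES.has(check.sessionStatus);
  // 3. The AgentCard has explicitly opted in.
  const optedIn = check.evalEnabled === true;
  return is2xx && isTerminal && optedIn;
}

const emits = shouldEmit({
  responseStatus: 200,
  sessionStatus: "completed",
  evalEnabled: true,
});
```

Because emission is off by default, a missing evalEnabled value is treated the same as false.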

Known limitations

  • Input/output payload capture is not yet wired up. The inputPayload and outputPayload fields on eval_traces exist but are currently null, because full payload capture requires the session state protocol to expose I/O blobs. In the interim, the inputHash and outputHash fields on eval_runs are populated with the SHA-256 of a sentinel string.
  • Grader job enqueueing is deferred. The gradeResults array starts empty; graders are run via the grader queue after emission.
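
For orientation, the interim values described in these two limitations might be produced roughly as follows. This is a sketch under assumptions: the sentinel value shown is hypothetical (the actual string is not documented here), as is the use of one sentinel for both hashes; only createHash from Node's crypto module is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sentinel; the string the hook actually hashes is not documented here.
const PAYLOAD_SENTINEL = "eval-payload-unavailable";

// Hex-encoded SHA-256 digest of a UTF-8 string.
function sha256Hex(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

// Interim field values at emission time.
const interimFields = {
  inputPayload: null,                      // eval_traces: not captured yet
  outputPayload: null,                     // eval_traces: not captured yet
  inputHash: sha256Hex(PAYLOAD_SENTINEL),  // eval_runs: placeholder hash
  outputHash: sha256Hex(PAYLOAD_SENTINEL), // eval_runs: placeholder hash
  gradeResults: [] as unknown[],           // filled in later by the grader queue
};
```

Since these hashes do not depend on the session, they cannot distinguish one run's I/O from another until real payload capture is available.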
