Human Grader
Review-queue grader.
The human grader defers scoring to an operator. When it runs, it immediately returns a pending placeholder result and places the eval trace in a review queue. The eval run stays in a running-like state until an operator fulfills the grade via the admin UI. This is the required grader for acceptance-critical sessions in BFSI strict mode.
Grader ID
human-grader/v1
How it works
HumanGrader.evaluate() does not inspect the agent output at all. It:
- Generates a unique queueId (hgq_<12-byte-hex>).
- Records the requestedAt ISO timestamp.
- Returns a GradeResult with score: 0, pass: false, reasoning: 'pending', and metadata.pending: true.
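The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's source: the GradeResult interface is inferred from the fields listed above, pendingGradeResult and HumanGraderSketch are hypothetical names, and the ID is built from 6 random bytes (12 hex characters) to match the sample ID on this page. Adjust the byte count if real IDs are longer.

```typescript
import { randomBytes } from 'node:crypto'

// Shape inferred from the pending result described above (assumption).
interface GradeResult {
  graderId: string
  score: number
  pass: boolean
  reasoning: string
  metadata: {
    pending: boolean
    queueId: string
    requestedAt: string
    label?: string
  }
}

// Hypothetical helper: builds the placeholder without inspecting any agent output.
function pendingGradeResult(label?: string, now: Date = new Date()): GradeResult {
  // Assumption: 6 random bytes rendered as 12 lowercase hex characters.
  const queueId = `hgq_${randomBytes(6).toString('hex')}`
  return {
    graderId: 'human-grader/v1',
    score: 0,
    pass: false,
    reasoning: 'pending',
    metadata: {
      pending: true,
      queueId,
      requestedAt: now.toISOString(),
      // Only include the label when one was supplied.
      ...(label !== undefined ? { label } : {}),
    },
  }
}

// Hypothetical stand-in for HumanGrader; the context argument is deliberately ignored.
class HumanGraderSketch {
  private readonly label?: string

  constructor(label?: string) {
    this.label = label
  }

  async evaluate(_ctx: unknown): Promise<GradeResult> {
    return pendingGradeResult(this.label)
  }
}
```

Because the placeholder is built from a random ID and a clock reading only, the call does no I/O beyond whatever the caller does to persist the result.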
The pending entry is stored in eval_runs.grade_results (JSONB array). The operator reviews and grades it via PATCH /api/evals/runs/:id/grade, which updates the matching entry in the grade_results array in place with the final score.
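For scripted fulfillment, a request to the PATCH endpoint might be assembled as in the sketch below. The route comes from the text above; the payload field names (queueId, score, reasoning), the buildGradePatch helper, and the absence of auth headers are all assumptions for illustration, so check the actual API contract before relying on them.

```typescript
// Hypothetical payload: these field names are assumptions, not a documented contract.
interface GradeFulfillment {
  queueId: string
  score: number // expected range: 0.0 to 1.0
  reasoning: string
}

interface PatchRequest {
  url: string
  init: { method: 'PATCH'; headers: Record<string, string>; body: string }
}

// Builds (but does not send) the request for PATCH /api/evals/runs/:id/grade.
function buildGradePatch(baseUrl: string, runId: string, grade: GradeFulfillment): PatchRequest {
  if (!Number.isFinite(grade.score) || grade.score < 0 || grade.score > 1) {
    throw new RangeError('score must be between 0.0 and 1.0')
  }
  return {
    url: `${baseUrl}/api/evals/runs/${encodeURIComponent(runId)}/grade`,
    init: {
      method: 'PATCH',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(grade),
    },
  }
}
```

The returned pair can be passed straight to fetch(url, init). Keeping request construction separate from sending makes the score-range check easy to test without a network.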
Constructor
new HumanGrader(label?: string)

| Parameter | Type | Default | Notes |
|---|---|---|---|
| label | string? | - | Display name shown in the admin review queue (e.g. 'QA acceptance gate') |
import { HumanGrader } from '@/lib/evals/graders/human-grader'

const grader = new HumanGrader('Acceptance review - critical path')
const result = await grader.evaluate(ctx)
// result:
// {
//   graderId: 'human-grader/v1',
//   score: 0,
//   pass: false,
//   reasoning: 'pending',
//   metadata: {
//     pending: true,
//     queueId: 'hgq_3f7a9c12e45b',
//     requestedAt: '2026-06-02T14:31:00.000Z',
//     label: 'Acceptance review - critical path'
//   }
// }

GradeResult schema (pending state)
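| Prop | Type |
|---|---|
| graderId | string |
| score | number |
| pass | boolean |
| reasoning | string |
| metadata.pending | boolean |
| metadata.queueId | string |
| metadata.requestedAt | string (ISO timestamp) |
| metadata.label | string? |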
Operator workflow
1. Navigate to Admin → Evals (/admin/evals).
2. Filter by Pending human review to see runs awaiting operator input.
3. Open a run. The detail view shows the trace (session ID, token counts, tool calls) alongside the pending grade entry.
4. Submit a score (0.0-1.0) and a written explanation. The platform updates the grade_results array in-place and sets pending: false.
Unfulfilled human-grader entries keep the aggregate score for the eval run low (because the placeholder score: 0 is averaged in). Complete pending reviews promptly to get accurate run-level scores and regression detection.
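A small worked example makes the effect concrete. It assumes the run-level score is an unweighted mean of all grade entries, which is an assumption about the aggregation rather than something stated here; meanScore and the sample scores are hypothetical.

```typescript
interface ScoredEntry {
  graderId: string
  score: number
  pending?: boolean
}

// Assumption: the aggregate is a plain unweighted mean over all entries.
function meanScore(entries: ScoredEntry[]): number {
  if (entries.length === 0) return 0
  return entries.reduce((sum, e) => sum + e.score, 0) / entries.length
}

// While the human review is pending, its placeholder score of 0 is averaged in.
const beforeReview: ScoredEntry[] = [
  { graderId: 'structural/zod-v1', score: 1 },
  { graderId: 'human-grader/v1', score: 0, pending: true },
]

// After the operator submits 0.75, the same run scores much higher.
const afterReview: ScoredEntry[] = [
  { graderId: 'structural/zod-v1', score: 1 },
  { graderId: 'human-grader/v1', score: 0.75, pending: false },
]

console.log(meanScore(beforeReview)) // 0.5
console.log(meanScore(afterReview)) // 0.875
```

Under this assumption, a run with a perfect structural score still reports 0.5 until the review is fulfilled, which a regression detector could misread as a real drop.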
No new database table
The human grader stores its pending entry in the existing eval_runs.grade_results JSONB array - no separate review-queue table is needed. The queueId within metadata is the correlation key for the PATCH endpoint.
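The in-place update keyed by queueId could look roughly like the sketch below, operating on an in-memory copy of the grade_results array. The fulfillInPlace name, the StoredGrade shape, and in particular the passThreshold parameter are hypothetical: how pass is derived from the operator's score is not specified here.

```typescript
// Shape inferred from the pending entry described on this page (assumption).
interface StoredGrade {
  graderId: string
  score: number
  pass: boolean
  reasoning: string
  metadata: { pending?: boolean; queueId?: string; [key: string]: unknown }
}

// Finds the pending entry by queueId and overwrites it in place.
// Returns false when no pending entry matches, leaving the array untouched.
function fulfillInPlace(
  gradeResults: StoredGrade[],
  queueId: string,
  score: number,
  reasoning: string,
  passThreshold = 0.5, // hypothetical: the real pass rule is not documented here
): boolean {
  const entry = gradeResults.find(
    (g) => g.metadata.pending === true && g.metadata.queueId === queueId,
  )
  if (!entry) return false
  entry.score = score
  entry.pass = score >= passThreshold
  entry.reasoning = reasoning
  entry.metadata.pending = false
  return true
}
```

Matching on both pending: true and the queueId means a second submission for an already-fulfilled entry is rejected rather than silently overwriting the first grade.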
Performance characteristics
| Property | Value |
|---|---|
| Mode | async |
| LLM call | None |
| Latency | Microseconds (evaluate() itself is near-zero) |
| Fulfillment latency | Operator-dependent |
The grader itself is fast; the blocking factor is operator response time.
BFSI strict mode
In BFSI strict mode, human grader items for acceptance-phase sessions are routed to the org's review queue and must be fulfilled before the eval run can be marked complete. This is the acceptance-critical path for SR 11-7 compliance artifacts.
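The strict-mode completion rule can be pictured as a simple gate over the grade entries. This is a sketch of the rule as described above, not the platform's enforcement code; pendingHumanReviews and canMarkCompleteStrict are hypothetical names, and the entry shape is inferred from the pending result shown earlier.

```typescript
interface GradeEntry {
  graderId: string
  metadata?: { pending?: boolean; queueId?: string }
}

// Lists the queueIds of human-grader entries that are still awaiting an operator.
function pendingHumanReviews(gradeResults: GradeEntry[]): string[] {
  return gradeResults
    .filter((g) => g.graderId === 'human-grader/v1' && g.metadata?.pending === true)
    .map((g) => g.metadata?.queueId ?? '(unknown queueId)')
}

// In strict mode, a run may be marked complete only when nothing is pending.
function canMarkCompleteStrict(gradeResults: GradeEntry[]): boolean {
  return pendingHumanReviews(gradeResults).length === 0
}
```

Returning the outstanding queueIds, rather than only a boolean, gives an operator or audit log something concrete to point at when a run is blocked.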
Registering in an AgentCard
evalConfig:
  enabled: true
  graders:
    - structural/zod-v1   # fast structural check first
    - human-grader/v1     # operator review for every completed session

Or with a label for the queue UI:

// Programmatic registration when building the grader pipeline
new HumanGrader('Acceptance gate - production deploy session')

Related pages
- Eval Emission - how graders are queued after session completion
- Structural Grader - run before human grader to catch schema failures cheaply
- Eval Runs Dashboard - where pending human reviews appear
- BFSI Eval Mode - compliance requirements that mandate human graders