BFSI Eval Mode
BFSI strict mode for evals.
BFSI strict mode (bfsiMode: true on an AgentCard's evalConfig) applies additional controls to the evaluation pipeline for banking, financial services, and insurance workloads subject to SR 11-7 and related model risk management requirements. It affects dispatch gating, grader requirements, trace retention, and compliance artifact generation.
BFSI strict mode is an eval-pipeline concern. The SDLC-level BFSI template (sdlc-bfsi) and Cedar approval gate bridge are separate features. See BFSI Compliance Overview for the full compliance picture.
What BFSI mode changes
| Behavior | Standard mode | BFSI strict mode (bfsiMode: true) |
|---|---|---|
| Structural grader requirement | Optional | Required for development and qa sessions - dispatch blocked without it |
| Human grader items | Routed to eval run queue | Routed to org review queue - must be fulfilled before run completes |
| Trace retention | 'default' (subject to standard GC) | 'eval-permanent' - 7-year retention, never GC'd |
| sys:bfsi-output-constraints partial | Not attached | Auto-composed into the agent system prompt at dispatch |
| Compliance artifact | Not generated | score_pct included in the BFSI compliance artifact |
Enabling BFSI mode
Set bfsiMode: true in the AgentCard's evalConfig:
```yaml
# agent-cards/my-bfsi-agent.yaml
name: credit-risk-sdlc-agent
workType: development
evalConfig:
  enabled: true
  bfsiMode: true
  graders:
    - structural/zod-v1          # REQUIRED in bfsiMode for development/qa
    - model-grader/llm-judge-v1  # optional - additional LLM scoring
    - human-grader/v1            # optional - routes to org review queue in bfsiMode
  regressionThreshold: 0.10      # stricter threshold for regulated agents
  driftWindowDays: 14
```

BFSI mode is opt-in per AgentCard. Non-BFSI agent cards in the same org are unaffected.
Dispatch gating
When bfsiMode: true and the session's workType is development or qa, the dispatch path checks that evalConfig.graders includes at least one grader with ID structural/zod-v1. If no structural grader is configured, dispatch is blocked and returns:
```json
{
  "error": "BFSI strict mode requires a structural grader for development/qa sessions. Add 'structural/zod-v1' to evalConfig.graders."
}
```

This ensures that every development and QA session for regulated agents has a deterministic, auditable schema check on record before it can proceed.
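The gating rule can be sketched as a pure function. This is an illustration under stated assumptions, not the platform's dispatch code: the `EvalConfigLike` interface, the `checkBfsiDispatchGate` name, and the `null`-on-success return shape are hypothetical. Only the field names, the grader ID, the gated work types, and the error message come from this page.

```typescript
// Hypothetical shape for illustration; only the field names are taken from this page.
interface EvalConfigLike {
  enabled: boolean;
  bfsiMode?: boolean;
  graders: string[];
}

const STRUCTURAL_GRADER_ID = 'structural/zod-v1';
const GATED_WORK_TYPES = ['development', 'qa'];

// Returns an error payload when dispatch should be blocked, or null when it may proceed.
function checkBfsiDispatchGate(
  workType: string,
  evalConfig: EvalConfigLike,
): { error: string } | null {
  const isGatedWorkType = GATED_WORK_TYPES.some((t) => t === workType);
  // The gate only applies in BFSI strict mode and only to development/qa sessions.
  if (evalConfig.bfsiMode !== true || !isGatedWorkType) {
    return null;
  }
  const hasStructuralGrader = evalConfig.graders.some(
    (id) => id === STRUCTURAL_GRADER_ID,
  );
  if (hasStructuralGrader) {
    return null;
  }
  return {
    error:
      "BFSI strict mode requires a structural grader for development/qa sessions. " +
      "Add 'structural/zod-v1' to evalConfig.graders.",
  };
}
```

Under these assumptions, a development session with `bfsiMode: true` and only `model-grader/llm-judge-v1` configured is blocked, while an acceptance session with the same config passes the gate.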
Human grader and org review queue
In standard mode, human grader pending entries appear in the operator's personal eval review queue. In BFSI strict mode, they are routed to the org-level review queue, which can be assigned to a dedicated compliance reviewer. The run stays in a pending-complete state until the org-queue entry is fulfilled.
This satisfies the four-eyes principle required by SR 11-7 for model output acceptance.
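A minimal sketch of the routing and completion rule follows. The queue labels, the `HumanGraderEntry` shape, and both function names are hypothetical. The sketch models only what this page states for BFSI mode; how a run completes in standard mode is not covered here.

```typescript
// Hypothetical labels for the two review queues described on this page.
type ReviewQueue = 'operator' | 'org';
type BfsiRunStatus = 'pending-complete' | 'complete';

// Hypothetical shape of a pending human-grader entry.
interface HumanGraderEntry {
  id: string;
  fulfilled: boolean;
}

// BFSI strict mode routes human grader items to the org-level queue.
function routeHumanGraderEntry(bfsiMode: boolean): ReviewQueue {
  return bfsiMode ? 'org' : 'operator';
}

// In BFSI strict mode the run stays pending-complete until every
// org-queue entry has been fulfilled by a reviewer.
function bfsiRunStatus(entries: HumanGraderEntry[]): BfsiRunStatus {
  const allFulfilled = entries.every((entry) => entry.fulfilled);
  return allFulfilled ? 'complete' : 'pending-complete';
}
```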
Trace retention
All eval_traces rows created under BFSI mode are written with retain: 'eval-permanent'. The platform's GC job skips rows with this retention tag. This ensures the raw I/O trace is available for 7 years to satisfy record-keeping requirements under 12 CFR Part 11 and equivalent international standards.
```ts
// Emit path sets this automatically when bfsiMode: true
const retain: EvalTraceRetain = evalConfig.bfsiMode ? 'eval-permanent' : 'default'
```

Traces with retain: 'gdpr-delete' (set by crypto-shredding during data subject deletion) override eval-permanent - GDPR erasure requirements supersede BFSI retention.
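The precedence between the three retention values can be sketched as follows. The union below lists only the values named on this page, so the real `EvalTraceRetain` type may have more members. The `resolveRetain` and `isSkippedByGc` helpers and the `gdprErased` flag are hypothetical.

```typescript
// Only the values named on this page; the real type may include others.
type EvalTraceRetain = 'default' | 'eval-permanent' | 'gdpr-delete';

// GDPR erasure takes precedence over BFSI retention; otherwise bfsiMode decides.
function resolveRetain(bfsiMode: boolean, gdprErased: boolean): EvalTraceRetain {
  if (gdprErased) {
    return 'gdpr-delete';
  }
  return bfsiMode ? 'eval-permanent' : 'default';
}

// The GC job skips rows tagged eval-permanent. How it treats other tags
// is not specified on this page, so this helper only answers that one question.
function isSkippedByGc(retain: EvalTraceRetain): boolean {
  return retain === 'eval-permanent';
}
```

A trace written under BFSI mode resolves to `'eval-permanent'` until a data subject deletion marks it erased, at which point it resolves to `'gdpr-delete'` and is no longer exempt under this sketch.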
System prompt partial auto-composition
When bfsiMode: true, the sys:bfsi-output-constraints partial is automatically composed into the agent's system prompt at dispatch. This partial instructs the agent to:
- Avoid including personally identifiable financial information (PIFI) in output payloads that will be stored in eval traces.
- Format structured output to match the registered Zod schema.
- Include a compliance disclaimer in any human-facing output.
The partial is a system-scope partial (seeded at platform startup) and cannot be overridden by org or project-scope partials. See Agent Card Partials for how the partial auto-attach system works.
Compliance artifact
The BFSI compliance artifact generated at the end of an acceptance-phase session includes a score_pct field derived from the eval run's aggregate score:
```json
{
  "sessionId": "sess_...",
  "workType": "acceptance",
  "approvalGateOutcome": "approved",
  "evalRunId": "evr_...",
  "score_pct": 92,
  "graders": [
    { "graderId": "structural/zod-v1", "pass": true, "score": 1.0 },
    { "graderId": "human-grader/v1", "pass": true, "score": 0.9, "fulfilledBy": "ops_user@..." }
  ],
  "retentionClass": "eval-permanent",
  "generatedAt": "2026-06-02T14:31:00.000Z"
}
```

score_pct is round(aggregateScore * 100). A score_pct below the configured regressionThreshold surfaces a warning in the compliance artifact but does not block acceptance - the human grader fulfillment is the blocking gate.
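The conversion can be sketched in a few lines. The `toScorePct` name and the range check are assumptions: the sketch treats the aggregate score as a value in [0, 1], which this page implies but does not state. How the aggregate is derived from the individual grader scores is not specified here, so it is taken as an input.

```typescript
// Converts an aggregate eval score to the integer percentage stored as score_pct.
// Assumes aggregateScore lies in [0, 1]; that range is not stated on this page.
function toScorePct(aggregateScore: number): number {
  if (!(aggregateScore >= 0 && aggregateScore <= 1)) {
    throw new RangeError('aggregateScore must be between 0 and 1');
  }
  return Math.round(aggregateScore * 100);
}
```

With an aggregate score of 0.92, this yields the `score_pct` of 92 shown in the artifact above.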
EvalConfig fields relevant to BFSI mode
Prop
Type
Checklist for BFSI eval setup
Related pages
- Structural Grader - required in BFSI mode for development/qa sessions
- Human Grader - routes to org review queue in BFSI mode
- Eval Emission - sets trace retention to eval-permanent in BFSI mode
- Eval Replay - 7-year retention ensures replay is always available for audit
- BFSI Compliance Overview - SR 11-7, risk scoring, approval gates
- Agent Card Partials - sys:bfsi-output-constraints auto-attach mechanism