BFSI Eval Mode

BFSI strict mode (bfsiMode: true on an AgentCard's evalConfig) applies additional controls to the evaluation pipeline for banking, financial services, and insurance workloads subject to SR 11-7 and related model risk management requirements. It affects dispatch gating, grader requirements, trace retention, and compliance artifact generation.

BFSI strict mode is an eval-pipeline concern. The SDLC-level BFSI template (sdlc-bfsi) and Cedar approval gate bridge are separate features. See BFSI Compliance Overview for the full compliance picture.

What BFSI mode changes

| Behavior | Standard mode | BFSI strict mode (`bfsiMode: true`) |
| --- | --- | --- |
| Structural grader requirement | Optional | Required for `development` and `qa` sessions - dispatch blocked without it |
| Human grader items | Routed to eval run queue | Routed to org review queue - must be fulfilled before run completes |
| Trace retention | `'default'` (subject to standard GC) | `'eval-permanent'` - 7-year retention, never GC'd |
| `sys:bfsi-output-constraints` partial | Not attached | Auto-composed into the agent system prompt at dispatch |
| Compliance artifact | Not generated | `score_pct` included in the BFSI compliance artifact |

Enabling BFSI mode

Set bfsiMode: true in the AgentCard's evalConfig:

```yaml
# agent-cards/my-bfsi-agent.yaml
name: credit-risk-sdlc-agent
workType: development
evalConfig:
  enabled: true
  bfsiMode: true
  graders:
    - structural/zod-v1         # REQUIRED in bfsiMode for development/qa
    - model-grader/llm-judge-v1 # optional - additional LLM scoring
    - human-grader/v1           # optional - routes to org review queue in bfsiMode
  regressionThreshold: 0.10     # stricter threshold for regulated agents
  driftWindowDays: 14
```

BFSI mode is opt-in per AgentCard. Other agent cards in the same org that do not set bfsiMode: true are unaffected.

Dispatch gating

When bfsiMode: true and the session's workType is development or qa, the dispatch path checks that evalConfig.graders includes at least one grader with ID structural/zod-v1. If no structural grader is configured, dispatch is blocked and returns:

```json
{
  "error": "BFSI strict mode requires a structural grader for development/qa sessions. Add 'structural/zod-v1' to evalConfig.graders."
}
```

This ensures that every development and QA session for regulated agents has a deterministic, auditable schema check on record before it can proceed.
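The gating rule can be sketched as a small pure function. This is a minimal illustration, not the platform's dispatch code: the `EvalConfig` and `DispatchError` shapes and the function name `checkBfsiDispatchGate` are assumptions, and only the field names, grader ID, and error message come from this page.

```typescript
// Hypothetical shapes for illustration; only bfsiMode, graders, and the
// grader ID 'structural/zod-v1' are taken from this page.
interface EvalConfig {
  enabled: boolean;
  bfsiMode?: boolean;
  graders: string[];
}

interface DispatchError {
  error: string;
}

const STRUCTURAL_GRADER_ID = 'structural/zod-v1';
const GATED_WORK_TYPES = ['development', 'qa'];

// Returns null when dispatch may proceed, or the error payload when it is blocked.
function checkBfsiDispatchGate(evalConfig: EvalConfig, workType: string): DispatchError | null {
  // The gate only applies in BFSI mode and only to development/qa sessions.
  if (!evalConfig.bfsiMode || GATED_WORK_TYPES.indexOf(workType) === -1) return null;
  // A configured structural grader satisfies the gate.
  if (evalConfig.graders.indexOf(STRUCTURAL_GRADER_ID) !== -1) return null;
  return {
    error:
      'BFSI strict mode requires a structural grader for development/qa sessions. ' +
      "Add 'structural/zod-v1' to evalConfig.graders.",
  };
}

const blocked = checkBfsiDispatchGate(
  { enabled: true, bfsiMode: true, graders: ['model-grader/llm-judge-v1'] },
  'development',
);
const allowed = checkBfsiDispatchGate(
  { enabled: true, bfsiMode: true, graders: ['structural/zod-v1'] },
  'qa',
);
console.log(blocked !== null, allowed === null); // true true
```

Keeping the check free of side effects makes it easy to exercise in a unit test before relying on the real dispatch path.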

Human grader and org review queue

In standard mode, pending human grader entries appear in the operator's personal eval review queue. In BFSI strict mode, they are routed to the org-level review queue, which can be assigned to a dedicated compliance reviewer. The run stays in a pending-complete state until the org-queue entry is fulfilled.

This applies the four-eyes principle to model output acceptance: a second, independent reviewer must sign off, in line with the effective-challenge expectations of SR 11-7.
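The routing and blocking behavior described in this section might look roughly like the sketch below. The queue identifiers, the `HumanGraderItem` shape, and both function names are hypothetical; the page only states which queue is used in each mode and that the run stays pending-complete until the org-queue entry is fulfilled.

```typescript
// Hypothetical queue names and item shape, for illustration only.
type ReviewQueue = 'operator-eval-review' | 'org-review';
type RunState = 'pending-complete' | 'complete';

interface HumanGraderItem {
  runId: string;
  queue: ReviewQueue;
  fulfilledBy?: string; // set once a reviewer has fulfilled the entry
}

// BFSI strict mode routes to the org-level queue; standard mode does not.
function routeHumanGraderItem(runId: string, bfsiMode: boolean): HumanGraderItem {
  return { runId, queue: bfsiMode ? 'org-review' : 'operator-eval-review' };
}

// An unfulfilled org-queue entry keeps the run in pending-complete.
// Assumption: entries in the operator queue are treated as non-blocking here,
// since this page does not specify standard-mode completion rules.
function runStateFor(entry: HumanGraderItem): RunState {
  if (entry.queue === 'org-review' && !entry.fulfilledBy) return 'pending-complete';
  return 'complete';
}

const pendingEntry = routeHumanGraderItem('evr_example', true);
const fulfilledEntry: HumanGraderItem = {
  runId: pendingEntry.runId,
  queue: pendingEntry.queue,
  fulfilledBy: 'compliance-reviewer',
};
console.log(runStateFor(pendingEntry), runStateFor(fulfilledEntry)); // pending-complete complete
```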

Trace retention

All eval_traces rows created under BFSI mode are written with retain: 'eval-permanent'. The platform's GC job skips rows with this retention tag. This ensures the raw I/O trace is available for 7 years to satisfy record-keeping requirements under 12 CFR Part 11 and equivalent international standards.

```typescript
// Emit path sets this automatically when bfsiMode: true
const retain: EvalTraceRetain = evalConfig.bfsiMode ? 'eval-permanent' : 'default'
```

Traces with retain: 'gdpr-delete' (set by crypto-shredding during data subject deletion) override eval-permanent - GDPR erasure requirements supersede BFSI retention.
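To make the precedence concrete, here is a minimal sketch of the three retention tags and how they interact. The `EvalTraceRow` shape and the helper names are assumptions, as is the idea that `EvalTraceRetain` is exactly this three-member union; the tag values themselves come from this page.

```typescript
// Assumption: EvalTraceRetain is a union of the three tags named on this page.
type EvalTraceRetain = 'default' | 'eval-permanent' | 'gdpr-delete';

// Hypothetical row shape; real eval_traces rows carry more fields.
interface EvalTraceRow {
  id: string;
  retain: EvalTraceRetain;
}

// Emit-time tag: BFSI mode writes eval-permanent, everything else writes default.
function retainForEmit(bfsiMode: boolean): EvalTraceRetain {
  return bfsiMode ? 'eval-permanent' : 'default';
}

// Data subject deletion replaces any existing tag, including eval-permanent.
function applyGdprErasure(row: EvalTraceRow): EvalTraceRow {
  return { id: row.id, retain: 'gdpr-delete' };
}

// The GC job skips only rows that are still tagged eval-permanent.
function gcSkips(row: EvalTraceRow): boolean {
  return row.retain === 'eval-permanent';
}

const emitted: EvalTraceRow = { id: 'trace_1', retain: retainForEmit(true) };
const erased = applyGdprErasure(emitted);
console.log(gcSkips(emitted), gcSkips(erased)); // true false
```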

System prompt partial auto-composition

When bfsiMode: true, the sys:bfsi-output-constraints partial is automatically composed into the agent's system prompt at dispatch. This partial instructs the agent to:

- Avoid including personally identifiable financial information (PIFI) in output payloads that will be stored in eval traces.
- Format structured output to match the registered Zod schema.
- Include a compliance disclaimer in any human-facing output.

The partial is a system-scope partial (seeded at platform startup) and cannot be overridden by org or project-scope partials. See Agent Card Partials for how the partial auto-attach system works.
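The non-overridable behavior can be illustrated with a simplified composition routine. Everything here is a sketch under assumptions: the `PromptPartial` shape, the `composeSystemPrompt` function, the rule that a later non-system partial replaces an earlier one with the same ID, and the placeholder partial bodies are all hypothetical. Only the partial ID and the fact that system scope wins over org and project scope come from this page.

```typescript
type PartialScope = 'system' | 'org' | 'project';

// Hypothetical partial shape for illustration.
interface PromptPartial {
  id: string;
  scope: PartialScope;
  body: string;
}

const BFSI_PARTIAL_ID = 'sys:bfsi-output-constraints';

function indexOfId(list: PromptPartial[], id: string): number {
  for (let i = 0; i < list.length; i++) {
    if (list[i].id === id) return i;
  }
  return -1;
}

function composeSystemPrompt(base: string, candidates: PromptPartial[], bfsiMode: boolean): string {
  const resolved: PromptPartial[] = [];
  for (const p of candidates) {
    // The BFSI partial is only attached when bfsiMode is true.
    if (p.id === BFSI_PARTIAL_ID && !bfsiMode) continue;
    const i = indexOfId(resolved, p.id);
    if (i === -1) {
      resolved.push(p);
    } else if (resolved[i].scope !== 'system') {
      // Assumption: same-ID partials replace each other unless the one
      // already resolved is system-scope, which is never replaced.
      resolved[i] = p;
    }
  }
  return [base].concat(resolved.map((p) => p.body)).join('\n\n');
}

const candidates: PromptPartial[] = [
  { id: BFSI_PARTIAL_ID, scope: 'system', body: 'SYSTEM: BFSI output constraints (placeholder text)' },
  { id: BFSI_PARTIAL_ID, scope: 'org', body: 'ORG: attempted override (placeholder text)' },
];
const bfsiPrompt = composeSystemPrompt('You are a credit risk agent.', candidates, true);
const standardPrompt = composeSystemPrompt('You are a credit risk agent.', candidates, false);
console.log(bfsiPrompt.indexOf('SYSTEM:') !== -1, bfsiPrompt.indexOf('ORG:') === -1); // true true
```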

Compliance artifact

The BFSI compliance artifact generated at the end of an acceptance-phase session includes a score_pct field derived from the eval run's aggregate score:

```json
{
  "sessionId": "sess_...",
  "workType": "acceptance",
  "approvalGateOutcome": "approved",
  "evalRunId": "evr_...",
  "score_pct": 92,
  "graders": [
    { "graderId": "structural/zod-v1", "pass": true, "score": 1.0 },
    { "graderId": "human-grader/v1", "pass": true, "score": 0.9, "fulfilledBy": "ops_user@..." }
  ],
  "retentionClass": "eval-permanent",
  "generatedAt": "2026-06-02T14:31:00.000Z"
}
```

score_pct is the aggregate score expressed as a whole-number percentage: round(aggregateScore * 100). A score_pct that breaches the configured regressionThreshold surfaces a warning in the compliance artifact but does not block acceptance; human grader fulfillment is the blocking gate.
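As a rough illustration of the two rules in this section, the sketch below computes score_pct and checks the human grader gate. The `GraderResult` shape mirrors the `graders` entries in the example artifact; the function names, the `human-grader/` prefix check, and the idea that a set `fulfilledBy` means "fulfilled" are assumptions. How the aggregate score is derived from individual grader scores is not specified here, so it is taken as an input.

```typescript
// Mirrors the entries in the artifact's "graders" array.
interface GraderResult {
  graderId: string;
  pass: boolean;
  score: number;
  fulfilledBy?: string;
}

// score_pct = round(aggregateScore * 100)
function scorePct(aggregateScore: number): number {
  return Math.round(aggregateScore * 100);
}

// Assumption: a human grader entry counts as fulfilled once fulfilledBy is set,
// and human graders are identified by the 'human-grader/' ID prefix.
function humanGateFulfilled(graders: GraderResult[]): boolean {
  return graders
    .filter((g) => g.graderId.indexOf('human-grader/') === 0)
    .every((g) => Boolean(g.fulfilledBy));
}

const exampleGraders: GraderResult[] = [
  { graderId: 'structural/zod-v1', pass: true, score: 1.0 },
  { graderId: 'human-grader/v1', pass: true, score: 0.9, fulfilledBy: 'ops_user' },
];
console.log(scorePct(0.92), humanGateFulfilled(exampleGraders)); // 92 true
```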

EvalConfig fields relevant to BFSI mode

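The fields below are those used in the example configuration on this page.

| Prop | Type |
| --- | --- |
| `enabled` | `boolean` |
| `bfsiMode` | `boolean` |
| `graders` | `string[]` |
| `regressionThreshold` | `number` |
| `driftWindowDays` | `number` |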
Checklist for BFSI eval setup

1. Configure the AgentCard
2. Define your Zod schema
3. Assign org review queue members
4. Verify trace retention
5. Test dispatch gating

Related pages

- Structural Grader - required in BFSI mode for development/qa sessions
- Human Grader - routes to org review queue in BFSI mode
- Eval Emission - sets trace retention to eval-permanent in BFSI mode
- Eval Replay - 7-year retention ensures replay is always available for audit
- BFSI Compliance Overview - SR 11-7, risk scoring, approval gates
- Agent Card Partials - sys:bfsi-output-constraints auto-attach mechanism
