Agent Reliability
Success/crash/recovery and turn distribution.
Track agent stability and recovery patterns. The Agent Reliability panel shows completion rates, crash frequency, and turn distribution across your workflow runs.
Overview
Agent reliability metrics measure how consistently your agents complete work and recover from errors. The panel displays:
- Completed - Sessions that finished with a workResult (success or rework)
- Rework - Sessions that required manual intervention or re-attempt
- Success Rate - Completed ÷ (Completed + Rework)
- Crash Rate - Proportion of sessions that hit unrecoverable errors
- Crash Count - Total number of crash events
- Recovery Count - Sessions that recovered from a transient error before reaching a terminal state
Session Lifecycle
Understanding where a session can go helps you interpret reliability numbers correctly:
stateDiagram-v2
[*] --> running: session created
running --> completed: workResult=success
running --> reworked: workResult=rework
running --> crashed: unrecoverable error
running --> recovered: transient error → auto-retry
recovered --> completed: retry succeeded
recovered --> crashed: retry also failed
completed --> [*]
reworked --> [*]
crashed --> [*]
- completed → counted in "Completed" and "Success Rate"
- reworked → counted in "Rework" (reduces success rate)
- crashed → counted in "Crash Count" and "Crash Rate"
- recovered → counted in "Recovery Count" (session eventually reached completed or crashed)
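The lifecycle above can be sketched as a transition table. This is a minimal illustration rather than the product's implementation: the state names come from the diagram, while `SessionState`, `TRANSITIONS`, `canTransition`, and `isTerminal` are hypothetical names introduced here.

```typescript
// State names taken from the diagram; the type itself is an assumption.
type SessionState = "running" | "recovered" | "completed" | "reworked" | "crashed";

// Allowed transitions, mirroring the arrows in the state diagram.
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  running: ["completed", "reworked", "crashed", "recovered"],
  recovered: ["completed", "crashed"],
  completed: [],
  reworked: [],
  crashed: [],
};

// True when the diagram has an arrow from `from` to `to`.
function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Terminal states have no outgoing transitions.
function isTerminal(state: SessionState): boolean {
  return TRANSITIONS[state].length === 0;
}
```

Under this sketch, `recovered` is not terminal, which matches the note that a recovered session still ends as either completed or crashed.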
Key Metrics
Completion Metrics
Completed
- Count of sessions with terminal state completed or reworked
- Indicates sessions that reached an end state (whether success or failure)
Rework
- Count of sessions that returned workResult = 'rework' or were manually overridden by a human
- Tracks sessions requiring a second pass or escalation to human review
Success Rate (%)
- Calculated as: completed / (completed + rework) * 100
- High success rate (>80%) indicates robust agent execution
- Lower rates signal workflow issues, credential problems, or integration gaps
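As a worked example of the formula, the sketch below computes the rate from two counts. It assumes `completed` counts only sessions that did not need rework, and that an empty window reports 0; `successRate` is a hypothetical helper name, not part of the panel's code.

```typescript
// Success rate as a percentage: completed / (completed + rework) * 100.
// Returning 0 for an empty window is an assumption of this sketch.
function successRate(completed: number, rework: number): number {
  const total = completed + rework;
  if (total === 0) return 0;
  return (completed / total) * 100;
}

// Figures from the Metric Cards example: 847 completed, 132 rework.
const rate = successRate(847, 132);
console.log(rate.toFixed(1)); // "86.5"
```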
Crash & Recovery
Crash Rate (%)
- Proportion of sessions that encountered an unrecoverable error (e.g., runtime panic, infrastructure failure)
- Crashes are distinct from rework - a crashed session cannot emit a structured work result
- High crash rates warrant investigation of agent logs and execution environment health
Crash Count
- Absolute number of crash events in the time window
- Useful for detecting spikes and trending over time
Recovery Count
- Sessions that experienced an error mid-execution, recovered, and continued to a terminal state
- Indicates resilient session lifecycle (e.g., transient network errors that auto-retried)
- Recovery implies the session continued past the error, not that it succeeded - check completion metrics for outcomes
Turn Distribution
Turn distribution visualizes session length: how many agent "turns" (back-and-forth interactions) does a typical session take? A turn corresponds to a distinct thought/action/response triplet in the activity log.
Turns (1-5, 6-10, 11-20, 20+)
├─ 1-5 turns: 45% [████████░░] - Quick, direct sessions
├─ 6-10 turns: 35% [██████░░░░] - Moderate iteration
├─ 11-20 turns: 15% [███░░░░░░░] - Complex reasoning
└─ 20+ turns: 5% [█░░░░░░░░░] - Edge cases / hangs
Interpretation:
- Skew toward 1-5 turns: The agent is highly deterministic, uses minimal tooling, or has well-structured prompts
- Bimodal distribution (1-5 and 11-20): Two distinct task pathways - simple and complex - handled by the same agent. Consider splitting into two agents.
- Long tail (20+ turns): Potential infinite loops or over-iteration; investigate workflow design or gate exit conditions
- All sessions in one bucket: Possible templated behavior or a single canonical task type
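The bucketing behind this chart can be sketched as follows. The bucket labels come from the chart above; `bucketLabel`, `turnDistribution`, and the choice to place a session with exactly 20 turns in "11-20" are assumptions of this sketch, since the labels overlap at 20.

```typescript
type TurnBucket = { label: string; count: number };

const BUCKET_LABELS = ["1-5", "6-10", "11-20", "20+"];

// Map a per-session turn count to its bucket label.
// Assumption: exactly 20 turns falls in "11-20"; "20+" means more than 20.
function bucketLabel(turns: number): string {
  if (turns <= 5) return "1-5";
  if (turns <= 10) return "6-10";
  if (turns <= 20) return "11-20";
  return "20+";
}

// Count sessions per bucket, keeping the buckets in display order.
function turnDistribution(turnCounts: number[]): TurnBucket[] {
  const counts: Record<string, number> = {};
  for (const turns of turnCounts) {
    const label = bucketLabel(turns);
    counts[label] = (counts[label] ?? 0) + 1;
  }
  return BUCKET_LABELS.map((label) => ({ label, count: counts[label] ?? 0 }));
}
```

The output has the same `label`/`count` shape as the `turnDistribution` field shown under API & Querying, so empty buckets are still returned with a count of 0.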
Display & Filtering
The Performance dashboard hosts the Agent Reliability panel with controls for:
- Time window - Last 7 days, 30 days, or 90 days (configurable via dashboard settings)
- Project filter - Drill into a specific project's reliability
- Workflow filter - Optional; focus on one workflow definition
- Work-type filter - Optional; compare reliability across research/dev/qa/acceptance stages
Data Sources
All metrics derive from the agent_sessions and session_activities tables:
- Session completion state - agent_sessions.state (completed, reworked, crashed)
- Work result - agent_sessions.workResult enum (success, rework, crashed)
- Activity log - session_activities rows; crash events marked with type='error' and error severity
- Turn count - Activity count per session (distinct thought/action/response triplets)
Visual Components
Outcome Distribution Bar
A horizontal stacked bar shows completion breakdown:
[████ Pass ██ Rework]
80% 20%
Green for completed, red for rework. Aids at-a-glance health assessment.
Metric Cards
Four cards at the top of the panel:
┌─ Completed ┌─ Rework ┌─ Success Rate ┌─ Crash Rate
│ 847 sessions │ 132 sessions │ 86.5% │ 2.3%
└────────────────└────────────────└────────────────└──────────────
Turn Distribution Chart
Stacked bar or donut chart breaking down session length cohorts. Hover reveals counts and percentages.
API & Querying
Component props (TypeScript):
export interface AgentStats {
  completed: number
  failed: number // "rework" count
  successRate: number // 0-100
  crashRate: number // 0-100
  crashCount: number
  recoveryCount: number
  toolUsage: Array<{
    name: string
    count: number
    errorRate: number
  }>
  turnDistribution: Array<{
    label: string // "1-5", "6-10", etc.
    count: number
  }>
}
The panel is typically populated by a dashboard fetch call that aggregates these metrics server-side (implementation in performance-client.tsx).
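To illustrate what such an aggregation might look like, here is a minimal sketch that reduces session rows into the numeric fields of `AgentStats`. It is not the code in performance-client.tsx: `SessionRow`, its `recovered` flag, `ReliabilitySummary`, and `summarize` are hypothetical, and the crash-rate denominator (all sessions in the window) is an assumption.

```typescript
// Hypothetical row shape; only the `state` values come from the Data Sources list.
interface SessionRow {
  state: "completed" | "reworked" | "crashed";
  recovered: boolean; // assumption: true if the session auto-retried at least once
}

// Numeric subset of AgentStats, redeclared so the sketch is self-contained.
interface ReliabilitySummary {
  completed: number;
  failed: number; // "rework" count
  successRate: number; // 0-100
  crashRate: number; // 0-100
  crashCount: number;
  recoveryCount: number;
}

function summarize(rows: SessionRow[]): ReliabilitySummary {
  let completed = 0;
  let failed = 0;
  let crashCount = 0;
  let recoveryCount = 0;
  for (const row of rows) {
    if (row.state === "completed") completed += 1;
    else if (row.state === "reworked") failed += 1;
    else crashCount += 1;
    if (row.recovered) recoveryCount += 1;
  }
  const finished = completed + failed;
  return {
    completed,
    failed,
    crashCount,
    recoveryCount,
    successRate: finished === 0 ? 0 : (completed / finished) * 100,
    // Assumption: crash rate is taken over all sessions in the window.
    crashRate: rows.length === 0 ? 0 : (crashCount / rows.length) * 100,
  };
}
```

Recovery is counted independently of the terminal state, so a recovered session that later crashed contributes to both `recoveryCount` and `crashCount`.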
Interpreting Results
Healthy Agent Fleet
- ✓ Success rate >85%
- ✓ Crash rate <5%
- ✓ Turn distribution skews toward 1-5 and 6-10 (deterministic execution)
- ✓ Recovery count >0 (resilience is working)
Warning Signs
- ⚠ Success rate 70-85% - rework happening; check workflow design and condition logic
- ⚠ Success rate <70% - systemic issue; likely credential/integration problem or gate stuck
- ⚠ Crash rate >10% - infrastructure or agent code instability; check execution logs
- ⚠ Skew to 20+ turns - potential infinite loop or over-iteration; verify loop exit conditions
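The numeric thresholds above can be encoded as a simple check, for example in an alerting script. This is a sketch under stated assumptions: `reliabilityWarnings` is a hypothetical helper, a success rate of exactly 85% is treated as a warning, and crash rates between 5% and 10% are left unflagged because the thresholds above do not classify them.

```typescript
// Returns one warning message per threshold that is breached.
// Both inputs are percentages in the range 0-100.
function reliabilityWarnings(successRate: number, crashRate: number): string[] {
  const warnings: string[] = [];
  if (successRate < 70) {
    warnings.push("Success rate <70%: systemic issue; check credentials, integrations, and gates");
  } else if (successRate <= 85) {
    warnings.push("Success rate 70-85%: rework happening; check workflow design and condition logic");
  }
  if (crashRate > 10) {
    warnings.push("Crash rate >10%: check execution logs for infrastructure or agent code instability");
  }
  // Crash rates from 5% to 10% are not classified by the thresholds above.
  return warnings;
}

// The Metric Cards example (86.5% success, 2.3% crash) yields no warnings.
console.log(reliabilityWarnings(86.5, 2.3).length); // 0
```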
Action Items
- High rework rate - Review human overrides in the performance dashboard; identify repeated escalations
- High crash rate - Check execution environment logs (Sentry, runtime logs) for stack traces
- Long turns - Inspect session activity stream to find where iteration exceeds expectations
Related Pages
- Tool Analytics - Tool invocation frequency and error rates
- Rework & Escalation - Dive deeper into rework patterns
- Session Detail - Inspect individual session traces and activities
- Session Inspector - 7-tab deep-debug view for crash forensics