Code Survival

The Code Survival dashboard measures how long agent-authored code persists in production after merge. It is a key indicator of code quality and agent competence: code that survives 30+ days is considered high-quality; code that is reverted or significantly modified within days indicates quality or fit issues.

Overview

Code Survival combines three systems:

Agent PR Attribution - tracks which pull requests were authored (or co-authored) by agents
Survival Scan - periodically checks if code still exists in production (pool-based batch architecture)
Learning Feedback - feeds survival rewards back into agent routing decisions via Thompson Sampling

The dashboard shows survival rates by agent, by provider (Claude, GPT, etc.), by work type, and over time.

What is Code Survival?

When an agent completes a task and code is merged to production, the Rensei platform tracks:

Merge date - when the PR was merged to the default branch
Survival checkpoints - 7 days, 30 days, 90 days post-merge
Survival status - code still present, partially modified, or reverted

A commit "survives" if all of the following hold:

It is still in the git history of the default branch
It has not been reverted (no revert commit exists)
Its authored changes are still semantically present (not fully replaced)
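
The three conditions read naturally as a single conjunction. The following is a minimal sketch, assuming a hypothetical CommitState record filled in by the scan; the field names are illustrative and not the platform's schema.

```typescript
// Hypothetical snapshot of one agent-authored commit at scan time.
// Field names are illustrative assumptions, not the platform's schema.
interface CommitState {
  inDefaultBranchHistory: boolean; // commit is still in the default branch history
  hasRevertCommit: boolean;        // a revert commit for it exists
  changesStillPresent: boolean;    // authored changes were not fully replaced
}

// A commit survives only when all three conditions hold.
function survives(c: CommitState): boolean {
  return c.inDefaultBranchHistory && !c.hasRevertCommit && c.changesStillPresent;
}

const reverted: CommitState = {
  inDefaultBranchHistory: true,
  hasRevertCommit: true,
  changesStillPresent: false,
};
console.log(survives(reverted)); // false
```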

The Six Survival Waves (RW1-RW6)

The code-survival system operates in six waves:

RW1: Batch Queue (Redis FIFO)

Agent-authored PRs are enqueued in a per-org Redis queue after merge. Each PR becomes a batch work item awaiting scan.

PR attributions are written by the GitHub webhook handler when a PR authored by an agent is merged. The survival scan job is triggered via POST /api/factory/code-survival (cron-only, bearer-gated) with optional body { referenceTime?, maxPrs?, repoPaths? }.
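
As an illustration of the trigger call, the sketch below builds (but does not send) the cron request. The three optional body fields come from the text above; their value types, the HttpRequest shape, and the buildScanTrigger helper are assumptions made for the example.

```typescript
// Optional body accepted by the scan trigger. Only the field names come
// from the documentation; the value types are assumptions.
interface ScanTriggerBody {
  referenceTime?: string; // assumed ISO 8601 timestamp
  maxPrs?: number;        // assumed cap on PRs scanned per run
  repoPaths?: string[];   // assumed list of repository paths
}

// Plain description of an HTTP request, usable with any HTTP client.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assembles the bearer-gated POST without sending it.
function buildScanTrigger(baseUrl: string, token: string, body: ScanTriggerBody = {}): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/factory/code-survival`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}

const req = buildScanTrigger("https://app.rensei.ai", "example-token", { maxPrs: 50 });
console.log(req.url); // https://app.rensei.ai/api/factory/code-survival
```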

RW2: Scheduler

A scheduler pulls batch items from the queue and schedules them for execution based on checkpoint timing:

7d checkpoint - scan 7 days post-merge
30d checkpoint - scan 30 days post-merge
90d checkpoint - scan 90 days post-merge

Scheduling is rate-limited per org (1-2 scans/hour) to avoid GitHub API throttling.
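
The checkpoint timing can be sketched as a pure function of the merge date and a reference time. This is an assumption-laden illustration: dueCheckpoints and its alreadyScanned parameter are hypothetical, and the per-org rate limit is not modeled.

```typescript
type Checkpoint = "7d" | "30d" | "90d";

const CHECKPOINTS: Checkpoint[] = ["7d", "30d", "90d"];
const CHECKPOINT_DAYS: Record<Checkpoint, number> = { "7d": 7, "30d": 30, "90d": 90 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the checkpoints whose scan time has been reached at `now`
// and that are not listed in `alreadyScanned`.
function dueCheckpoints(mergedAt: Date, now: Date, alreadyScanned: Checkpoint[] = []): Checkpoint[] {
  const elapsedDays = (now.getTime() - mergedAt.getTime()) / MS_PER_DAY;
  return CHECKPOINTS.filter(
    (cp) => elapsedDays >= CHECKPOINT_DAYS[cp] && alreadyScanned.indexOf(cp) === -1
  );
}

const mergedAt = new Date("2026-05-03T10:00:00Z");
const now = new Date("2026-06-02T10:00:00Z"); // exactly 30 days after merge

// With the 7d scan already done, only the 30d checkpoint is due.
console.log(dueCheckpoints(mergedAt, now, ["7d"]));
```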

RW3: Result Ingestion

A dedicated endpoint ingests survival scan results from the worker pool:

POST /api/factory/code-survival/results:

{
  "prUrl": "https://github.com/owner/repo/pull/123",
  "checkpoint": "30d",
  "survived": true,
  "reason": "code_persists",
  "commitSha": "abc123...",
  "timestamp": "2026-06-02T10:00:00Z"
}

Results are stored in the code_survival_results table.
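
A worker might check a payload before posting it. The sketch below validates the shape shown above; the specific checks and the validateResult helper are assumptions and not the server's actual validation rules.

```typescript
const VALID_CHECKPOINTS = ["7d", "30d", "90d"];

// Returns a list of problems; an empty list means the payload has the
// shape shown above. The rules here are illustrative assumptions.
function validateResult(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["payload must be a JSON object"];
  }
  const p = input as Record<string, unknown>;
  const errors: string[] = [];

  const prUrl = p["prUrl"];
  if (typeof prUrl !== "string" || prUrl.length === 0) {
    errors.push("prUrl must be a non-empty string");
  }
  const checkpoint = p["checkpoint"];
  if (typeof checkpoint !== "string" || VALID_CHECKPOINTS.indexOf(checkpoint) === -1) {
    errors.push("checkpoint must be one of 7d, 30d, 90d");
  }
  if (typeof p["survived"] !== "boolean") errors.push("survived must be a boolean");
  if (typeof p["reason"] !== "string") errors.push("reason must be a string");
  if (typeof p["commitSha"] !== "string") errors.push("commitSha must be a string");

  const timestamp = p["timestamp"];
  if (typeof timestamp !== "string" || isNaN(Date.parse(timestamp))) {
    errors.push("timestamp must be a parseable date string");
  }
  return errors;
}

const sample = {
  prUrl: "https://github.com/owner/repo/pull/123",
  checkpoint: "30d",
  survived: true,
  reason: "code_persists",
  commitSha: "abc123...",
  timestamp: "2026-06-02T10:00:00Z",
};
console.log(validateResult(sample).length); // 0
```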

RW4: Observability & Metrics

Survival rates are aggregated by:

Agent ID (did this agent's code survive?)
Provider (Claude vs GPT vs Gemini)
Work type (dev vs qa vs security)
Time period (7d, 30d, 90d survival rate %)

Metrics are exposed via the dashboard and via /api/factory/code-survival?agentId=...&checkpoint=30d.
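
The aggregation can be pictured as a group-by over scan records. This is a sketch under assumptions: the ScanRecord shape and the survivalRateBy helper are hypothetical, and the real aggregation presumably runs against the code_survival_results table.

```typescript
// Hypothetical flattened scan record used only for this illustration.
interface ScanRecord {
  agentId: string;
  provider: string;
  workType: string;
  checkpoint: "7d" | "30d" | "90d";
  survived: boolean;
}

type GroupKey = "agentId" | "provider" | "workType";

// Survival rate (0..1) per group for a single checkpoint.
function survivalRateBy(
  records: ScanRecord[],
  key: GroupKey,
  checkpoint: ScanRecord["checkpoint"]
): Record<string, number> {
  const totals: Record<string, { survived: number; total: number }> = {};
  for (const r of records) {
    if (r.checkpoint !== checkpoint) continue;
    const k = r[key];
    const bucket = totals[k] || (totals[k] = { survived: 0, total: 0 });
    bucket.total += 1;
    if (r.survived) bucket.survived += 1;
  }
  const rates: Record<string, number> = {};
  for (const k of Object.keys(totals)) {
    const t = totals[k];
    if (t) rates[k] = t.survived / t.total;
  }
  return rates;
}

// Made-up sample data.
const records: ScanRecord[] = [
  { agentId: "a1", provider: "anthropic", workType: "dev", checkpoint: "30d", survived: true },
  { agentId: "a1", provider: "anthropic", workType: "dev", checkpoint: "30d", survived: true },
  { agentId: "a2", provider: "anthropic", workType: "qa", checkpoint: "30d", survived: false },
  { agentId: "a3", provider: "openai", workType: "dev", checkpoint: "30d", survived: true },
  { agentId: "a3", provider: "openai", workType: "dev", checkpoint: "30d", survived: false },
  { agentId: "a3", provider: "openai", workType: "dev", checkpoint: "7d", survived: true },
];
const byProvider = survivalRateBy(records, "provider", "30d");
console.log(byProvider["openai"]); // 0.5
```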

RW5: Reward Feeder (Agent Routing)

Survival scores feed into the Thompson Sampling posteriors used for agent routing decisions. A survivalReward field on each ProviderPosterior is updated:

// High 30d survival → positive reward (boosts this agent's posterior)
// Survival near 50% → reward close to zero; below 50% → small negative reward
survivalReward = (survivalRate - 0.5) * 0.2  // bounded to [-0.1, 0.1]

See Routing Intelligence for full Thompson Sampling details.
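
The reward formula can be written as a small function. This is a sketch assuming survivalRate is a fraction between 0 and 1; the function name and the input check are illustrative, and how the value is applied to a ProviderPosterior is not shown.

```typescript
// Maps a survival rate in [0, 1] to a reward in [-0.1, 0.1] using the
// formula from the text: (survivalRate - 0.5) * 0.2.
function survivalReward(survivalRate: number): number {
  if (!(survivalRate >= 0 && survivalRate <= 1)) {
    throw new RangeError("survivalRate must be between 0 and 1");
  }
  const raw = (survivalRate - 0.5) * 0.2;
  // Explicit clamp mirrors the documented bound.
  return Math.min(0.1, Math.max(-0.1, raw));
}

console.log(survivalReward(0.5)); // 0
console.log(survivalReward(0.82).toFixed(3)); // "0.064"
console.log(survivalReward(0.3).toFixed(3)); // "-0.040"
```

For inputs in [0, 1] the formula already stays within [-0.1, 0.1], so the clamp only matters if the input check is relaxed. Rates below 0.5 produce a negative value.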

RW6: Donmai Redis Store

Survival data is also synced to the Donmai Redis store (used by the daemon for local routing decisions) via a pub/sub bridge on the survival-reward channel.

Reading the Dashboard

The Code Survival dashboard displays:

Survival Scorecard

A card showing your org's aggregate survival rates:

7-Day Survival:  87%  (good)
30-Day Survival: 72%  (acceptable)
90-Day Survival: 48%  (needs attention)

Interpretation:

>85% at 30d - high-quality code; agents are shipping production-ready work
70-85% at 30d - acceptable; some rework within 30 days
<70% at 30d - quality alert; investigate agent skill, model routing, or task complexity
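
The three bands can be expressed as a threshold check. A minimal sketch; the band names and the band30d helper are hypothetical, and the rate is assumed to be a fraction between 0 and 1.

```typescript
type SurvivalBand = "high" | "acceptable" | "alert";

// Classifies a 30-day survival rate using the thresholds listed above:
// above 85% is high, 70% to 85% is acceptable, below 70% is an alert.
function band30d(rate: number): SurvivalBand {
  if (rate > 0.85) return "high";
  if (rate >= 0.7) return "acceptable";
  return "alert";
}

console.log(band30d(0.72)); // acceptable
```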

Agent Leaderboard

Top agents by 30-day survival rate:

Agent               30d Survival  Issues  Avg Cost
claude-sonnet-1     89%           23      $1.45
claude-haiku        82%           31      $0.48
gpt-4-turbo         76%           18      $2.10

Why it matters: Agents with high survival rates are "trusted" and should receive more work via routing. Agents with low survival rates may need retraining or reassignment.

Survival Over Time

A line chart showing survival rate trend over 90 days.

Interpretation:

Rising trend → agents improving; code quality increasing
Declining trend → code quality degrading; investigate recent changes
Flat trend → consistent quality
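
One way to make the trend reading concrete is a least-squares slope over the plotted points. The sketch below is an illustration, not how the dashboard computes anything: the flatBand tolerance of 0.001 per sample is an arbitrary assumption, and the input is assumed to be evenly spaced survival rates.

```typescript
type Trend = "rising" | "declining" | "flat";

// Ordinary least-squares slope of the rates against their sample index.
function slope(rates: number[]): number {
  const n = rates.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  let meanY = 0;
  rates.forEach((r) => { meanY += r / n; });
  let num = 0;
  let den = 0;
  rates.forEach((r, i) => {
    num += (i - meanX) * (r - meanY);
    den += (i - meanX) * (i - meanX);
  });
  return num / den;
}

// flatBand is an assumed tolerance: slopes within it count as flat.
function classifyTrend(rates: number[], flatBand: number = 0.001): Trend {
  const s = slope(rates);
  if (s > flatBand) return "rising";
  if (s < -flatBand) return "declining";
  return "flat";
}

console.log(classifyTrend([0.7, 0.72, 0.75, 0.78])); // rising
```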

Configuration

Code Survival is configured at the agent-card level:

apiVersion: rensei.ai/v1
kind: AgentCard
metadata:
  name: dev-agent
spec:
  evalConfig:
    enabled: true
    codeSurvivalEnabled: true  # enable 7d/30d/90d scans
    codeSurvivalProviders:
      - anthropic  # scan code from Claude-authored PRs
      - openai     # and from GPT-authored PRs

Disable code survival scans for non-production agents (e.g., testing or experimentation agents).

API Access

Query Survival Results

curl -H "Authorization: Bearer $RENSEI_API_KEY" \
  "https://app.rensei.ai/api/factory/code-survival/results?checkpoint=30d&agentId=agent-123"

Response:

{
  "results": [
    {
      "prUrl": "https://github.com/owner/repo/pull/123",
      "checkpoint": "30d",
      "survived": true,
      "survivalRate": 1.0,
      "mergedAt": "2026-05-03T10:00:00Z",
      "scanAt": "2026-06-02T10:00:00Z"
    },
    ...
  ],
  "summary": {
    "totalScans": 45,
    "survivalRate": 0.82,
    "avgTimeToRevert": "12d"
  }
}
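
On the client side, the response can be given a type and filtered. The interfaces below mirror the fields in the sample response; the failedPrUrls helper and the sample values are hypothetical, and the sketch assumes the results array is complete rather than paginated.

```typescript
type Checkpoint = "7d" | "30d" | "90d";

// Shapes inferred from the sample response above.
interface SurvivalResultRow {
  prUrl: string;
  checkpoint: Checkpoint;
  survived: boolean;
  survivalRate: number;
  mergedAt: string;
  scanAt: string;
}

interface SurvivalResultsResponse {
  results: SurvivalResultRow[];
  summary: { totalScans: number; survivalRate: number; avgTimeToRevert: string };
}

// Hypothetical helper: lists PRs that did not survive the checkpoint.
function failedPrUrls(response: SurvivalResultsResponse): string[] {
  return response.results.filter((r) => !r.survived).map((r) => r.prUrl);
}

// Made-up sample data for illustration only.
const response: SurvivalResultsResponse = {
  results: [
    {
      prUrl: "https://github.com/owner/repo/pull/123",
      checkpoint: "30d",
      survived: true,
      survivalRate: 1.0,
      mergedAt: "2026-05-03T10:00:00Z",
      scanAt: "2026-06-02T10:00:00Z",
    },
    {
      prUrl: "https://github.com/owner/repo/pull/124",
      checkpoint: "30d",
      survived: false,
      survivalRate: 0.0,
      mergedAt: "2026-05-04T10:00:00Z",
      scanAt: "2026-06-03T10:00:00Z",
    },
  ],
  summary: { totalScans: 2, survivalRate: 0.5, avgTimeToRevert: "12d" },
};
console.log(failedPrUrls(response)); // one URL: .../pull/124
```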

Enqueue a Survival Scan

For manual testing or backfilling:

curl -X POST -H "Authorization: Bearer $RENSEI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prUrl": "https://github.com/owner/repo/pull/456",
    "authorAgentId": "agent-123",
    "workType": "development",
    "checkpoint": "30d"
  }' \
  "https://app.rensei.ai/api/factory/code-survival/"

See Metrics API for full endpoint documentation.

Code Survived = False / Reverted

The code was reverted after merge. Common causes:

Functional bug - code had a runtime error
Fit issue - code didn't match requirements
Incomplete work - agent didn't finish the task properly

Action: Review the revert commit message and the agent's session logs to understand why.

Code Survived = False / Modified

The code is still in the history but has been significantly rewritten (>50% of lines changed). This counts as a partial failure and usually points to one of the following:

Code was not production-ready
It required substantial rework
Agent missed important requirements

Action: Increase code review rigor or provide the agent with better context.
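
The outcomes described in this document can be summarized in one classifier. This is a sketch under assumptions: the ScanObservation fields and the way changed lines are counted are hypothetical; only the revert check and the 50% threshold come from the text.

```typescript
type SurvivalStatus = "present" | "modified" | "reverted";

// Hypothetical scan measurements for one PR.
interface ScanObservation {
  reverted: boolean;     // a revert commit exists
  authoredLines: number; // lines the agent's PR added
  changedLines: number;  // of those, lines since changed or removed
}

function classify(o: ScanObservation): { status: SurvivalStatus; survived: boolean } {
  if (o.reverted) return { status: "reverted", survived: false };
  // Guard against division by zero for PRs that added no lines.
  const fraction = o.authoredLines > 0 ? o.changedLines / o.authoredLines : 0;
  // More than 50% of lines changed counts as a significant rewrite.
  if (fraction > 0.5) return { status: "modified", survived: false };
  return { status: "present", survived: true };
}

console.log(classify({ reverted: false, authoredLines: 200, changedLines: 120 }).status); // modified
```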

Feedback Loop

Survival data automatically feeds into agent routing:

30-day survival score is calculated daily
Reward is computed: (survivalRate - 0.5) * 0.2
Thompson Sampling posterior is updated on next session start
Router prefers high-survival agents for similar work

This creates a reinforcement loop: agents that ship durable code get more opportunities to do so.

Caveats and Limitations

Code Survival requires:

GitHub integration configured (for PR history)
Valid authorAgentId on merged PRs (tracked via commit metadata)
Agent authorship metadata available in the platform

If any of these is missing, survival scans will be skipped.
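
The skip rule can be sketched as a precondition check that reports which requirement is missing. The ScanPrerequisites shape and the skipReasons helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical view of the three prerequisites listed above.
interface ScanPrerequisites {
  githubIntegrationConfigured: boolean;
  authorAgentId?: string;               // taken from commit metadata
  authorshipMetadataAvailable: boolean;
}

// Returns the reasons a scan would be skipped; an empty list means
// the scan can proceed.
function skipReasons(p: ScanPrerequisites): string[] {
  const reasons: string[] = [];
  if (!p.githubIntegrationConfigured) reasons.push("GitHub integration not configured");
  if (!p.authorAgentId) reasons.push("missing authorAgentId on merged PR");
  if (!p.authorshipMetadataAvailable) reasons.push("agent authorship metadata unavailable");
  return reasons;
}

const reasons = skipReasons({
  githubIntegrationConfigured: true,
  authorshipMetadataAvailable: true,
});
console.log(reasons); // one reason: the authorAgentId is missing
```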

Survival rates should NOT be the sole metric for agent evaluation. Consider:

Work complexity - complex changes are more likely to be revised later, so lower survival is expected
Code coverage - undertested code may survive but hide bugs
Architecture - well-designed systems are easier to sustain than monoliths
Team velocity - fast iteration might mean short-lived code by design

Next Steps

To see how survival scores affect routing, see Routing Intelligence
To configure agent assignment, see A2A Policies
To enable code survival for an agent, see Agent Cards