
PR Attribution

How merged PRs are correlated back to the agent session and issue that produced them.

PR Attribution is the bridge between GitHub and the platform for downstream analytics: Code Survival joins on attribution rows to measure whether agent-written code persists.

How PR Attribution Works

Every GitHub pull_request webhook (opened, closed, merged) is processed on the ingest path:

  1. Webhook received - GitHub delivers the pull_request event to the webhook ingest route.
  2. Signal extraction - extractAttributionSignals pulls an issue id and/or session id out of the PR body, head branch, and title (pure function, no I/O).
  3. Session resolution - if the PR body did not carry an explicit session id, the issue id is resolved to its most recent agent session via the durable factory_events log.
  4. Attribution stored - an idempotent upsert into agent_pr_attributions, unique on (pr_repo, pr_number). Later events for the same PR (e.g. the merge) update the row's state, timestamps, and - if a stronger signal appears - its session attribution.
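The session-resolution step (step 3) can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `FactoryEvent` shape, its field names, and the `resolve_session` function are hypothetical, since the factory_events schema is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class FactoryEvent:
    """Hypothetical event shape; the real factory_events schema may differ."""
    issue_id: str
    session_id: str
    occurred_at: datetime


def resolve_session(
    explicit_session_id: Optional[str],
    issue_id: Optional[str],
    events: Iterable[FactoryEvent],
) -> Optional[str]:
    """Return the session to attribute a PR to, or None if nothing resolves."""
    # A Session: token from the PR body is trusted directly, with no lookup.
    if explicit_session_id:
        return explicit_session_id
    if issue_id is None:
        return None
    # Otherwise pick the most recent session recorded for the issue.
    matching = [e for e in events if e.issue_id == issue_id]
    if not matching:
        return None
    return max(matching, key=lambda e: e.occurred_at).session_id


events = [
    FactoryEvent("ABC-123", "sess_1", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    FactoryEvent("ABC-123", "sess_2", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    FactoryEvent("XYZ-9", "sess_3", datetime(2024, 1, 9, tzinfo=timezone.utc)),
]
print(resolve_session(None, "ABC-123", events))  # sess_2
```

Returning None rather than raising mirrors the documented behaviour that a PR can be recorded without a resolved session.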

Correlation signals, strongest first

| Signal | Source value | Confidence |
| --- | --- | --- |
| Session: <id> token in the PR body | pr_body | high - trusted directly, no lookup needed |
| Agent worktree branch convention (<user>/<issue-id>-…) | branch | medium - issue id resolved to a session via factory_events |
| Issue identifier in the PR title (e.g. (ABC-123)) | pr_title | low - humans write this string too; recorded for downstream heuristics |

Attribution is session-token / branch / title driven, not commit-email or Co-Authored-By driven - agents include a Session: token in PR bodies, and the fleet's worktree tooling makes the branch pattern deterministic.
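A pure extraction function in the spirit of extractAttributionSignals might look like the sketch below. The regular expressions are assumptions made for illustration (a `Session:` token on its own line, an issue id shaped like `ABC-123`, and a `<user>/<issue-id>-…` branch); the real patterns, the `Signals` type, and the `extract_signals` name are not taken from the platform.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed formats, for illustration only.
ISSUE_RE = r"[A-Z][A-Z0-9]*-\d+"
SESSION_RE = re.compile(r"^Session:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
BRANCH_RE = re.compile(rf"^[^/]+/({ISSUE_RE})(?:-|$)", re.IGNORECASE)
TITLE_RE = re.compile(rf"\(({ISSUE_RE})\)")


@dataclass(frozen=True)
class Signals:
    session_id: Optional[str]
    issue_id: Optional[str]
    source: Optional[str]      # pr_body | branch | pr_title
    confidence: Optional[str]  # high | medium | low


def extract_signals(body: str, head_branch: str, title: str) -> Signals:
    """Pure function: report the strongest signal found, with no I/O."""
    session = SESSION_RE.search(body or "")
    branch = BRANCH_RE.match(head_branch or "")
    in_title = TITLE_RE.search(title or "")

    # Prefer the branch for the issue id, then fall back to the title.
    issue_id = None
    if branch:
        issue_id = branch.group(1).upper()
    elif in_title:
        issue_id = in_title.group(1)

    # Strongest signal first: body token, then branch, then title.
    if session:
        return Signals(session.group(1), issue_id, "pr_body", "high")
    if branch:
        return Signals(None, issue_id, "branch", "medium")
    if in_title:
        return Signals(None, issue_id, "pr_title", "low")
    return Signals(None, None, None, None)
```

For example, `extract_signals("", "alice/abc-123-fix-login", "Fix login")` yields a `branch` signal with medium confidence and issue id `ABC-123`, leaving the session to be resolved separately.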

Data Model

The agent_pr_attributions table stores one row per PR:

| Column | Type | Notes |
| --- | --- | --- |
| id | BIGSERIAL | Primary key |
| workspace_id / org_id | TEXT (nullable) | Resolved from the session where possible; rows with no resolvable session (external/human PRs) stay null and are excluded from downstream ingestion |
| pr_repo | TEXT | owner/repo (unique with pr_number) |
| pr_number | INT | PR number |
| pr_url, pr_title, head_branch | TEXT | PR metadata |
| session_id | TEXT (nullable) | The attributed agent session; null when the PR was seen but no session resolved |
| issue_id | TEXT (nullable) | Issue identifier (e.g. ABC-1246) |
| attribution_source | TEXT | One of: pr_body, branch, pr_title, manual |
| confidence | TEXT | One of: high, medium, low |
| state | TEXT | One of: open, closed, merged |
| merge_commit_sha | TEXT | For merged PRs |
| author | TEXT | GitHub login |
| payload | JSONB | Raw pull_request snapshot at last event |
| opened_at, merged_at, closed_at | TIMESTAMPTZ | Lifecycle timestamps |
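The one-row-per-PR behaviour can be illustrated with an upsert keyed on (pr_repo, pr_number). The sketch below uses an in-memory SQLite database (3.24 or newer, for ON CONFLICT) as a stand-in for the real store and only a subset of the columns. The numeric confidence ranking and the `upsert` helper are assumptions about how "a stronger signal" could be decided, not the platform's actual logic.

```python
import sqlite3

# Assumed ordering of signal strength.
RANK = {"low": 1, "medium": 2, "high": 3}
# Rank of the confidence already stored on the existing row.
OLD_RANK = ("CASE confidence WHEN 'high' THEN 3 WHEN 'medium' THEN 2 "
            "WHEN 'low' THEN 1 ELSE 0 END")

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_pr_attributions (
        id INTEGER PRIMARY KEY,
        pr_repo TEXT NOT NULL,
        pr_number INTEGER NOT NULL,
        session_id TEXT,
        attribution_source TEXT,
        confidence TEXT,
        state TEXT NOT NULL,
        UNIQUE (pr_repo, pr_number)
    )
""")

# State always follows the latest event; attribution changes only
# when the incoming signal outranks the stored one.
UPSERT = f"""
INSERT INTO agent_pr_attributions
    (pr_repo, pr_number, session_id, attribution_source, confidence, state)
VALUES (:pr_repo, :pr_number, :session_id, :source, :confidence, :state)
ON CONFLICT (pr_repo, pr_number) DO UPDATE SET
    state = excluded.state,
    session_id = CASE WHEN :rank > {OLD_RANK}
        THEN excluded.session_id ELSE session_id END,
    attribution_source = CASE WHEN :rank > {OLD_RANK}
        THEN excluded.attribution_source ELSE attribution_source END,
    confidence = CASE WHEN :rank > {OLD_RANK}
        THEN excluded.confidence ELSE confidence END
"""


def upsert(conn, pr_repo, pr_number, state,
           session_id=None, source=None, confidence=None):
    conn.execute(UPSERT, {
        "pr_repo": pr_repo, "pr_number": pr_number, "state": state,
        "session_id": session_id, "source": source,
        "confidence": confidence, "rank": RANK.get(confidence, 0),
    })


# Three events for the same PR collapse into one row.
upsert(conn, "acme/api", 7, "open", None, "pr_title", "low")
upsert(conn, "acme/api", 7, "open", "sess_42", "pr_body", "high")
upsert(conn, "acme/api", 7, "merged", None, "pr_title", "low")
```

After these calls the table holds a single row for acme/api#7 in state merged, still attributed to sess_42 with high confidence, because the weaker title signal on the merge event does not overwrite it.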

Reading attribution data

There is no dedicated read API for attributions today - consumers are in-platform:

  • Code Survival scopes its per-symbol drill-down with GET /api/factory/code-survival?attributionId=<id> (or sessionId=<id>), joining survival results to the attribution row.
  • Operators can inspect rows directly with SQL, e.g.:
-- Merged agent-attributed PRs in the last 30 days
SELECT pr_repo, pr_number, session_id, issue_id, confidence
FROM agent_pr_attributions
WHERE state = 'merged'
  AND session_id IS NOT NULL
  AND merged_at > now() - interval '30 days';

Interpreting confidence

  • high - the agent itself stamped the session id into the PR body; safe for downstream joins (code survival, benchmarking).
  • medium - branch-convention match resolved through the event log; reliable for fleet-authored branches.
  • low - only a title match; a human may have authored the PR. Recorded, but treat as unattributed for agent-specific metrics.
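One way a consumer could apply these rules is to filter rows by a confidence floor before computing agent-specific metrics. This is a sketch only: the `agent_attributed` helper, the dict-shaped rows, and the default floor of medium are illustrative assumptions rather than platform behaviour.

```python
from typing import Any, Iterable, List, Mapping

# Assumed ordering of the documented confidence levels.
RANK = {"low": 1, "medium": 2, "high": 3}


def agent_attributed(
    rows: Iterable[Mapping[str, Any]],
    min_confidence: str = "medium",
) -> List[Mapping[str, Any]]:
    """Keep rows that have a session and meet the confidence floor."""
    floor = RANK[min_confidence]
    return [
        r for r in rows
        if r.get("session_id") is not None
        and RANK.get(r.get("confidence"), 0) >= floor
    ]


rows = [
    {"pr_number": 1, "session_id": "sess_1", "confidence": "high"},
    {"pr_number": 2, "session_id": "sess_2", "confidence": "medium"},
    {"pr_number": 3, "session_id": None, "confidence": "low"},
]
strict = agent_attributed(rows, min_confidence="high")
```

Passing `min_confidence="high"` restricts the result to rows stamped by the agent itself, which suits joins where a misattributed PR would skew the metric.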

Configuration

PR attribution is automatic once the GitHub integration delivers webhooks:

  1. Enable the GitHub webhook - see GitHub integration; attribution runs on the shared ingest route.
  2. Keep the Session: token in PR bodies - the agent fleet adds it by default; PR templates or tooling that strip it from the body reduce attribution to branch- or title-level confidence.

Limitations

  • Rows can be session-less. PRs with only a title signal (or none) persist with session_id = NULL and a null workspace_id / org_id - they are visible in the table but excluded from code-survival ingestion.
  • Webhook delivery is the transport. Missed deliveries mean missed attributions; check the GitHub webhook delivery log if counts look low.
  • No per-agent authorship stats. The row attributes a PR to a session (and through it an issue); per-agent co-authorship, work-type classification, and PR stats (additions/deletions) are not modeled.
