PR Attribution
How merged PRs are correlated back to the agent session and issue that produced them.
PR Attribution correlates GitHub pull requests back to the agent session and issue that produced them. It is the bridge between GitHub and the platform for downstream analytics - Code Survival joins on it to measure whether agent-written code persists.
How PR Attribution Works
Every GitHub pull_request webhook (opened, closed, merged) is processed on the ingest path:
- Webhook received - GitHub delivers the `pull_request` event to the webhook ingest route.
- Signal extraction - `extractAttributionSignals` pulls an issue id and/or session id out of the PR body, head branch, and title (pure function, no I/O).
- Session resolution - if the PR body did not carry an explicit session id, the issue id is resolved to its most recent agent session via the durable `factory_events` log.
- Attribution stored - an idempotent upsert into `agent_pr_attributions`, unique on `(pr_repo, pr_number)`. Later events for the same PR (e.g. the merge) update the row's `state`, timestamps, and - if a stronger signal appears - its session attribution.
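The signal-extraction step can be sketched as a pure function. This is a minimal illustration, not the platform's `extractAttributionSignals`: the name `extractSignalsSketch`, the field names, and all three regular expressions are assumptions based on the formats described on this page.

```typescript
// Hypothetical shapes; the platform's real types may differ.
type SignalSource = "pr_body" | "branch" | "pr_title";
type SignalConfidence = "high" | "medium" | "low";

interface PrFields {
  body: string | null;
  headBranch: string;
  title: string;
}

interface AttributionSignals {
  sessionId: string | null;
  issueId: string | null;
  source: SignalSource | null;
  confidence: SignalConfidence | null;
}

// Assumed formats: a "Session: <id>" line in the body, a
// "<user>/<issue-id>-<slug>" branch, and "(ABC-123)" in the title.
const SESSION_TOKEN = /^\s*Session:\s*(\S+)/m;
const BRANCH_ISSUE = /^[^/]+\/([A-Za-z]+-\d+)-/;
const TITLE_ISSUE = /\(([A-Z]+-\d+)\)/;

// Pure function: reads only its argument and performs no I/O.
function extractSignalsSketch(pr: PrFields): AttributionSignals {
  const sessionId = SESSION_TOKEN.exec(pr.body ?? "")?.[1] ?? null;
  const branchIssue = BRANCH_ISSUE.exec(pr.headBranch)?.[1] ?? null;
  const titleIssue = TITLE_ISSUE.exec(pr.title)?.[1] ?? null;

  // Strongest signal wins: body token, then branch, then title.
  if (sessionId !== null) {
    return { sessionId, issueId: branchIssue ?? titleIssue, source: "pr_body", confidence: "high" };
  }
  if (branchIssue !== null) {
    return { sessionId: null, issueId: branchIssue, source: "branch", confidence: "medium" };
  }
  if (titleIssue !== null) {
    return { sessionId: null, issueId: titleIssue, source: "pr_title", confidence: "low" };
  }
  return { sessionId: null, issueId: null, source: null, confidence: null };
}
```

In this sketch a branch or title match returns only an issue id; resolving that issue to a session is left to the separate session-resolution step.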
Correlation signals, strongest first
| Signal | Source value | Confidence |
|---|---|---|
| `Session: <id>` token in the PR body | `pr_body` | high - trusted directly, no lookup needed |
| Agent worktree branch convention (`<user>/<issue-id>-…`) | `branch` | medium - issue id resolved to a session via `factory_events` |
| Issue identifier in the PR title (e.g. `(ABC-123)`) | `pr_title` | low - humans write this string too; recorded for downstream heuristics |
Attribution is session-token / branch / title driven, not commit-email or Co-Authored-By driven - agents include a Session: token in PR bodies, and the fleet's worktree tooling makes the branch pattern deterministic.
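One way to picture the "stronger signal" rule applied on later events for the same PR is a rank comparison. The sketch below is an assumption about how such a rule could look, not the platform's upsert logic: `mergeAttributionSketch`, the numeric ranks, and the choice to replace only on a strictly stronger signal are all hypothetical, and the `manual` source is deliberately left out because this page does not say how it ranks.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical in-memory view of the session-attribution part of a row.
interface SessionAttribution {
  sessionId: string | null;
  source: string | null;
  confidence: Confidence | null;
}

// Assumed ordering, taken from the table above: high > medium > low > none.
const CONFIDENCE_RANK: Record<Confidence, number> = { low: 1, medium: 2, high: 3 };

function rankOf(confidence: Confidence | null): number {
  return confidence === null ? 0 : CONFIDENCE_RANK[confidence];
}

// Keep the existing attribution unless the incoming one is strictly stronger.
function mergeAttributionSketch(
  existing: SessionAttribution,
  incoming: SessionAttribution,
): SessionAttribution {
  return rankOf(incoming.confidence) > rankOf(existing.confidence) ? incoming : existing;
}
```

Under these assumptions, a PR first attributed by branch is upgraded if a later event carries a body token, while a later title-only match never downgrades an existing row.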
Data Model
The agent_pr_attributions table stores one row per PR:
| Column | Type | Notes |
|---|---|---|
| `id` | BIGSERIAL | Primary key |
| `workspace_id` / `org_id` | TEXT (nullable) | Resolved from the session where possible; rows with no resolvable session (external/human PRs) stay null and are excluded from downstream ingestion |
| `pr_repo` | TEXT | `owner/repo` (unique with `pr_number`) |
| `pr_number` | INT | PR number |
| `pr_url`, `pr_title`, `head_branch` | TEXT | PR metadata |
| `session_id` | TEXT (nullable) | The attributed agent session; null when the PR was seen but no session resolved |
| `issue_id` | TEXT (nullable) | Issue identifier (e.g. `ABC-1246`) |
| `attribution_source` | TEXT | `pr_body`, `branch`, `pr_title`, or `manual` |
| `confidence` | TEXT | `high`, `medium`, or `low` |
| `state` | TEXT | `open`, `closed`, or `merged` |
| `merge_commit_sha` | TEXT | For merged PRs |
| `author` | TEXT | GitHub login |
| `payload` | JSONB | Raw `pull_request` snapshot at last event |
| `opened_at`, `merged_at`, `closed_at` | TIMESTAMPTZ | Lifecycle timestamps |
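For consumers that read these rows in application code, the table can be mirrored as a type. This is a sketch only: the interface name, the TypeScript types chosen for each column (for example `id` as a string to avoid 64-bit precision loss), and the nullability of columns the table does not mark as nullable are assumptions. The helper `isIngestableSketch` is likewise hypothetical and simply encodes the note that rows without a resolved session and org are excluded downstream.

```typescript
// Hypothetical row type mirroring the columns listed above.
interface AgentPrAttributionRow {
  id: string; // BIGSERIAL, kept as a string here (assumption)
  workspace_id: string | null;
  org_id: string | null;
  pr_repo: string; // "owner/repo"
  pr_number: number;
  pr_url: string;
  pr_title: string;
  head_branch: string;
  session_id: string | null;
  issue_id: string | null;
  attribution_source: "pr_body" | "branch" | "pr_title" | "manual";
  confidence: "high" | "medium" | "low";
  state: "open" | "closed" | "merged";
  merge_commit_sha: string | null; // assumed null until the PR is merged
  author: string;
  payload: unknown; // raw pull_request snapshot
  opened_at: Date | null;
  merged_at: Date | null;
  closed_at: Date | null;
}

// Rows with no resolved session or org are visible in the table but skipped downstream.
function isIngestableSketch(row: AgentPrAttributionRow): boolean {
  return row.session_id !== null && row.org_id !== null;
}
```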
Reading attribution data
There is no dedicated read API for attributions today - consumers are in-platform:
- Code Survival scopes its per-symbol drill-down with `GET /api/factory/code-survival?attributionId=<id>` (or `sessionId=<id>`), joining survival results to the attribution row.
- Operators can inspect rows directly with SQL, e.g.:
```sql
-- Merged agent-attributed PRs in the last 30 days
SELECT pr_repo, pr_number, session_id, issue_id, confidence
FROM agent_pr_attributions
WHERE state = 'merged'
  AND session_id IS NOT NULL
  AND merged_at > now() - interval '30 days';
```

Interpreting confidence
- `high` - the agent itself stamped the session id into the PR body; safe for downstream joins (code survival, benchmarking).
- `medium` - branch-convention match resolved through the event log; reliable for fleet-authored branches.
- `low` - only a title match; a human may have authored the PR. Recorded, but treat as unattributed for agent-specific metrics.
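A consumer computing agent-specific metrics might apply these rules with a small filter. The sketch below is illustrative only: the function name, the camelCase field names, and the decision to accept both `high` and `medium` rows are assumptions drawn from the descriptions above, not a platform API.

```typescript
// Minimal hypothetical view of the fields the filter needs.
interface AttributionView {
  sessionId: string | null;
  confidence: "high" | "medium" | "low";
}

// Treat low-confidence (title-only) and session-less rows as unattributed.
function countsForAgentMetricsSketch(row: AttributionView): boolean {
  return row.sessionId !== null && row.confidence !== "low";
}

const sampleRows: AttributionView[] = [
  { sessionId: "sess_1", confidence: "high" },
  { sessionId: "sess_2", confidence: "medium" },
  { sessionId: null, confidence: "low" },
];

// Only the high and medium rows remain after filtering.
const agentRows = sampleRows.filter(countsForAgentMetricsSketch);
```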
Configuration
PR attribution is automatic once the GitHub integration delivers webhooks:
- Enable the GitHub webhook - see GitHub integration; attribution runs on the shared ingest route.
- Keep the `Session:` token in PR bodies - the agent fleet adds it by default; PR templates that strip or overwrite the body lower attribution to branch or title confidence.
Limitations
- Rows can be session-less. PRs with only a title signal (or none) persist with `session_id = NULL` and a null org - they are visible in the table but excluded from code-survival ingestion.
- Webhook delivery is the transport. Missed deliveries mean missed attributions; check the GitHub webhook delivery log if counts look low.
- No per-agent authorship stats. The row attributes a PR to a session (and through it an issue); per-agent co-authorship, work-type classification, and PR stats (additions/deletions) are not modeled.
Next Steps
- For code survival analysis, see Code Survival
- For provider benchmarking, see Provider Benchmarks
- For routing decisions based on agent quality, see Routing Intelligence
- To configure GitHub integration, see GitHub