PR Ingest
Architectural ingest on PR merge.
PR ingest extracts architectural observations from merged GitHub pull requests and contributes them to the knowledge graph corpus. Because a merged PR has passed human code review, its observations carry higher confidence than the auto-extracted session signals.
PR ingest is opt-in. Set ARCH_INTEL_PR_INGEST_ENABLED=true in your environment to enable it. The feature flag defaults to off because observation rows accumulate without a review surface until the arch-intel review dashboard is enabled.
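The opt-in check can be pictured as a small guard. This is a minimal sketch, not the platform's actual code: the helper name isPrIngestEnabled and the exact-match comparison against the string "true" are assumptions made for illustration.

```typescript
// Hypothetical helper: the source only states that the flag must be "set to true".
// Exact-match semantics (no "1", "TRUE", "yes") are an assumption of this sketch.
type Env = Record<string, string | undefined>;

function isPrIngestEnabled(env: Env): boolean {
  // An unset variable yields undefined, so the feature stays off by default.
  return env["ARCH_INTEL_PR_INGEST_ENABLED"] === "true";
}
```

In a Node.js process this would typically be called as isPrIngestEnabled(process.env); when it returns false, the ingest would report ran = false with reason feature_flag_disabled.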
Trigger
The pull_request.closed GitHub webhook event fires the ingest. Only events with merged = true are processed. The webhook route calls ingestPrMerge inside after() so the 200 ack is on the wire before any DB writes.
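The merged-only filter described above might look like the following sketch. The PullRequestEvent type covers only the payload fields used here, and isMergedPrEvent is a hypothetical name; the real webhook route also performs signature verification and the idempotency check, which are not shown.

```typescript
// Subset of the GitHub pull_request webhook payload relevant to this check.
interface PullRequestEvent {
  action: string;
  pull_request?: { number?: number; merged?: boolean };
}

// Hypothetical helper: true only for "closed" events whose PR was actually merged.
// A PR closed without merging also arrives with action "closed", but merged is false.
function isMergedPrEvent(event: PullRequestEvent): boolean {
  return event.action === "closed" && event.pull_request?.merged === true;
}
```

Events that fail this check would be acknowledged and dropped without scheduling any ingest work.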
GitHub pull_request.closed (merged = true)
  └─ webhook route (sync: sig verify + idempotency check)
       └─ after()
            └─ ingestPrMerge({ prUrl, headSha, baseSha, orgId, projectId?, ... })
Confidence levels
PR ingest uses higher confidence values than auto-ingest because human review has filtered the noise:
| Observation type | Confidence |
|---|---|
| Regular file change (pattern) | 0.55 |
| ADR-touched file (decision) | 0.70 |
| Session-terminal auto-ingest | 0.30 |
ADR paths are detected by two conventions:
- Files under docs/adr/ (e.g. docs/adr/0012-auth-strategy.md)
- Files with basename matching ADR-*.md (e.g. ADR-2026-05-18-arch-intel.md)
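The two ADR conventions and the confidence values above can be combined into a small classifier. This is a sketch under stated assumptions: isAdrPath and classifyFile are hypothetical names, paths are taken to be repo-root-relative with forward slashes, and matching is case-sensitive.

```typescript
type ObservationKind = "pattern" | "decision";

// Hypothetical helper: detects ADR files by the two documented conventions.
function isAdrPath(path: string): boolean {
  const basename = path.split("/").pop() ?? "";
  // Convention 1: anything under docs/adr/ (assumed relative to the repo root).
  // Convention 2: basename of the form ADR-*.md, in any directory.
  return path.startsWith("docs/adr/") || /^ADR-.*\.md$/.test(basename);
}

// Maps a changed file to its observation kind and the confidence from the table.
function classifyFile(path: string): { kind: ObservationKind; confidence: number } {
  return isAdrPath(path)
    ? { kind: "decision", confidence: 0.7 }
    : { kind: "pattern", confidence: 0.55 };
}
```

For example, classifyFile("docs/adr/0012-auth-strategy.md") yields a decision observation, while classifyFile("src/index.ts") yields a pattern observation.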
What gets extracted
For each merged PR, the ingest produces:
- One aggregate observation - a pattern-kind summary with PR metadata: title, author, head SHA, base SHA, file count, additions/deletions.
- One per-file observation - a pattern observation for each changed file (regular change), or a decision observation for ADR-touching files.
Pagination: the GitHub API returns up to 100 files per page; the ingest caps at 10 pages (1,000 files). For PRs with more than 1,000 changed files, the file list is truncated at the cap; such PRs are typically noise for architectural intelligence purposes.
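The page cap can be illustrated with a short loop. This is a sketch, not the ingest's implementation: collectChangedFiles and the getPage callback are hypothetical, and getPage is a synchronous stand-in for the real (asynchronous) GitHub API call so that the capping logic is easy to follow.

```typescript
const PER_PAGE = 100; // maximum files per page
const MAX_PAGES = 10; // ingest cap: 10 pages = 1,000 files

// Hypothetical helper: pulls pages until a short page is seen or the cap is reached.
function collectChangedFiles<T>(getPage: (page: number) => T[]): { files: T[]; hitCap: boolean } {
  const files: T[] = [];
  for (let page = 1; page <= MAX_PAGES; page++) {
    const batch = getPage(page);
    files.push(...batch);
    if (batch.length < PER_PAGE) {
      // A short (or empty) page means there is nothing further to fetch.
      return { files, hitCap: false };
    }
  }
  // Ten full pages were read; any files beyond the cap are not fetched.
  return { files, hitCap: true };
}
```

Note that hitCap is also true for a PR with exactly 1,000 files, since the loop cannot tell whether an eleventh page exists without requesting it.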
Idempotency
The idempotency key is (org_id, project_id, agent_id, content_hash) where content_hash is derived from <prUrl>|<headSha>|<filename>. The agent ID is the constant sentinel arch-intel:pr-ingest. Webhook retries for the same head SHA produce zero new rows (all conflict on the unique index and DO NOTHING).
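The retry behavior can be simulated in memory. The sketch below is illustrative only: the Set stands in for the database's unique index, the class and function names are hypothetical, and the actual hash function applied to the <prUrl>|<headSha>|<filename> string is not specified here, so the raw string is used in its place.

```typescript
const AGENT_ID = "arch-intel:pr-ingest"; // constant sentinel from the text above

interface ObservationKey {
  orgId: string;
  projectId: string;
  prUrl: string;
  headSha: string;
  filename: string;
}

// Pre-hash input for content_hash; the real ingest hashes this (algorithm not shown).
function contentHashInput(k: ObservationKey): string {
  return `${k.prUrl}|${k.headSha}|${k.filename}`;
}

// Serializes the (org_id, project_id, agent_id, content_hash) tuple unambiguously.
function idempotencyKey(k: ObservationKey): string {
  return JSON.stringify([k.orgId, k.projectId, AGENT_ID, contentHashInput(k)]);
}

// In-memory stand-in for "INSERT ... ON CONFLICT DO NOTHING" on the unique index.
class InMemoryObservations {
  private seen = new Set<string>();

  // Returns true for a fresh insert, false for an idempotency hit.
  insert(k: ObservationKey): boolean {
    const key = idempotencyKey(k);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Replaying the same webhook delivery calls insert with identical keys, so every call returns false and no rows are added; a new head SHA on the same PR produces distinct keys and therefore new rows.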
Result type
interface IngestPrMergeResult {
  ran: boolean
  inserted: number    // net new observations (dedup excluded)
  duplicates: number  // idempotency hits
  filesSeen: number
  adrFiles: string[]  // ADR-touching filenames
  reason?: string     // present when ran = false
}
Common ran = false reasons:
| reason | Meaning |
|---|---|
| feature_flag_disabled | ARCH_INTEL_PR_INGEST_ENABLED not set to true |
| db_unavailable | DB health check failed |
| missing_pr_number | Webhook payload malformed (no pull_request.number) |
| no_diff_available | GitHub App token mint failed or API returned non-2xx |
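A caller might turn the result into a log line along these lines. This is a sketch: describeResult is a hypothetical helper, and the interface is repeated only so the example is self-contained.

```typescript
interface IngestPrMergeResult {
  ran: boolean;
  inserted: number;
  duplicates: number;
  filesSeen: number;
  adrFiles: string[];
  reason?: string;
}

// Hypothetical helper: one-line summary suitable for structured logs.
function describeResult(result: IngestPrMergeResult): string {
  if (!result.ran) {
    // reason is expected whenever ran = false; fall back defensively if it is absent.
    return `skipped: ${result.reason ?? "unknown"}`;
  }
  return (
    `inserted ${result.inserted}, duplicates ${result.duplicates}, ` +
    `files ${result.filesSeen}, ADR files ${result.adrFiles.length}`
  );
}
```

A fully deduplicated webhook retry would show up as ran = true with inserted 0 and a non-zero duplicates count, which is distinct from the ran = false cases in the table.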
Downstream refinement
PR ingest writes raw rows to the observations table; the refinement into typed graph nodes (pattern / convention / decision / deviation) runs on the scheduled graph-extraction cron, the same path that drains auto-ingested observations.
On every fresh insert the ingest also calls emitMemoryObservation('created', obs) on the in-process memoryHookBus, but no pipeline subscriber is attached in the deployed platform today - the event is a no-op, kept as a seam for future in-process consumers. Duplicate observations (idempotency hits) do NOT emit.
Connecting ingest to projects
The ingest resolves projectId via the project_trackers table - the operator-driven owner/repo ↔ project binding. When a repository has no project binding, observations are still written scoped to the org level (projectId = '(unknown)'). This means drift is visible in the review dashboard even for unbound repositories; the operator can add a binding later and the observations will be backfilled.
To bind a repository to a project:
rensei project trackers add owner/repo --provider github_issues --project my-project
Related pages
- Auto-Ingest - lower-confidence session-terminal ingest
- Extraction Pipeline - how observations become graph nodes
- GitHub Integration - webhook setup and App installation