PR Ingest

PR ingest extracts architectural observations from merged GitHub pull requests and contributes them to the knowledge graph corpus. Because a merged PR has passed human code review, its observations carry higher confidence than the auto-extracted session signals.

PR ingest is opt-in. Set ARCH_INTEL_PR_INGEST_ENABLED=true in your environment to enable it. The feature flag defaults to off because, until the arch-intel review dashboard is enabled, observation rows accumulate with no surface for reviewing them.
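A minimal sketch of the opt-in gate, assuming the flag is honoured only when it is exactly the string `true` (the wording of the `feature_flag_disabled` reason below suggests this, but case handling is an assumption). `isPrIngestEnabled` and `gate` are hypothetical names, not the platform's code.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: only the exact string "true" enables PR ingest;
// unset, "1", "TRUE", etc. all count as off in this sketch.
function isPrIngestEnabled(env: Env): boolean {
  return env.ARCH_INTEL_PR_INGEST_ENABLED === "true";
}

// Early exit shaped like the documented `ran = false` result.
function gate(env: Env): { ran: false; reason: string } | null {
  return isPrIngestEnabled(env) ? null : { ran: false, reason: "feature_flag_disabled" };
}
```

In a Node process you would call it as `isPrIngestEnabled(process.env)`.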

Trigger

The GitHub pull_request.closed webhook event fires the ingest; only events with merged = true are processed. The webhook route calls ingestPrMerge inside after(), so the 200 acknowledgement is sent to GitHub before any DB writes happen.

```text
GitHub pull_request.closed (merged = true)
  └─ webhook route (sync: sig verify + idempotency check)
       └─ after()
            └─ ingestPrMerge({ prUrl, headSha, baseSha, orgId, projectId?, ... })
```
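The synchronous part of the route can be sketched as below. GitHub signs each delivery with an HMAC-SHA256 of the raw body and sends it as `sha256=<hex>` in the `X-Hub-Signature-256` header; the merged filter reads the standard `action`, `pull_request.merged`, `head.sha` and `base.sha` payload fields. The helper names, the use of `html_url` as `prUrl`, and passing `orgId` in from the caller are assumptions for illustration, and the `after()` scheduling itself is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify GitHub's X-Hub-Signature-256 header against the raw request body.
function verifySignature(secret: string, rawBody: string, header: string | undefined): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = Buffer.from("sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"));
  const received = Buffer.from(header);
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Only the payload fields this sketch reads.
interface PullRequestEvent {
  action: string;
  pull_request: {
    merged: boolean;
    html_url: string;
    head: { sha: string };
    base: { sha: string };
  };
}

// Hypothetical subset of the ingestPrMerge arguments.
interface IngestArgs {
  prUrl: string;
  headSha: string;
  baseSha: string;
  orgId: string;
}

// Returns ingest arguments for merged closes, or null when the event should be ignored.
function toIngestArgs(event: PullRequestEvent, orgId: string): IngestArgs | null {
  if (event.action !== "closed" || event.pull_request.merged !== true) return null;
  const pr = event.pull_request;
  return { prUrl: pr.html_url, headSha: pr.head.sha, baseSha: pr.base.sha, orgId };
}
```

Closed-but-unmerged PRs return `null`, so nothing is scheduled for them.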

Confidence levels

PR ingest uses higher confidence values than auto-ingest because human review has filtered the noise:

| Source (observation kind) | Confidence |
|---|---|
| PR ingest: regular file change (`pattern`) | `0.55` |
| PR ingest: ADR-touching file (`decision`) | `0.70` |
| Auto-ingest: session-terminal (for comparison) | `0.30` |

ADR paths are detected by two conventions:

Files under docs/adr/ (e.g. docs/adr/0012-auth-strategy.md)
Files with basename matching ADR-*.md (e.g. ADR-2026-05-18-arch-intel.md)
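The two ADR conventions and the confidence table can be expressed as a small classifier. This is a sketch assuming repository-root-relative paths and case-sensitive matching; `isAdrPath` and `classifyFile` are hypothetical names.

```typescript
// Confidence values from the table above; autoIngest is for comparison only.
const CONFIDENCE = {
  pattern: 0.55, // PR ingest: regular file change
  decision: 0.7, // PR ingest: ADR-touching file
  autoIngest: 0.3, // session-terminal auto-ingest
} as const;

// Last path segment, e.g. "design/ADR-1.md" -> "ADR-1.md".
function basename(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Convention 1: anything under docs/adr/. Convention 2: basename ADR-*.md anywhere.
function isAdrPath(path: string): boolean {
  return path.startsWith("docs/adr/") || /^ADR-.*\.md$/.test(basename(path));
}

function classifyFile(path: string): { kind: "pattern" | "decision"; confidence: number } {
  return isAdrPath(path)
    ? { kind: "decision", confidence: CONFIDENCE.decision }
    : { kind: "pattern", confidence: CONFIDENCE.pattern };
}
```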

What gets extracted

For each merged PR, the ingest produces:

One aggregate observation - a pattern-kind summary with PR metadata: title, author, head SHA, base SHA, file count, and additions/deletions.
Per-file observations - one pattern observation for each regular changed file, or one decision observation for each ADR-touching file.
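The extraction shape, one aggregate plus one observation per file, might look like the sketch below. The `Observation` and `PrMeta` shapes are hypothetical (not the real observations schema), and summing additions/deletions from the file list is an assumption; the PR payload also carries its own totals.

```typescript
// Hypothetical observation shape; field names are illustrative only.
interface Observation {
  kind: "pattern" | "decision";
  subject: string; // "PR" for the aggregate, otherwise the filename
  detail: Record<string, string | number>;
}

interface PrFile { filename: string; additions: number; deletions: number }
interface PrMeta { title: string; author: string; headSha: string; baseSha: string }

// ADR detection per the two documented conventions.
const isAdr = (f: string): boolean =>
  f.startsWith("docs/adr/") || /^ADR-.*\.md$/.test(f.slice(f.lastIndexOf("/") + 1));

function buildObservations(pr: PrMeta, files: PrFile[]): Observation[] {
  const additions = files.reduce((n, f) => n + f.additions, 0);
  const deletions = files.reduce((n, f) => n + f.deletions, 0);
  // One aggregate pattern-kind summary for the whole PR.
  const aggregate: Observation = {
    kind: "pattern",
    subject: "PR",
    detail: { ...pr, fileCount: files.length, additions, deletions },
  };
  // One observation per changed file: decision for ADR paths, pattern otherwise.
  const perFile = files.map((f): Observation => ({
    kind: isAdr(f.filename) ? "decision" : "pattern",
    subject: f.filename,
    detail: { additions: f.additions, deletions: f.deletions },
  }));
  return [aggregate, ...perFile];
}
```

A PR touching N files therefore yields N + 1 observations.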

Pagination: the GitHub API returns up to 100 files per page; the ingest caps at 10 pages (1,000 files). PRs with more than 1,000 changed files are truncated - they are typically noise for architectural intelligence purposes.
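The capped pagination loop can be sketched as follows. The page fetcher is injected (a hypothetical parameter) so the loop is testable offline; in production it would call GitHub's `GET /repos/{owner}/{repo}/pulls/{pull_number}/files` with `per_page=100&page=N`.

```typescript
interface PrFile { filename: string }

const PER_PAGE = 100; // GitHub's maximum page size for this endpoint
const MAX_PAGES = 10; // ingest cap: 10 pages = 1,000 files

async function listPrFiles(
  fetchPage: (page: number) => Promise<PrFile[]>,
): Promise<{ files: PrFile[]; maybeTruncated: boolean }> {
  const files: PrFile[] = [];
  for (let page = 1; page <= MAX_PAGES; page++) {
    const batch = await fetchPage(page);
    files.push(...batch);
    // A short (or empty) page means there are no more files.
    if (batch.length < PER_PAGE) return { files, maybeTruncated: false };
  }
  // The last allowed page was full: more files may exist and were not fetched.
  // (A PR with exactly 1,000 files also lands here, hence "maybe".)
  return { files, maybeTruncated: true };
}
```

Stopping on the first short page avoids one extra request for the common case of small PRs.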

Idempotency

The idempotency key is (org_id, project_id, agent_id, content_hash), where content_hash is derived from <prUrl>|<headSha>|<filename> and agent_id is the constant sentinel arch-intel:pr-ingest. Webhook retries for the same head SHA therefore produce zero new rows: every insert conflicts on the unique index and is skipped by DO NOTHING.
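A minimal sketch of the key and the dedup behaviour. The hash function is an assumption (SHA-256 hex); the text only says the hash is derived from `<prUrl>|<headSha>|<filename>`. `ObservationStore` is a hypothetical in-memory stand-in for the unique index; in Postgres the equivalent is `INSERT ... ON CONFLICT (org_id, project_id, agent_id, content_hash) DO NOTHING`.

```typescript
import { createHash } from "node:crypto";

const AGENT_ID = "arch-intel:pr-ingest";

// Assumption: SHA-256 hex of the pipe-joined string.
function contentHash(prUrl: string, headSha: string, filename: string): string {
  return createHash("sha256").update(`${prUrl}|${headSha}|${filename}`).digest("hex");
}

// In-memory stand-in for the unique index on the four-column key.
class ObservationStore {
  private keys = new Set<string>();

  insert(orgId: string, projectId: string, hash: string): "inserted" | "duplicate" {
    const key = JSON.stringify([orgId, projectId, AGENT_ID, hash]);
    if (this.keys.has(key)) return "duplicate"; // DO NOTHING path
    this.keys.add(key);
    return "inserted";
  }
}
```

Because the head SHA is part of the hash, a new push to the PR before merge would produce a different key, while a retried delivery for the same SHA does not.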

Result type

```typescript
interface IngestPrMergeResult {
  ran: boolean
  inserted: number         // net new observations (dedup excluded)
  duplicates: number       // idempotency hits
  filesSeen: number
  adrFiles: string[]       // ADR-touching filenames
  reason?: string          // present when ran = false
}
```

Common ran = false reasons:

| `reason` | Meaning |
|---|---|
| `feature_flag_disabled` | `ARCH_INTEL_PR_INGEST_ENABLED` not set to `true` |
| `db_unavailable` | DB health check failed |
| `missing_pr_number` | Webhook payload malformed (no `pull_request.number`) |
| `no_diff_available` | GitHub App token mint failed or API returned non-2xx |
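A caller might turn the result into a log line like this. `describeResult` is a hypothetical helper; since the table lists only common reasons, unknown reason strings are passed through rather than rejected.

```typescript
interface IngestPrMergeResult {
  ran: boolean;
  inserted: number;
  duplicates: number;
  filesSeen: number;
  adrFiles: string[];
  reason?: string;
}

// Meanings of the common skip reasons (not exhaustive).
const REASON_MEANINGS: Record<string, string> = {
  feature_flag_disabled: "ARCH_INTEL_PR_INGEST_ENABLED not set to true",
  db_unavailable: "DB health check failed",
  missing_pr_number: "webhook payload malformed (no pull_request.number)",
  no_diff_available: "GitHub App token mint failed or API returned non-2xx",
};

function describeResult(r: IngestPrMergeResult): string {
  if (!r.ran) {
    const reason = r.reason ?? "unknown";
    const meaning = REASON_MEANINGS[reason];
    return meaning ? `skipped: ${reason} (${meaning})` : `skipped: ${reason}`;
  }
  const adr = r.adrFiles.length ? `; ADR files: ${r.adrFiles.join(", ")}` : "";
  return `ingested ${r.inserted} new, ${r.duplicates} duplicate, ${r.filesSeen} files seen${adr}`;
}
```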

PR ingest writes raw rows to the observations table; the refinement into typed graph nodes (pattern / convention / decision / deviation) runs on the scheduled graph-extraction cron, the same path that drains auto-ingested observations.

On every fresh insert, the ingest also calls emitMemoryObservation('created', obs) on the in-process memoryHookBus. No pipeline subscriber is attached in the deployed platform today, so the event is a no-op, kept as a seam for future in-process consumers. Duplicate observations (idempotency hits) do NOT emit.
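The emit-only-on-fresh-insert rule can be sketched with Node's `EventEmitter`, whose `emit` returns `false` when no listener is attached, mirroring today's no-op. Modelling `memoryHookBus` as an `EventEmitter` is an assumption, and `afterInsert` is a hypothetical name; the real bus API is not documented here.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the in-process memoryHookBus.
const memoryHookBus = new EventEmitter();

interface Obs { contentHash: string }

// Returns false when nothing is subscribed (the current production state).
function emitMemoryObservation(event: "created", obs: Obs): boolean {
  return memoryHookBus.emit(event, obs);
}

// Emit only for fresh inserts; idempotency hits stay silent.
function afterInsert(outcome: "inserted" | "duplicate", obs: Obs): boolean {
  return outcome === "inserted" ? emitMemoryObservation("created", obs) : false;
}
```

A future consumer would only need to call `memoryHookBus.on("created", ...)`; the ingest path itself would not change.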

Connecting ingest to projects

The ingest resolves projectId through the project_trackers table, the operator-driven owner/repo ↔ project binding. When a repository has no project binding, observations are still written, scoped to the org (projectId = '(unknown)'). As a result, drift is visible in the review dashboard even for unbound repositories; the operator can add a binding later and the observations will be backfilled.
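The lookup with its org-level fallback can be sketched as below, with an in-memory `Map` standing in for the `project_trackers` rows. The map, the lowercase normalisation (on the assumption that bindings are stored lowercase because GitHub owner/repo names are case-insensitive) and `resolveProjectId` are all hypothetical.

```typescript
// Hypothetical view of project_trackers: "owner/repo" (lowercase) -> projectId.
type TrackerBindings = Map<string, string>;

// Fallback value used for unbound repositories, as described above.
const UNBOUND_PROJECT = "(unknown)";

function resolveProjectId(bindings: TrackerBindings, owner: string, repo: string): string {
  return bindings.get(`${owner}/${repo}`.toLowerCase()) ?? UNBOUND_PROJECT;
}
```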

To bind a repository to a project:

```shell
rensei project trackers add owner/repo --provider github_issues --project my-project
```

Auto-Ingest - lower-confidence session-terminal ingest
Extraction Pipeline - how observations become graph nodes
GitHub Integration - webhook setup and App installation