PR Ingest

PR-merge arch ingest.

PR ingest extracts architectural observations from merged GitHub pull requests and contributes them to the knowledge graph corpus. Because a merged PR has passed human code review, its observations carry higher confidence than the auto-extracted session signals.

PR ingest is opt-in. Set ARCH_INTEL_PR_INGEST_ENABLED=true in your environment to enable it. The feature flag defaults to off because observation rows accumulate without a review surface until the arch-intel review dashboard is enabled.

Trigger

The pull_request.closed GitHub webhook event triggers the ingest. Only events with merged = true are processed. The webhook route calls ingestPrMerge inside after(), so the 200 acknowledgement is sent before any database writes start.

GitHub pull_request.closed (merged = true)
  └─ webhook route (sync: sig verify + idempotency check)
       └─ after()
            └─ ingestPrMerge({ prUrl, headSha, baseSha, orgId, projectId?, ... })
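The merged = true gate at the top of this flow can be sketched as a small predicate. This is a minimal sketch, not the actual route code: the PullRequestEvent interface and the shouldIngest name are hypothetical, and only the action and pull_request.merged fields are taken from the GitHub webhook payload.

```typescript
// Hypothetical, trimmed-down view of the webhook payload: only the
// fields this gate reads. The real payload carries many more.
interface PullRequestEvent {
  action: string;
  pull_request?: {
    number?: number;
    merged?: boolean;
  };
}

// Returns true only for pull_request.closed events whose PR was merged.
// A PR closed without merging has merged = false and is skipped.
function shouldIngest(event: PullRequestEvent): boolean {
  return event.action === "closed" && event.pull_request?.merged === true;
}
```

A route built this way would run the check synchronously, alongside signature verification and the idempotency check, and schedule ingestPrMerge inside after() only when it returns true.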

Confidence levels

PR ingest uses higher confidence values than auto-ingest because human review has filtered the noise:

Observation type                 Confidence
Regular file change (pattern)    0.55
ADR-touched file (decision)      0.70
Session-terminal auto-ingest     0.30

ADR paths are detected by two conventions:

  • Files under docs/adr/ (e.g. docs/adr/0012-auth-strategy.md)
  • Files with basename matching ADR-*.md (e.g. ADR-2026-05-18-arch-intel.md)
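The two ADR conventions and the confidence values above could be combined into a single classifier along these lines. This is an illustrative sketch under stated assumptions: the function and constant names are hypothetical, paths are assumed to be repository-relative with forward slashes, and matching is assumed to be case-sensitive.

```typescript
type ObservationKind = "pattern" | "decision";

interface Classification {
  kind: ObservationKind;
  confidence: number;
}

// Confidence values from the table above.
const PATTERN_CONFIDENCE = 0.55;
const DECISION_CONFIDENCE = 0.7;

// True when the path follows either ADR convention.
// Assumption: paths are repo-relative, so "docs/adr/" is matched as a prefix.
function isAdrPath(path: string): boolean {
  const basename = path.slice(path.lastIndexOf("/") + 1);
  const underAdrDir = path.startsWith("docs/adr/");
  const adrBasename = /^ADR-.*\.md$/.test(basename);
  return underAdrDir || adrBasename;
}

// ADR-touching files become decision observations; everything else
// is recorded as a pattern observation.
function classifyFile(path: string): Classification {
  return isAdrPath(path)
    ? { kind: "decision", confidence: DECISION_CONFIDENCE }
    : { kind: "pattern", confidence: PATTERN_CONFIDENCE };
}
```

For example, classifyFile("docs/adr/0012-auth-strategy.md") yields a decision observation, while classifyFile("src/index.ts") yields a pattern observation.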

What gets extracted

For each merged PR, the ingest produces:

  1. One aggregate observation: a pattern-kind summary with PR metadata (title, author, head SHA, base SHA, file count, additions/deletions).
  2. One observation per changed file: a pattern observation for a regular change, or a decision observation for an ADR-touching file.

Pagination: the GitHub API returns up to 100 files per page, and the ingest stops after 10 pages (1,000 files). PRs with more than 1,000 changed files are truncated to the first 1,000; changes of that size are typically noise for architectural intelligence purposes.
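The page cap can be sketched as a bounded loop. This is a rough illustration rather than the ingest's own code: the FetchPage callback stands in for whatever GitHub client the ingest uses, and collectChangedFiles and hitCap are hypothetical names.

```typescript
interface ChangedFile {
  filename: string;
}

// Hypothetical page fetcher: returns the files on a 1-based page.
type FetchPage = (page: number, perPage: number) => Promise<ChangedFile[]>;

const PER_PAGE = 100; // GitHub's maximum page size for PR files
const MAX_PAGES = 10; // cap described above: 10 pages = 1,000 files

async function collectChangedFiles(
  fetchPage: FetchPage,
): Promise<{ files: ChangedFile[]; hitCap: boolean }> {
  const files: ChangedFile[] = [];
  for (let page = 1; page <= MAX_PAGES; page++) {
    const batch = await fetchPage(page, PER_PAGE);
    files.push(...batch);
    // A short page means there is nothing left to fetch.
    if (batch.length < PER_PAGE) {
      return { files, hitCap: false };
    }
  }
  // Every page up to the cap was full, so more files may exist
  // beyond the 1,000 collected here.
  return { files, hitCap: true };
}
```

Injecting the fetcher keeps the loop testable without network access; a caller could log hitCap to see how often large PRs are truncated.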

Idempotency

The idempotency key is (org_id, project_id, agent_id, content_hash), where content_hash is derived from <prUrl>|<headSha>|<filename>. The agent ID is the constant sentinel arch-intel:pr-ingest. Webhook retries for the same head SHA produce zero new rows: every insert conflicts on the unique index and resolves as DO NOTHING.
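The key construction can be illustrated as follows. The source does not say which hash function produces content_hash, so SHA-256 here is an assumption, as are the helper names; the ObservationIndex class is only an in-memory stand-in for the database's unique index, used to show why a retry inserts nothing.

```typescript
import { createHash } from "node:crypto";

// Constant sentinel agent ID named in the text above.
const PR_INGEST_AGENT_ID = "arch-intel:pr-ingest";

// Assumption: SHA-256 over "<prUrl>|<headSha>|<filename>".
function contentHash(prUrl: string, headSha: string, filename: string): string {
  return createHash("sha256")
    .update(`${prUrl}|${headSha}|${filename}`)
    .digest("hex");
}

// Flattens (org_id, project_id, agent_id, content_hash) into one string key.
function idempotencyKey(orgId: string, projectId: string, hash: string): string {
  return [orgId, projectId, PR_INGEST_AGENT_ID, hash].join("\u0000");
}

// In-memory stand-in for a unique index with conflict-do-nothing semantics.
class ObservationIndex {
  private readonly seen = new Set<string>();

  // Returns true when the row is new, false when it is a duplicate.
  insertOnce(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Because the head SHA is part of the hash input, a retry of the same delivery maps to the same key, while the same file at a different head SHA maps to a different one.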

Result type

interface IngestPrMergeResult {
  ran: boolean
  inserted: number         // net new observations (dedup excluded)
  duplicates: number       // idempotency hits
  filesSeen: number
  adrFiles: string[]       // ADR-touching filenames
  reason?: string          // present when ran = false
}

Common ran = false reasons:

reason                   Meaning
feature_flag_disabled    ARCH_INTEL_PR_INGEST_ENABLED not set to true
db_unavailable           DB health check failed
missing_pr_number        Webhook payload malformed (no pull_request.number)
no_diff_available        GitHub App token mint failed or API returned non-2xx
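A caller might turn the result into a log line along these lines. This is a sketch only: describeResult is a hypothetical helper, the interface is repeated from above so the block is self-contained, and the fallback branch exists because the reasons listed are described as common rather than exhaustive.

```typescript
interface IngestPrMergeResult {
  ran: boolean;
  inserted: number;
  duplicates: number;
  filesSeen: number;
  adrFiles: string[];
  reason?: string;
}

// Hypothetical helper: maps a result to a human-readable summary.
function describeResult(result: IngestPrMergeResult): string {
  if (!result.ran) {
    switch (result.reason) {
      case "feature_flag_disabled":
        return "skipped: ARCH_INTEL_PR_INGEST_ENABLED is not set to true";
      case "db_unavailable":
        return "skipped: DB health check failed";
      case "missing_pr_number":
        return "skipped: webhook payload has no pull_request.number";
      case "no_diff_available":
        return "skipped: GitHub App token mint failed or API returned non-2xx";
      default:
        // Other reasons may exist; surface them verbatim.
        return `skipped: ${result.reason ?? "unknown reason"}`;
    }
  }
  return (
    `ingested ${result.filesSeen} files: ${result.inserted} new, ` +
    `${result.duplicates} duplicates, ${result.adrFiles.length} ADR files`
  );
}
```

On a webhook retry for the same head SHA, such a summary would report zero new observations, with every row counted as a duplicate.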

Downstream refinement

PR ingest writes raw rows to the observations table; the refinement into typed graph nodes (pattern / convention / decision / deviation) runs on the scheduled graph-extraction cron, the same path that drains auto-ingested observations.

On every fresh insert the ingest also calls emitMemoryObservation('created', obs) on the in-process memoryHookBus. No pipeline subscriber is attached in the deployed platform today, so the event is a no-op, kept as a seam for future in-process consumers. Duplicate observations (idempotency hits) do NOT emit.

Connecting ingest to projects

The ingest resolves projectId via the project_trackers table, which holds the operator-driven owner/repo ↔ project binding. When a repository has no project binding, observations are still written, scoped to the org level (projectId = '(unknown)'). Drift therefore stays visible in the review dashboard even for unbound repositories; the operator can add a binding later and the observations will be backfilled.

To bind a repository to a project:

rensei project trackers add owner/repo --provider github_issues --project my-project
