AST Extraction

AST-driven dependency extraction converts TypeScript and JavaScript source files into knowledge graph nodes and edges. When an agent writes a file_operation observation, this strategy uses the TypeScript compiler API to parse the source into an AST, then emits a Module node for the source file, one Module node per unique imported specifier, and a depends_on edge from the source file to each dependency.

This is the highest-precision extraction strategy: it is deterministic, makes no LLM calls, and handles every valid import and re-export syntax variant.

How it works

Import kinds

Each import is tagged with an ImportKind that describes how the specifier is referenced:

Kind	TypeScript syntax	Example
`value`	Regular import, namespace import	`import { foo } from 'bar'`, `import * as N from 'bar'`
`type`	Type-only import or re-export	`import type { T } from 'bar'`, `export type { T } from 'bar'`
`dynamic`	Dynamic `import()` expression	`const m = await import('./foo')`
`side-effect`	Import with no clause	`import './styles.css'`
`require`	CJS `require()` or `import x = require()`	`const x = require('bar')`, `import x = require('bar')`

When the same specifier appears under multiple kinds (for example, a file references it with import type in one statement and a regular import in another), the highest-priority kind wins: value > require > dynamic > side-effect > type. This keeps runtime dependencies accurately represented even in mixed-style codebases.
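
The priority rule can be sketched as a small merge step over the raw import references. This is a minimal illustration, not the code in strategies.ts: the names ImportRef, KIND_PRIORITY, higherPriority, and resolveKinds are hypothetical, and only the ordering itself comes from the text above.

```typescript
type ImportKind = 'value' | 'type' | 'dynamic' | 'side-effect' | 'require'

// Hypothetical shape for one import reference found in a file.
interface ImportRef {
  specifier: string
  kind: ImportKind
}

// Priority order from the text, highest first.
const KIND_PRIORITY: readonly ImportKind[] = ['value', 'require', 'dynamic', 'side-effect', 'type']

// Return whichever of the two kinds ranks higher in KIND_PRIORITY.
function higherPriority(a: ImportKind, b: ImportKind): ImportKind {
  return KIND_PRIORITY.indexOf(a) <= KIND_PRIORITY.indexOf(b) ? a : b
}

// Collapse raw references to one entry per specifier, keeping
// first-seen order and the highest-priority kind.
function resolveKinds(refs: readonly ImportRef[]): ImportRef[] {
  const resolved: ImportRef[] = []
  for (const ref of refs) {
    const existing = resolved.filter((r) => r.specifier === ref.specifier)[0]
    if (existing === undefined) {
      resolved.push({ specifier: ref.specifier, kind: ref.kind })
    } else {
      existing.kind = higherPriority(existing.kind, ref.kind)
    }
  }
  return resolved
}

// '@/lib/db' is seen as both type and value, so it resolves to value.
const resolvedKinds = resolveKinds([
  { specifier: '@/lib/db', kind: 'type' },
  { specifier: './polyfill', kind: 'side-effect' },
  { specifier: '@/lib/db', kind: 'value' },
])
```

Keeping the order in a single array means the comparison and the documented ranking cannot drift apart.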

Node and edge shape

The strategy emits a Module node for the source file and one Module node per unique imported specifier, with a depends_on edge from source to each dependency:

{
  "nodes": [
    {
      "id": "module-src-lib-auth-ts",
      "name": "src/lib/auth.ts",
      "type": "Module",
      "description": "Module at src/lib/auth.ts"
    },
    {
      "id": "module-next-server",
      "name": "next/server",
      "type": "Module",
      "description": "Imported module: next/server (kind: value)"
    },
    {
      "id": "module---lib-db",
      "name": "@/lib/db",
      "type": "Module",
      "description": "Imported module: @/lib/db (kind: value)"
    }
  ],
  "edges": [
    {
      "sourceNodeId": "module-src-lib-auth-ts",
      "targetNodeId": "module-next-server",
      "relationshipName": "depends_on"
    },
    {
      "sourceNodeId": "module-src-lib-auth-ts",
      "targetNodeId": "module---lib-db",
      "relationshipName": "depends_on"
    }
  ]
}

The import kind is embedded in the dependency node's description field; the edge itself carries no kind. The relationshipName is always depends_on, which preserves backward compatibility with graph queries that rely on that edge type.
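
The sample ids above are consistent with a simple slug rule: prefix the name with module- and replace every character that is not a letter or digit with a hyphen. The sketch below assembles the documented shape under that assumption. The rule is inferred from the three example ids only, and moduleId and buildModuleGraph are hypothetical names, not functions from strategies.ts.

```typescript
interface ModuleNode {
  id: string
  name: string
  type: 'Module'
  description: string
}

interface DependsOnEdge {
  sourceNodeId: string
  targetNodeId: string
  relationshipName: 'depends_on'
}

// Assumed slug rule (inferred from the example ids, not confirmed):
// "@/lib/db" -> "module---lib-db", "next/server" -> "module-next-server".
function moduleId(name: string): string {
  return 'module-' + name.replace(/[^A-Za-z0-9]/g, '-')
}

// Build one source node, one node per dependency, and one edge per dependency.
// Expects specifiers to be unique already (one resolved kind each).
function buildModuleGraph(
  filePath: string,
  imports: ReadonlyArray<{ specifier: string; kind: string }>,
): { nodes: ModuleNode[]; edges: DependsOnEdge[] } {
  const sourceId = moduleId(filePath)
  const nodes: ModuleNode[] = [
    { id: sourceId, name: filePath, type: 'Module', description: `Module at ${filePath}` },
  ]
  const edges: DependsOnEdge[] = []
  for (const { specifier, kind } of imports) {
    const id = moduleId(specifier)
    nodes.push({
      id,
      name: specifier,
      type: 'Module',
      description: `Imported module: ${specifier} (kind: ${kind})`,
    })
    // The edge always points from the source file to the dependency.
    edges.push({ sourceNodeId: sourceId, targetNodeId: id, relationshipName: 'depends_on' })
  }
  return { nodes, edges }
}
```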

File extension handling

The parser picks the correct TypeScript ScriptKind based on the file extension in observation.metadata.filePath:

Extension	ScriptKind
`.tsx`	`TSX`
`.jsx`	`JSX`
`.js`, `.mjs`, `.cjs`	`JS`
`.json`	`JSON`
everything else (`.ts`, etc.)	`TS`

If no filePath is available in the observation metadata, a synthetic path file-<id> is used and the parser defaults to TS mode.
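
The extension table and the fallback path can be sketched as two small helpers. This is an illustration under stated assumptions: scriptKindFor and resolvePath are hypothetical names, the return values are plain labels that correspond to members of ts.ScriptKind rather than the enum itself (so the sketch has no dependency on the typescript package), and extension matching is assumed to be case-sensitive.

```typescript
// Labels matching the ScriptKind column of the table above.
type ScriptKindName = 'TS' | 'TSX' | 'JS' | 'JSX' | 'JSON'

// Map a file path to a ScriptKind label based on its final extension.
function scriptKindFor(filePath: string): ScriptKindName {
  // Match the last ".ext" segment that contains no further dot or path separator.
  const match = /\.[^./\\]+$/.exec(filePath)
  const ext = match ? match[0] : ''
  switch (ext) {
    case '.tsx':
      return 'TSX'
    case '.jsx':
      return 'JSX'
    case '.js':
    case '.mjs':
    case '.cjs':
      return 'JS'
    case '.json':
      return 'JSON'
    default:
      // .ts and anything unrecognized, including extensionless synthetic paths.
      return 'TS'
  }
}

// Use metadata.filePath when present, otherwise the synthetic file-<id> path.
function resolvePath(observation: { id: string; metadata?: { filePath?: string } }): string {
  const filePath = observation.metadata ? observation.metadata.filePath : undefined
  return filePath || `file-${observation.id}`
}
```

Because the synthetic path has no extension, it falls through to the default branch, which matches the documented TS-mode fallback.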

Using `extractFromFileOperation`

The function is exported from strategies.ts and called by the extraction pipeline's strategy dispatcher:

import { extractFromFileOperation } from '@/lib/graph/extraction/strategies'

const graph = extractFromFileOperation({
  id: 'obs_123',
  type: 'file_operation',
  content: `
    import { NextRequest } from 'next/server'
    import type { Db } from '@/lib/db'
    import './polyfill'
    const Redis = require('ioredis')
  `,
  metadata: { filePath: 'src/lib/handler.ts' },
})

// graph.nodes: [ { id: 'module-src-lib-handler-ts', ... }, { id: 'module-next-server', ... }, ... ]
// graph.edges: [ { sourceNodeId: 'module-src-lib-handler-ts', targetNodeId: 'module-next-server', ... }, ... ]

On a parse failure the function logs a warning and returns an empty graph, { nodes: [], edges: [] }. A failed extraction therefore does not block the rest of the pipeline.
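
One way to get this behavior is to wrap the extraction step so that any thrown error is converted into an empty graph. The sketch below shows that pattern only; extractSafely and the Graph type are hypothetical, and the source does not say how strategies.ts detects or handles failures internally.

```typescript
interface Graph {
  nodes: unknown[]
  edges: unknown[]
}

// Run an extractor and convert any thrown error into an empty graph,
// so the caller can continue with the next observation.
function extractSafely<T>(extract: (input: T) => Graph, input: T): Graph {
  try {
    return extract(input)
  } catch (err) {
    console.warn('Extraction failed; returning an empty graph', err)
    return { nodes: [], edges: [] }
  }
}
```

A caller can treat an empty result the same way as a file with no imports, which is what lets the pipeline continue without special-casing failures.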

Other extraction strategies

The runStrategy dispatcher in strategies.ts routes observations to the appropriate strategy by type:

Observation type	Strategy	LLM call?
`file_operation`	`extractFromFileOperation` (AST)	No
`decision`	`extractFromDecision`	Yes
`error`	`extractFromError`	Yes
`session_summary`	`extractFromSessionSummary`	Yes
`explicit`	`extractFromExplicit`	Yes
(unknown)	Falls through to `extractFromExplicit`	Yes

The AST strategy is the only deterministic, zero-cost strategy. All others issue an LLM inference call against the configured judge model and require LLM_API_KEY to be set.
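
The routing table above amounts to a lookup with a default branch. The sketch below models it with strategy names as strings so that it stays self-contained; strategyFor and canRun are hypothetical helpers, and the real runStrategy presumably calls the strategy functions directly. Only the type-to-strategy mapping and the LLM_API_KEY requirement come from the text.

```typescript
interface StrategyInfo {
  name: string
  usesLlm: boolean
}

// Mirror of the routing table: unknown types fall through to extractFromExplicit.
function strategyFor(observationType: string): StrategyInfo {
  switch (observationType) {
    case 'file_operation':
      return { name: 'extractFromFileOperation', usesLlm: false }
    case 'decision':
      return { name: 'extractFromDecision', usesLlm: true }
    case 'error':
      return { name: 'extractFromError', usesLlm: true }
    case 'session_summary':
      return { name: 'extractFromSessionSummary', usesLlm: true }
    default:
      // Covers 'explicit' and any unrecognized observation type.
      return { name: 'extractFromExplicit', usesLlm: true }
  }
}

// Hypothetical pre-check: LLM-backed strategies need LLM_API_KEY, the AST strategy does not.
function canRun(observationType: string, env: { LLM_API_KEY?: string }): boolean {
  return !strategyFor(observationType).usesLlm || Boolean(env.LLM_API_KEY)
}
```

A check like canRun makes the cost difference visible at the call site: file_operation observations can be processed even when no key is configured.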

LLM strategy prompts

For the LLM-backed strategies, the prompt instructs the model to return a JSON graph with a fixed schema:

{
  "nodes": [
    { "id": "<slug>", "name": "<name>", "type": "<Service|Module|API|Database|Decision|Pattern|Person|Config|Dependency>", "description": "<text>" }
  ],
  "edges": [
    { "sourceNodeId": "<id>", "targetNodeId": "<id>", "relationshipName": "<snake_case>" }
  ]
}

Each LLM strategy specializes the prompt for its observation type. For example, the decision strategy extracts Decision and Person nodes linked by decided_by edges, and the error strategy extracts anti-pattern Pattern nodes linked by workaround_for edges.
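
Because model output is not guaranteed to follow the schema, a caller may want to validate the response before persisting it. The sketch below is one possible check, not the pipeline's own validation: parseLlmGraph and its helpers are hypothetical, and the snake_case pattern is an assumption about what the schema's placeholder means.

```typescript
const NODE_TYPES = [
  'Service', 'Module', 'API', 'Database', 'Decision', 'Pattern', 'Person', 'Config', 'Dependency',
] as const

interface LlmNode { id: string; name: string; type: (typeof NODE_TYPES)[number]; description: string }
interface LlmEdge { sourceNodeId: string; targetNodeId: string; relationshipName: string }
interface LlmGraph { nodes: LlmNode[]; edges: LlmEdge[] }

// Assumed meaning of <snake_case>: lowercase words joined by single underscores.
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null
}

function isNode(value: unknown): value is LlmNode {
  if (!isRecord(value)) return false
  const { id, name, type, description } = value
  return typeof id === 'string' && typeof name === 'string' && typeof description === 'string' &&
    typeof type === 'string' && (NODE_TYPES as readonly string[]).indexOf(type) !== -1
}

function isEdge(value: unknown): value is LlmEdge {
  if (!isRecord(value)) return false
  const { sourceNodeId, targetNodeId, relationshipName } = value
  return typeof sourceNodeId === 'string' && typeof targetNodeId === 'string' &&
    typeof relationshipName === 'string' && SNAKE_CASE.test(relationshipName)
}

// Parse a raw model response; return null if it is not valid JSON
// or does not match the fixed schema.
function parseLlmGraph(raw: string): LlmGraph | null {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return null
  }
  if (!isRecord(data)) return null
  const nodes = data.nodes
  const edges = data.edges
  if (!Array.isArray(nodes) || !Array.isArray(edges)) return null
  if (!nodes.every(isNode) || !edges.every(isEdge)) return null
  return { nodes, edges }
}
```

Returning null rather than throwing keeps the failure mode consistent with an extraction that produced nothing.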

Related pages

Extraction Pipeline: the cron-based worker that calls runStrategy on queued observations
Auto-Ingest from Sessions: how file_operation observations are created at session terminal
Knowledge Graph Store: where extracted nodes and edges are persisted
Arch Query Layer: how the graph is queried by the platform's read path
