AST Extraction
AST dependency extraction.
AST-driven dependency extraction converts TypeScript and JavaScript source files into knowledge graph nodes and edges. When an agent writes a file_operation observation, this strategy uses the TypeScript compiler API to parse the source AST and emit a Module node for the source file, one Module node per unique imported specifier, and a depends_on edge from the source file to each dependency.
This is the highest-precision extraction strategy: its output is deterministic, it makes no LLM calls, and it is designed to handle every valid import and re-export syntax variant.
How it works
Import kinds
Each import is tagged with an ImportKind that describes how the specifier is referenced:
| Kind | TypeScript syntax | Example |
|---|---|---|
| value | Regular import, namespace import | import { foo } from 'bar', import * as N from 'bar' |
| type | Type-only import or re-export | import type { T } from 'bar', export type { T } from 'bar' |
| dynamic | Dynamic import() expression | const m = await import('./foo') |
| side-effect | Import with no clause | import './styles.css' |
| require | CJS require() or import x = require() | const x = require('bar'), import x = require('bar') |
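The kinds in the table map onto syntax nodes that the TypeScript compiler API exposes. The following is a minimal sketch, not the project's actual implementation: `collectImports` and `FoundImport` are hypothetical names, the file is always parsed in TS mode, and treating a non-type `export ... from` as `value` is an assumption the table does not confirm.

```typescript
import * as ts from "typescript";

type ImportKind = "value" | "type" | "dynamic" | "side-effect" | "require";

interface FoundImport {
  specifier: string;
  kind: ImportKind;
}

// Hypothetical helper: walk the AST and tag each module specifier with a kind.
function collectImports(fileName: string, source: string): FoundImport[] {
  const sourceFile = ts.createSourceFile(
    fileName,
    source,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true,
    ts.ScriptKind.TS, // assumption: this sketch always parses in TS mode
  );
  const found: FoundImport[] = [];

  const visit = (node: ts.Node): void => {
    if (ts.isImportDeclaration(node) && ts.isStringLiteral(node.moduleSpecifier)) {
      // No import clause means `import './x'`, i.e. a side-effect import.
      const clause = node.importClause;
      const kind: ImportKind = !clause ? "side-effect" : clause.isTypeOnly ? "type" : "value";
      found.push({ specifier: node.moduleSpecifier.text, kind });
    } else if (
      ts.isExportDeclaration(node) &&
      node.moduleSpecifier !== undefined &&
      ts.isStringLiteral(node.moduleSpecifier)
    ) {
      // Assumption: a re-export that is not type-only counts as "value".
      found.push({ specifier: node.moduleSpecifier.text, kind: node.isTypeOnly ? "type" : "value" });
    } else if (
      ts.isImportEqualsDeclaration(node) &&
      ts.isExternalModuleReference(node.moduleReference) &&
      ts.isStringLiteral(node.moduleReference.expression)
    ) {
      // import x = require('bar')
      found.push({ specifier: node.moduleReference.expression.text, kind: "require" });
    } else if (ts.isCallExpression(node)) {
      const arg = node.arguments[0];
      if (arg !== undefined && ts.isStringLiteral(arg)) {
        if (node.expression.kind === ts.SyntaxKind.ImportKeyword) {
          found.push({ specifier: arg.text, kind: "dynamic" }); // import('./foo')
        } else if (ts.isIdentifier(node.expression) && node.expression.text === "require") {
          found.push({ specifier: arg.text, kind: "require" }); // require('bar')
        }
      }
    }
    ts.forEachChild(node, visit);
  };

  visit(sourceFile);
  return found;
}
```

The sketch records one entry per statement and only looks at string-literal specifiers; computed specifiers such as `import(someVariable)` are skipped.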
When the same specifier appears under multiple kinds (for example, a file uses both import type and a regular import for the same module in different statements), the highest-priority kind wins: value > require > dynamic > side-effect > type. This ensures runtime dependencies are represented accurately even in mixed-style codebases.
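The priority rule can be sketched as a small merge step. The ordering comes from the text above; `mergeKinds` and `KIND_PRIORITY` are hypothetical names used only for illustration.

```typescript
type ImportKind = "value" | "type" | "dynamic" | "side-effect" | "require";

// Earlier entries win: value > require > dynamic > side-effect > type.
const KIND_PRIORITY: readonly ImportKind[] = ["value", "require", "dynamic", "side-effect", "type"];

// Hypothetical helper: collapse repeated specifiers to their highest-priority kind.
function mergeKinds(
  imports: ReadonlyArray<{ specifier: string; kind: ImportKind }>,
): Map<string, ImportKind> {
  const merged = new Map<string, ImportKind>();
  for (const { specifier, kind } of imports) {
    const current = merged.get(specifier);
    // A lower index in KIND_PRIORITY means a higher priority.
    if (current === undefined || KIND_PRIORITY.indexOf(kind) < KIND_PRIORITY.indexOf(current)) {
      merged.set(specifier, kind);
    }
  }
  return merged;
}
```

With this rule, a module imported once with `import type` and once with a regular `import` ends up as a single `value` dependency.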
Node and edge shape
The strategy emits a Module node for the source file and one Module node per unique imported specifier, with a depends_on edge from source to each dependency:
{
"nodes": [
{
"id": "module-src-lib-auth-ts",
"name": "src/lib/auth.ts",
"type": "Module",
"description": "Module at src/lib/auth.ts"
},
{
"id": "module-next-server",
"name": "next/server",
"type": "Module",
"description": "Imported module: next/server (kind: value)"
},
{
"id": "module---lib-db",
"name": "@/lib/db",
"type": "Module",
"description": "Imported module: @/lib/db (kind: value)"
}
],
"edges": [
{
"sourceNodeId": "module-src-lib-auth-ts",
"targetNodeId": "module-next-server",
"relationshipName": "depends_on"
},
{
"sourceNodeId": "module-src-lib-auth-ts",
"targetNodeId": "module---lib-db",
"relationshipName": "depends_on"
}
]
}
The import kind is embedded in the dependency node's description field. The relationshipName is always depends_on - this preserves backward compatibility with graph queries that rely on that edge type.
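A sketch of how this shape could be assembled from a file path and its merged imports. The id rule used here (prefix `module-`, then replace every non-alphanumeric character with `-`) is inferred from the three ids in the example above and is an assumption, not a confirmed part of the implementation; `moduleId` and `buildGraph` are hypothetical names.

```typescript
interface GraphNode {
  id: string;
  name: string;
  type: "Module";
  description: string;
}

interface GraphEdge {
  sourceNodeId: string;
  targetNodeId: string;
  relationshipName: "depends_on";
}

// Assumed slug rule, inferred from the example ids: '@/lib/db' -> 'module---lib-db'.
const moduleId = (name: string): string => "module-" + name.replace(/[^A-Za-z0-9]/g, "-");

// Hypothetical helper: one node for the source file, one per dependency, one edge each.
function buildGraph(
  filePath: string,
  deps: ReadonlyArray<{ specifier: string; kind: string }>,
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  const sourceId = moduleId(filePath);
  const nodes: GraphNode[] = [
    { id: sourceId, name: filePath, type: "Module", description: `Module at ${filePath}` },
  ];
  const edges: GraphEdge[] = [];
  for (const { specifier, kind } of deps) {
    const id = moduleId(specifier);
    nodes.push({
      id,
      name: specifier,
      type: "Module",
      description: `Imported module: ${specifier} (kind: ${kind})`,
    });
    // Edges always point from the source file to the dependency.
    edges.push({ sourceNodeId: sourceId, targetNodeId: id, relationshipName: "depends_on" });
  }
  return { nodes, edges };
}
```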
File extension handling
The parser picks the correct TypeScript ScriptKind based on the file extension in observation.metadata.filePath:
| Extension | ScriptKind |
|---|---|
| .tsx | TSX |
| .jsx | JSX |
| .js, .mjs, .cjs | JS |
| .json | JSON |
| everything else (.ts, etc.) | TS |
If no filePath is available in the observation metadata, a synthetic path file-<id> is used and the parser defaults to TS mode.
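The extension mapping and the fallback path can be sketched as follows. To keep the example dependency-free it returns the names of the `ts.ScriptKind` members as strings; `scriptKindFor` and `effectivePath` are hypothetical names, and case-sensitive extension matching is an assumption.

```typescript
// String stand-ins for the corresponding ts.ScriptKind members.
type ScriptKindName = "TSX" | "JSX" | "JS" | "JSON" | "TS";

// Hypothetical helper: pick a parser mode from the file extension.
function scriptKindFor(filePath: string): ScriptKindName {
  if (filePath.endsWith(".tsx")) return "TSX";
  if (filePath.endsWith(".jsx")) return "JSX";
  if (filePath.endsWith(".js") || filePath.endsWith(".mjs") || filePath.endsWith(".cjs")) return "JS";
  if (filePath.endsWith(".json")) return "JSON";
  return "TS"; // .ts and anything unrecognized
}

// Hypothetical helper: fall back to the synthetic path file-<id> when filePath is missing.
function effectivePath(observation: { id: string; metadata?: { filePath?: string } }): string {
  const filePath = observation.metadata ? observation.metadata.filePath : undefined;
  return filePath !== undefined ? filePath : `file-${observation.id}`;
}
```

Because the synthetic path has no recognized extension, it falls through to TS mode, which matches the default described above.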
Using extractFromFileOperation
The function is exported from strategies.ts and called by the extraction pipeline's strategy dispatcher:
import { extractFromFileOperation } from '@/lib/graph/extraction/strategies'
const graph = extractFromFileOperation({
id: 'obs_123',
type: 'file_operation',
content: `
import { NextRequest } from 'next/server'
import type { Db } from '@/lib/db'
import './polyfill'
const Redis = require('ioredis')
`,
metadata: { filePath: 'src/lib/handler.ts' },
})
// graph.nodes: [ { id: 'module-src-lib-handler-ts', ... }, { id: 'module-next-server', ... }, ... ]
// graph.edges: [ { sourceNodeId: 'module-src-lib-handler-ts', targetNodeId: 'module-next-server', ... }, ... ]
Parse failures return { nodes: [], edges: [] } and log a warning. The pipeline degrades gracefully - a failed extraction does not block the pipeline.
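The failure behavior described above amounts to a guard around the extraction step. A minimal sketch, assuming failures surface as thrown exceptions; `safeExtract` is a hypothetical wrapper and the injectable `warn` callback exists only to make the sketch testable.

```typescript
interface Graph {
  nodes: unknown[];
  edges: unknown[];
}

// Hypothetical wrapper: never let a failed extraction propagate to the pipeline.
function safeExtract(
  extract: () => Graph,
  warn: (message: string) => void = (message) => console.warn(message),
): Graph {
  try {
    return extract();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warn(`AST extraction failed: ${reason}`);
    // An empty graph lets the caller continue as if the file had no imports.
    return { nodes: [], edges: [] };
  }
}
```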
Other extraction strategies
The runStrategy dispatcher in strategies.ts routes observations to the appropriate strategy by type:
| Observation type | Strategy | LLM call? |
|---|---|---|
| file_operation | extractFromFileOperation (AST) | No |
| decision | extractFromDecision | Yes |
| error | extractFromError | Yes |
| session_summary | extractFromSessionSummary | Yes |
| explicit | extractFromExplicit | Yes |
| (unknown) | Falls through to extractFromExplicit | Yes |
The AST strategy is the only deterministic, zero-LLM-cost strategy. All others issue an LLM inference call against the configured judge model and require LLM_API_KEY to be set.
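The routing table can be sketched as a lookup with a fallback. This illustrates the dispatch rule only and does not reproduce runStrategy: `ROUTES`, `routeFor`, and `missingConfig` are hypothetical names, and the strategies are represented by name rather than by function.

```typescript
interface Route {
  strategy: string;
  usesLlm: boolean;
}

// One entry per known observation type, mirroring the table above.
const ROUTES = new Map<string, Route>([
  ["file_operation", { strategy: "extractFromFileOperation", usesLlm: false }],
  ["decision", { strategy: "extractFromDecision", usesLlm: true }],
  ["error", { strategy: "extractFromError", usesLlm: true }],
  ["session_summary", { strategy: "extractFromSessionSummary", usesLlm: true }],
  ["explicit", { strategy: "extractFromExplicit", usesLlm: true }],
]);

const FALLBACK: Route = { strategy: "extractFromExplicit", usesLlm: true };

// Unknown observation types fall through to the explicit strategy.
function routeFor(observationType: string): Route {
  const route = ROUTES.get(observationType);
  return route !== undefined ? route : FALLBACK;
}

// Hypothetical pre-flight check: LLM-backed routes need LLM_API_KEY.
function missingConfig(route: Route, env: Record<string, string | undefined>): string | null {
  return route.usesLlm && !env["LLM_API_KEY"] ? "LLM_API_KEY" : null;
}
```

A check like `missingConfig` lets file_operation observations keep flowing through the AST strategy even when no API key is configured.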
LLM strategy prompts
For the LLM-backed strategies, the prompt instructs the model to return a JSON graph with a fixed schema:
{
"nodes": [
{ "id": "<slug>", "name": "<name>", "type": "<Service|Module|API|Database|Decision|Pattern|Person|Config|Dependency>", "description": "<text>" }
],
"edges": [
{ "sourceNodeId": "<id>", "targetNodeId": "<id>", "relationshipName": "<snake_case>" }
]
}
Each LLM strategy specializes the prompt for its observation type - decision extracts Decision + Person nodes with decided_by edges; error extracts anti-pattern Pattern nodes with workaround_for edges.
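Because model output is not guaranteed to follow the schema, a consumer will typically validate it before persisting anything. The following is a sketch under assumptions: the field names and node types come from the schema above, while `parseLlmGraph`, the snake_case pattern, and the rule that edges must reference known node ids are illustrative choices rather than documented behavior.

```typescript
const NODE_TYPES: readonly string[] = [
  "Service", "Module", "API", "Database", "Decision", "Pattern", "Person", "Config", "Dependency",
];

interface LlmNode { id: string; name: string; type: string; description: string }
interface LlmEdge { sourceNodeId: string; targetNodeId: string; relationshipName: string }
interface LlmGraph { nodes: LlmNode[]; edges: LlmEdge[] }

const isRecord = (v: unknown): v is Record<string, unknown> => typeof v === "object" && v !== null;
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/; // assumed reading of "<snake_case>"

function isNode(v: unknown): v is LlmNode {
  if (!isRecord(v)) return false;
  const type = v["type"];
  return typeof v["id"] === "string" && typeof v["name"] === "string" &&
    typeof v["description"] === "string" && typeof type === "string" &&
    NODE_TYPES.indexOf(type) !== -1;
}

function isEdge(v: unknown): v is LlmEdge {
  if (!isRecord(v)) return false;
  const rel = v["relationshipName"];
  return typeof v["sourceNodeId"] === "string" && typeof v["targetNodeId"] === "string" &&
    typeof rel === "string" && SNAKE_CASE.test(rel);
}

// Hypothetical validator: returns null instead of throwing on malformed output.
function parseLlmGraph(raw: string): LlmGraph | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;
  const nodes = data["nodes"];
  const edges = data["edges"];
  if (!Array.isArray(nodes) || !Array.isArray(edges)) return null;
  if (!nodes.every(isNode) || !edges.every(isEdge)) return null;
  // Assumption: every edge endpoint must be a node in the same response.
  const ids = (nodes as LlmNode[]).map((n) => n.id);
  const dangling = (edges as LlmEdge[]).some(
    (e) => ids.indexOf(e.sourceNodeId) === -1 || ids.indexOf(e.targetNodeId) === -1,
  );
  return dangling ? null : { nodes: nodes as LlmNode[], edges: edges as LlmEdge[] };
}
```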
Related pages
- Extraction Pipeline - the cron-based worker that calls runStrategy on queued observations
- Auto-Ingest from Sessions - how file_operation observations are created at session terminal
- Knowledge Graph Store - where extracted nodes and edges are persisted
- Arch Query Layer - how the graph is queried by the platform's read path