Context Budget Pareto

The Context Budget Pareto view helps you tune your memory injection strategy by plotting tokens injected against session success rate for each work type. Use it to find the sweet spot: the highest success rate for the fewest tokens.

The Trade-Off

Memory injection trades context tokens (cost, latency) for success rate (accuracy, fewer rework loops):

More tokens → Better context → Higher success
But: more tokens → slower inference → higher cost

The goal is to find the Pareto frontier: the set of operating points where you can't improve success without spending significantly more tokens, and can't reduce tokens without significantly hurting success.
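To make the definition concrete, here is a minimal sketch of how a frontier could be computed from (tokens, success rate) points. The `Point` type, the `pareto_frontier` helper, and the sample values (including the `triage` work type) are illustrative assumptions, not part of the product or its API.

```python
from typing import NamedTuple


class Point(NamedTuple):
    work_type: str
    tokens: float        # average tokens injected per session
    success_rate: float  # fraction of sessions completed without rework


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no cheaper-or-equal point beats on success.

    A point is dominated when another point uses no more tokens and reaches
    at least the same success rate. Exact duplicates collapse to one entry.
    """
    # Sort by tokens ascending; on a tie, put the higher success rate first.
    ordered = sorted(points, key=lambda p: (p.tokens, -p.success_rate))
    frontier: list[Point] = []
    best_success = float("-inf")
    for p in ordered:
        # Keep a point only if it beats every cheaper point on success.
        if p.success_rate > best_success:
            frontier.append(p)
            best_success = p.success_rate
    return frontier


# Hypothetical sample data, not real measurements.
sample = [
    Point("acceptance", 350, 0.70),
    Point("dev", 450, 0.78),
    Point("research", 600, 0.85),
    Point("triage", 700, 0.74),  # dominated: dev is cheaper and more successful
    Point("qa", 800, 0.92),
]
frontier_names = [p.work_type for p in pareto_frontier(sample)]
print(frontier_names)
```

In this sample, every work type except `triage` sits on the frontier, because each one buys additional success with its additional tokens.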

Reading the Chart

The Pareto view shows:

X-axis: Average tokens injected per session (context budget)
Y-axis: Success rate (% of sessions that completed without rework)
One scatter point per work type (research, dev, qa, acceptance, etc.)
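Pareto frontier line: The optimal boundary; work types on it are efficient, while work types below or to the right of it get less success for the same tokens, or spend more tokens for the same success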

Ideal Positioning

success
  100% ├─ ✓ (good) ✓ (good)
       │        ╱╱
   80% ├───╱╱──────────
       │ ✗(waste)  ✓(frontier)
   60% ├─────────────
       │     ✗
       └─────────────────── tokens
         0    500    1000

Zones:

🟢 Upper left (high success, low tokens) - Efficient work type; keep injection strategy
🟡 Right of frontier (more tokens than needed for the success achieved) - Wasteful; reduce token budget
🔴 Below frontier (low success despite tokens) - Check observation quality; may need better curation
🟢 On frontier - Optimal; your Pareto reference point

Work Type Patterns

Typical Shapes

Work Type	Tokens	Success	Interpretation
research	600	85%	Moderate spending; good ROI
dev	450	78%	Efficient; context helps but not critical
qa	800	92%	Heavy context needed for thoroughness
acceptance	350	70%	Low token budget; limited context available

Outliers to Watch

High tokens, low success (far below frontier):

Observation quality issue (contradictions, stale patterns)
Agents can't effectively use the context (prompt/tool issue)
Work type is fundamentally uncertain (won't improve with more memory)
Action: Audit recent observations; reduce budget and focus on highest-value patterns

Low tokens, high success (upper left corner):

Efficient work type; agents need little context to succeed
Either simple task domain, or agents have strong innate skills
Action: Invest tokens elsewhere; this work type can handle reductions safely

Bimodal distribution (two clusters):

Some sessions have high tokens (rare, complex issues)
Others have low tokens (routine issues)
Action: Segment by complexity; tune budgets per tier

Cost Implications

Inference cost and latency scale roughly linearly with the number of tokens injected:

500 avg tokens ≈ $0.01-0.05 per session (model-dependent)
1000 avg tokens ≈ $0.02-0.10 per session
2000 avg tokens ≈ $0.05-0.20 per session

Use this dashboard to estimate:

monthly_cost ≈ sessions_per_month × avg_tokens × (cost_per_1M_tokens / 1,000,000)

Example: 1000 sessions/month, 600 tokens avg, $5 per 1M tokens:

cost ≈ 1000 × 600 × (5 / 1,000,000) ≈ $3/month
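The same estimate can be written as a small helper, which is handy when comparing several budgets side by side. The function name and parameter names below are illustrative; the price per million tokens is whatever your model provider charges.

```python
def monthly_injection_cost(
    sessions_per_month: int,
    avg_tokens: float,
    usd_per_million_tokens: float,
) -> float:
    """Estimate the monthly cost (USD) of injected context tokens."""
    return sessions_per_month * avg_tokens * usd_per_million_tokens / 1_000_000


# The worked example above: 1000 sessions, 600 tokens avg, $5 per 1M tokens.
cost = monthly_injection_cost(1000, 600, 5.0)
print(f"${cost:.2f}/month")  # prints "$3.00/month"
```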

Optimization Strategies

Strategy 1: Conservative (Lower Cost)

Target 400-500 avg tokens
Accept 70-75% success rate
Inject only top-10 highest-confidence observations
Best for: low-stakes work types, tight budgets

Strategy 2: Balanced (Default)

Target 500-700 avg tokens
Expect 80-85% success rate
Inject top-20 observations, ranked by relevance
Best for: general use; most organizations start here

Strategy 3: Quality-First (Higher Cost)

Target 800-1000+ avg tokens
Expect 90%+ success rate
Inject all relevant observations; let retrieval rank them
Best for: safety-critical (BFSI, security), high-stakes decisions

Strategy 4: Hybrid (Per-Work-Type)

Research: Conservative (400 tokens, 70%)
Dev: Balanced (600 tokens, 82%)
QA: Quality-First (900 tokens, 92%)
Acceptance: Conservative (350 tokens, 68%)

Action: Adjust per-work-type budget in Memory Config.

Identifying Optimal Budget

Locate your work type on the chart
Find the "knee" - where the success gained per additional token starts declining sharply
Move slightly right of knee - gives headroom for edge cases
Compare to adjacent work types - are you aligned with peers?

Example optimization:

Current: dev @ 650 tokens, 80% success
Knee at: ~550 tokens, 78% success (only 2% drop)
Recommendation: Reduce to 580 tokens, save ~11% on context cost
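One simple way to locate a knee programmatically is to walk the scatter points in token order and stop at the first point after which the marginal gain falls below a threshold. The sketch below does that; the `find_knee` helper and the threshold of 0.02 success rate per 100 tokens are assumptions for illustration, and the dashboard may use a different method.

```python
def find_knee(
    scatter: list[tuple[float, float]],
    min_gain_per_100_tokens: float = 0.02,  # hypothetical threshold
) -> tuple[float, float]:
    """Return the (tokens, success_rate) point where marginal gains flatten.

    The knee is the first point after which the next segment adds less than
    `min_gain_per_100_tokens` of success rate per 100 extra tokens.
    """
    if not scatter:
        raise ValueError("scatter must contain at least one point")
    pts = sorted(scatter)
    for (t0, s0), (t1, s1) in zip(pts, pts[1:]):
        if t1 == t0:
            continue  # skip duplicate token values to avoid dividing by zero
        gain_per_100 = (s1 - s0) / (t1 - t0) * 100
        if gain_per_100 < min_gain_per_100_tokens:
            return (t0, s0)
    # Gains never flattened: the largest budget is still paying off.
    return pts[-1]


# Illustrative scatter points as (tokens, success rate).
knee = find_knee([(350, 0.72), (600, 0.85), (900, 0.88)])
print(knee)
```

Here the step from 350 to 600 tokens adds about 5 points of success per 100 tokens, while the step from 600 to 900 adds about 1 point per 100 tokens, so the knee lands at 600 tokens.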

API Reference

Endpoint: GET /api/memory/analytics/context-budget

Query Parameters:

projectId (optional) - Filter to a specific project
since (optional) - ISO-8601 date lower bound (default: 30 days ago)
until (optional) - ISO-8601 date upper bound (default: now)
workType (optional) - Narrow to a single work type
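As a sketch of how these parameters combine, the helper below builds a request URL and leaves out any parameter you do not set, so the server-side defaults apply. The `build_context_budget_url` function is hypothetical; only the parameter names and the base URL come from this page.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.rensei.ai/api/memory/analytics/context-budget"


def build_context_budget_url(
    project_id: Optional[str] = None,
    since: Optional[str] = None,   # ISO-8601 lower bound
    until: Optional[str] = None,   # ISO-8601 upper bound
    work_type: Optional[str] = None,
) -> str:
    """Build the GET URL, omitting parameters that were not provided."""
    params = {
        "projectId": project_id,
        "since": since,
        "until": until,
        "workType": work_type,
    }
    # urlencode percent-encodes reserved characters such as ":" in timestamps.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


url = build_context_budget_url(since="2026-05-01T00:00:00Z", work_type="dev")
print(url)
```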

Response:

{
  workTypes: Array<{
    workType: string
    sessionCount: number
    currentBudget: number            // configured token budget from DEFAULT_CONTEXT_BUDGET
    scatter: Array<{                  // quantile-bucketed scatter points
      tokens: number
      successRate: number
    }>
    optimalZone: {                    // contiguous bucket range near peak success rate (null if <5 samples)
      minTokens: number
      maxTokens: number
      peakSuccessRate: number
    } | null
    recommendation: {                 // "increase" | "decrease" | null based on gap from optimal zone
      direction: "increase" | "decrease"
      targetTokens: number
    } | null
  }>
  projectId: string | null
  since: string
  until: string
}

Example:

curl -X GET "https://api.rensei.ai/api/memory/analytics/context-budget?since=2026-05-01T00:00:00Z" \
  -H "Authorization: Bearer rsk_..."

Response:

{
  "workTypes": [
    {
      "workType": "bug_fix",
      "sessionCount": 142,
      "currentBudget": 750,
      "scatter": [
        {"tokens": 350, "successRate": 0.72},
        {"tokens": 600, "successRate": 0.85},
        {"tokens": 900, "successRate": 0.88}
      ],
      "optimalZone": {"minTokens": 500, "maxTokens": 800, "peakSuccessRate": 0.88},
      "recommendation": null
    }
  ],
  "projectId": null,
  "since": "2026-05-01T00:00:00Z",
  "until": "2026-06-01T00:00:00Z"
}
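A client might turn this response into a one-line status per work type. The sketch below reads only fields shown in the schema; the `summarize` helper is hypothetical, and treating a null `recommendation` as "no change needed" is an assumption based on the example, where the current budget of 750 falls inside the optimal zone.

```python
import json

# The example response, trimmed to the fields this sketch reads.
EXAMPLE_RESPONSE = json.loads("""
{
  "workTypes": [
    {
      "workType": "bug_fix",
      "sessionCount": 142,
      "currentBudget": 750,
      "optimalZone": {"minTokens": 500, "maxTokens": 800, "peakSuccessRate": 0.88},
      "recommendation": null
    }
  ]
}
""")


def summarize(response: dict) -> list[str]:
    """Return one human-readable status line per work type."""
    lines = []
    for wt in response["workTypes"]:
        zone = wt["optimalZone"]
        rec = wt["recommendation"]
        if rec is not None:
            status = f"{rec['direction']} budget to {rec['targetTokens']} tokens"
        elif zone is None:
            # The schema notes the zone is null when there are too few samples.
            status = "not enough samples for an optimal zone"
        else:
            # Assumption: null recommendation means the budget needs no change.
            status = (
                "no change recommended "
                f"(optimal zone {zone['minTokens']}-{zone['maxTokens']} tokens)"
            )
        lines.append(f"{wt['workType']}: {status}")
    return lines


summary = summarize(EXAMPLE_RESPONSE)
print("\n".join(summary))
```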

Budget Configuration

Adjust token budgets in the Memory Config panel:

memory:
  injection:
    budgets:
      research:
        max_tokens: 600
        strategy: "balanced"
      dev:
        max_tokens: 500
        strategy: "conservative"
      qa:
        max_tokens: 900
        strategy: "quality_first"
      acceptance:
        max_tokens: 350
        strategy: "conservative"

Changes take effect on the next agent invocation. Use A/B testing to validate:

Change budget for one project
Monitor success rate for 1-2 weeks
Compare to control project
Roll out if successful
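When comparing the test project to the control, it helps to check whether the difference in success rate is larger than you would expect from noise. The sketch below uses a standard two-proportion z-test; the session counts are made-up numbers, and the helper name is illustrative rather than part of the product.

```python
import math


def two_proportion_z(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> tuple[float, float]:
    """Two-sided z-test for a difference in success rates (B minus A).

    Returns (z, p_value). A small p_value suggests the difference is
    unlikely to be noise; a large one means the data cannot tell them apart.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all sessions succeeded or all failed in both groups
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control at the old budget, test at the reduced budget.
z, p_value = two_proportion_z(success_a=160, n_a=200, success_b=156, n_b=200)
print(f"z={z:.2f}, p={p_value:.2f}")
```

With these hypothetical counts, the 2-point drop in success rate is well within noise, which would support rolling out the lower budget. With only a few hundred sessions per arm, small real differences can go undetected, so treat the result as one input rather than a verdict.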

Warning Signs

Signal	Action
All work types at 1000+ tokens	You may be injecting low-quality observations; audit and prune
Success rate flat even as tokens increase	Observation quality is the bottleneck, not quantity
P95 tokens >> avg (e.g., 500 avg, 2000 P95)	Long tail of expensive sessions; consider cap or fallback
Pareto frontier is horizontal	Token budget has no impact; may indicate agents ignore context
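The long-tail signal in the table can be checked from raw per-session token counts, for example from exported data. This is a minimal sketch: the `has_long_tail` helper and its 3x threshold are assumptions, so pick a ratio that matches your own tolerance.

```python
import statistics


def has_long_tail(
    session_tokens: list[float],
    ratio_threshold: float = 3.0,  # hypothetical cut-off for "P95 >> avg"
) -> tuple[bool, float, float]:
    """Flag a work type whose P95 token count far exceeds its average.

    Returns (flagged, average, p95). Needs at least two sessions.
    """
    avg = statistics.fmean(session_tokens)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(session_tokens, n=20)[-1]
    return p95 > ratio_threshold * avg, avg, p95


# Made-up data: 90 routine sessions and 10 expensive ones.
tokens = [400] * 90 + [2500] * 10
flagged, avg, p95 = has_long_tail(tokens)
print(flagged, avg, p95)
```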

Combining with Other Views

For a complete memory optimization strategy:

Coverage Heatmap - Where are observations concentrated?
Top Observations - Are you injecting high-weight patterns?
Drift Alerts - Are contradictions driving up token spending without helping?
Feedback Impact - Is your feedback improving what gets injected?
Context Budget Pareto - Is your token spend buying you success?

Best Practices

Monitor monthly - The Pareto frontier shifts as agents learn; review the chart monthly and adjust budgets quarterly
Segment by complexity - High-complexity work types justify higher budgets
Test incrementally - Reduce tokens by 10% at a time, watch for success degradation
Track cost trends - Use monthly reports to show cost savings from optimization
Communicate trade-offs - Share the Pareto chart with stakeholders to justify token/budget decisions
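For the "test incrementally" practice, it can help to plan the reduction steps up front. The sketch below generates a 10%-per-step schedule down to a floor you choose; the helper name, the integer rounding, and the example floor are all illustrative assumptions.

```python
def step_down_schedule(
    start_tokens: int,
    floor_tokens: int,
    step_percent: int = 10,
) -> list[int]:
    """List successive budgets, each `step_percent` lower than the last.

    Stops before going below `floor_tokens`. Uses integer arithmetic and
    rounds each step down to a whole token.
    """
    if not 0 < step_percent < 100:
        raise ValueError("step_percent must be between 1 and 99")
    budgets = []
    current = start_tokens
    while True:
        nxt = current * (100 - step_percent) // 100
        if nxt < floor_tokens or nxt == current:
            break
        budgets.append(nxt)
        current = nxt
    return budgets


# Example: start at 650 tokens and never go below 450.
schedule = step_down_schedule(650, 450)
print(schedule)  # prints [585, 526, 473]
```

Apply one step at a time and watch the success rate before moving to the next.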

Related Views

Memory Health - Storage costs (not token costs, but related)
Feedback Impact - How to improve observation quality (reduces tokens needed)
Top Observations - Quality of injected observations
Memory Export - Download data for detailed cost analysis

Rate Limits

The context-budget API enforces a 100 req/min quota per organization. Calculations are cached for 1 hour.
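Because results are cached for an hour on the server, a client can safely keep its own copy for the same period and stay well under the quota. The sketch below is one possible client-side cache; the `TTLCache` class is hypothetical, and the `fetch` callable stands in for whatever HTTP client you use.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache fetch results per key for a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 3600.0,  # matches the 1-hour server-side cache
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no request sent
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


# Demonstration with a fake clock and a stand-in fetch function.
calls: list[str] = []
fake_now = [0.0]


def fake_fetch(url: str) -> dict:
    calls.append(url)
    return {"url": url}


cache = TTLCache(fake_fetch, clock=lambda: fake_now[0])
cache.get("context-budget?workType=dev")
cache.get("context-budget?workType=dev")  # served from the local cache
fake_now[0] = 3601.0
cache.get("context-budget?workType=dev")  # expired, so it fetches again
print(len(calls))  # prints 2
```

Keying the cache on the full request URL keeps different filters (project, date range, work type) from overwriting each other.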
