
Context Budget Pareto

Tokens injected vs. session success rate, and the optimal zone.

The Context Budget Pareto view helps you optimize your memory injection strategy by plotting tokens injected against session success rate for each work type. Use it to find the sweet spot: maximum effectiveness for the fewest tokens.

The Trade-Off

Memory injection trades context tokens (cost, latency) for success rate (accuracy, fewer rework loops):

More tokens → better context → higher success
But more tokens → slower inference → higher cost

The goal is to find the Pareto frontier: the set of operating points where you can't raise success without spending more tokens, or cut tokens without lowering success.

Reading the Chart

The Pareto view shows:

  • X-axis: Average tokens injected per session (context budget)
  • Y-axis: Success rate (% of sessions that completed without rework)
  • One scatter point per work type (research, dev, qa, acceptance, etc.)
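  • Pareto frontier line: The optimal boundary; work types on it are efficient, while work types below or to the right of it are inefficient (another point reaches equal or higher success with fewer tokens)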
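The frontier can be computed directly from the per-work-type averages. The sketch below is a minimal TypeScript illustration, not Rensei's implementation: `WorkTypePoint` and `paretoFrontier` are hypothetical names, and `successRate` is assumed to be a 0-1 fraction as in the API response. A point stays on the frontier only if no point that costs the same or less matches or beats its success rate.

```typescript
// Hypothetical shape for one chart point: a work type's averages.
interface WorkTypePoint {
  workType: string;
  tokens: number;      // average tokens injected per session
  successRate: number; // fraction of sessions completed without rework (0-1)
}

// Returns the non-dominated points (the Pareto frontier), ordered by tokens.
function paretoFrontier(points: WorkTypePoint[]): WorkTypePoint[] {
  // Cheapest first; at equal cost, highest success first so the weaker
  // point is treated as dominated (exact duplicates collapse to one).
  const sorted = points
    .slice()
    .sort((a, b) => a.tokens - b.tokens || b.successRate - a.successRate);

  const frontier: WorkTypePoint[] = [];
  let bestSuccess = -Infinity;
  for (const p of sorted) {
    // Keep a point only if it beats every point that costs no more.
    if (p.successRate > bestSuccess) {
      frontier.push(p);
      bestSuccess = p.successRate;
    }
  }
  return frontier;
}
```

With the values from the Typical Shapes table, all four work types sit on the frontier; a hypothetical extra work type at 700 tokens and 80% success would be dropped, because research reaches 85% with fewer tokens.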

Ideal Positioning

success
  100% ┤
       │             ●─────────●  ← frontier
   80% ┤         ●╱        ✗ wasteful (right of frontier)
       │       ╱
   60% ┤     ●        ✗ below frontier
       │
       └───────────┬───────────┬──── tokens
       0          500        1000

Zones:

  • 🟢 Upper left (high success, low tokens) - Efficient work type; keep injection strategy
  • 🟡 Right of frontier (more tokens than needed for the success achieved) - Wasteful; reduce token budget
  • 🔴 Below frontier (low success for the tokens spent) - Check observation quality; may need better curation
  • 🟢 On frontier - Optimal; your Pareto reference point

Work Type Patterns

Typical Shapes

Work type  | Avg tokens | Success rate | Interpretation
research   | 600        | 85%          | Moderate spending; good ROI
dev        | 450        | 78%          | Efficient; context helps but not critical
qa         | 800        | 92%          | Heavy context needed for thoroughness
acceptance | 350        | 70%          | Low token budget; limited context available

Outliers to Watch

High tokens, low success (far below frontier):

  • Observation quality issue (contradictions, stale patterns)
  • Agents can't effectively use the context (prompt/tool issue)
  • Work type is fundamentally uncertain (won't improve with more memory)
  • Action: Audit recent observations; reduce budget and focus on highest-value patterns

Low tokens, high success (upper left corner):

  • Efficient work type; agents need little context to succeed
  • Either the task domain is simple, or agents already perform well without much memory
  • Action: Invest tokens elsewhere; this work type can likely absorb budget reductions

Bimodal distribution (two clusters):

  • Some sessions have high tokens (rare, complex issues)
  • Others have low tokens (routine issues)
  • Action: Segment by complexity; tune budgets per tier
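Segmenting can be as simple as grouping sessions by work type and a complexity tier before computing averages. A sketch, assuming a hypothetical per-session `complexity` tag ("routine" or "complex") that you assign yourself; it is not a documented field of the API, and `statsByTier` is an illustrative name.

```typescript
// Hypothetical per-session record. `complexity` is a tag you assign
// yourself; it is not a documented API field.
interface SessionRecord {
  workType: string;
  complexity: "routine" | "complex";
  tokens: number;
  succeeded: boolean;
}

interface TierStats {
  sessions: number;
  avgTokens: number;
  successRate: number; // 0-1
}

// Aggregates sessions per "workType/complexity" tier so each tier can be
// placed on the chart, and budgeted, separately.
function statsByTier(sessions: SessionRecord[]): Record<string, TierStats> {
  const totals: Record<string, { n: number; tokens: number; wins: number }> = {};
  for (const s of sessions) {
    const key = s.workType + "/" + s.complexity;
    const t = totals[key] || (totals[key] = { n: 0, tokens: 0, wins: 0 });
    t.n += 1;
    t.tokens += s.tokens;
    if (s.succeeded) t.wins += 1;
  }

  const result: Record<string, TierStats> = {};
  for (const key of Object.keys(totals)) {
    const t = totals[key];
    result[key] = {
      sessions: t.n,
      avgTokens: t.tokens / t.n,
      successRate: t.wins / t.n,
    };
  }
  return result;
}
```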

Cost Implications

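Inference cost scales roughly linearly with the number of tokens injected, and latency grows with it as well:

  • 500 avg tokens ≈ $0.01-0.05 per session (model-dependent)
  • 1000 avg tokens ≈ $0.02-0.10 per session
  • 2000 avg tokens ≈ $0.04-0.20 per session

These ranges correspond to input prices of roughly $20-100 per 1M tokens; cheaper models cost proportionally less.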

Use this dashboard to estimate:

monthly_cost ≈ sessions_per_month × avg_tokens × (cost_per_1M_tokens / 1,000,000)

Example: 1000 sessions/month, 600 tokens avg, $5 per 1M tokens:

cost ≈ 1000 × 600 × (5 / 1,000,000) ≈ $3/month
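The formula translates directly into code. This sketch counts only the injected memory tokens (not the rest of the prompt or the model's output); `costPer1MTokens` is whatever input price your model charges, and the function name is hypothetical.

```typescript
// Monthly cost of injected memory tokens only (excludes the rest of the
// prompt and the model's output). costPer1MTokens is your model's input
// price in dollars per 1M tokens.
function monthlyContextCost(
  sessionsPerMonth: number,
  avgTokens: number,
  costPer1MTokens: number,
): number {
  // Multiply first, divide last, to keep round inputs exact.
  return (sessionsPerMonth * avgTokens * costPer1MTokens) / 1_000_000;
}

// Example from above: monthlyContextCost(1000, 600, 5) returns 3 (dollars).
```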

Optimization Strategies

Strategy 1: Conservative (Lower Cost)

  • Target 400-500 avg tokens
  • Accept 70-75% success rate
  • Inject only top-10 highest-confidence observations
  • Best for: low-stakes work types, tight budgets

Strategy 2: Balanced (Default)

  • Target 500-700 avg tokens
  • Expect 80-85% success rate
  • Inject top-20 observations, ranked by relevance
  • Best for: general use; most organizations start here

Strategy 3: Quality-First (Higher Cost)

  • Target 800-1000+ avg tokens
  • Expect 90%+ success rate
  • Inject all relevant observations; let retrieval rank them
  • Best for: safety-critical (BFSI, security), high-stakes decisions

Strategy 4: Hybrid (Per-Work-Type)

  • Research: Conservative (400 tokens, 70%)
  • Dev: Balanced (600 tokens, 82%)
  • QA: Quality-First (900 tokens, 92%)
  • Acceptance: Conservative (350 tokens, 68%)

Action: Adjust the per-work-type budgets in Memory Config.

Identifying Optimal Budget

  1. Locate your work type on the chart
  2. Find the "knee": the point where the marginal gain (extra success per extra token) starts dropping sharply
  3. Set the budget slightly right of the knee to leave headroom for edge cases
  4. Compare to adjacent work types - are you aligned with peers?

Example optimization:

Current: dev @ 650 tokens, 80% success
Knee at: ~550 tokens, 78% success (only 2 points lower)
Recommendation: Reduce to 580 tokens, saving ~11% on context cost
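One way to automate the knee search is a marginal-gain threshold: walk the scatter points in token order and stop where the extra success per extra 100 tokens falls below a cutoff. This is a simple heuristic sketch, not the method the dashboard uses; `findKnee` and `recommendBudget` are hypothetical names, and the 0.03 cutoff (3 percentage points per 100 tokens), 5% headroom, and rounding to 10 tokens are illustrative assumptions.

```typescript
interface ScatterPoint {
  tokens: number;
  successRate: number; // 0-1, as in the API's scatter array
}

// Returns the last point before the marginal gain (extra success per extra
// 100 tokens) drops below minGainPer100. A heuristic, not the dashboard's method.
function findKnee(points: ScatterPoint[], minGainPer100: number): ScatterPoint {
  if (points.length === 0) throw new Error("need at least one point");
  const sorted = points.slice().sort((a, b) => a.tokens - b.tokens);
  for (let i = 1; i < sorted.length; i++) {
    const dTokens = sorted[i].tokens - sorted[i - 1].tokens;
    if (dTokens <= 0) continue; // skip duplicate token buckets
    const dSuccess = sorted[i].successRate - sorted[i - 1].successRate;
    if ((dSuccess / dTokens) * 100 < minGainPer100) return sorted[i - 1];
  }
  return sorted[sorted.length - 1]; // gain never dropped off
}

// Adds headroom (in percent) to the knee and rounds up to a multiple of roundTo.
function recommendBudget(kneeTokens: number, headroomPct: number, roundTo: number): number {
  return Math.ceil((kneeTokens * (100 + headroomPct)) / 100 / roundTo) * roundTo;
}
```

For instance, with hypothetical dev buckets at 350/450/550/650 tokens and 60/72/78/80% success, the gain falls from 6 to 2 points per 100 tokens after 550, so `findKnee(points, 0.03)` returns the 550 bucket and `recommendBudget(550, 5, 10)` returns 580, which is (650 - 580) / 650 ≈ 10.8% below the current 650.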

API Reference

Endpoint: GET /api/memory/analytics/context-budget

Query Parameters:

  • projectId (optional) - Filter to a specific project
  • since (optional) - ISO-8601 date lower bound (default: 30 days ago)
  • until (optional) - ISO-8601 date upper bound (default: now)
  • workType (optional) - Narrow to a single work type

Response:

{
  workTypes: Array<{
    workType: string
    sessionCount: number
    currentBudget: number            // configured token budget from DEFAULT_CONTEXT_BUDGET
    scatter: Array<{                  // quantile-bucketed scatter points
      tokens: number
      successRate: number            // fraction of sessions in this bucket that succeeded (0-1)
    }>
    optimalZone: {                    // contiguous bucket range near peak success rate (null if <5 samples)
      minTokens: number
      maxTokens: number
      peakSuccessRate: number
    } | null
    recommendation: {                 // "increase" | "decrease" | null based on gap from optimal zone
      direction: "increase" | "decrease"
      targetTokens: number
    } | null
  }>
  projectId: string | null
  since: string
  until: string
}

Example:

curl -X GET "https://api.rensei.ai/api/memory/analytics/context-budget?since=2026-05-01T00:00:00Z" \
  -H "Authorization: Bearer rsk_..."

Response:

{
  "workTypes": [
    {
      "workType": "bug_fix",
      "sessionCount": 142,
      "currentBudget": 750,
      "scatter": [
        {"tokens": 350, "successRate": 0.72},
        {"tokens": 600, "successRate": 0.85},
        {"tokens": 900, "successRate": 0.88}
      ],
      "optimalZone": {"minTokens": 500, "maxTokens": 800, "peakSuccessRate": 0.88},
      "recommendation": null
    }
  ],
  "projectId": null,
  "since": "2026-05-01T00:00:00Z",
  "until": "2026-06-01T00:00:00Z"
}
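The request can be assembled and the response triaged in a few lines. This sketch builds the URL from the documented query parameters and lists work types that carry a recommendation; `buildContextBudgetUrl`, `pendingChanges`, and the trimmed `WorkTypeBudget` type are hypothetical helpers, and sending the request (with the `Authorization: Bearer` header shown in the curl example) is left to your HTTP client.

```typescript
// Documented query parameters; all optional.
interface ContextBudgetQuery {
  projectId?: string;
  since?: string; // ISO-8601
  until?: string; // ISO-8601
  workType?: string;
}

// Subset of the documented response fields used below.
interface WorkTypeBudget {
  workType: string;
  currentBudget: number;
  recommendation: { direction: "increase" | "decrease"; targetTokens: number } | null;
}

const BASE_URL = "https://api.rensei.ai/api/memory/analytics/context-budget";

// Builds the request URL, percent-encoding each supplied parameter.
function buildContextBudgetUrl(query: ContextBudgetQuery): string {
  const keys: (keyof ContextBudgetQuery)[] = ["projectId", "since", "until", "workType"];
  const parts: string[] = [];
  for (const key of keys) {
    const value = query[key];
    if (value !== undefined) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
    }
  }
  return parts.length > 0 ? BASE_URL + "?" + parts.join("&") : BASE_URL;
}

// Lists work types whose budget the endpoint suggests changing.
function pendingChanges(workTypes: WorkTypeBudget[]): string[] {
  const out: string[] = [];
  for (const wt of workTypes) {
    const r = wt.recommendation;
    if (r !== null) {
      out.push(wt.workType + ": " + r.direction + " " + wt.currentBudget + " -> " + r.targetTokens);
    }
  }
  return out;
}
```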

Budget Configuration

Adjust token budgets in the Memory Config panel:

memory:
  injection:
    budgets:
      research:
        max_tokens: 600
        strategy: "balanced"
      dev:
        max_tokens: 500
        strategy: "conservative"
      qa:
        max_tokens: 900
        strategy: "quality_first"
      acceptance:
        max_tokens: 350
        strategy: "conservative"

Changes take effect on the next agent invocation. Use A/B testing to validate them:

  1. Change the budget for one project
  2. Monitor its success rate for 1-2 weeks
  3. Compare against a control project whose budget is unchanged
  4. Roll out if successful
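For the comparison step, a pooled two-proportion z-test is one standard way to judge whether the gap in success rates between the test and control projects is larger than noise. The sketch below is a generic statistics helper, not a Rensei feature; `twoProportionZ` is a hypothetical name, and the test assumes sessions are independent and that both projects see comparable work, which a project-level split only approximates.

```typescript
interface Arm {
  sessions: number;
  successes: number;
}

// Pooled two-proportion z statistic for treatment vs. control success rates.
// Positive z means the treatment succeeded more often; |z| > 1.96 roughly
// corresponds to p < 0.05 (two-sided).
function twoProportionZ(treatment: Arm, control: Arm): number {
  const p1 = treatment.successes / treatment.sessions;
  const p2 = control.successes / control.sessions;
  const pooled =
    (treatment.successes + control.successes) / (treatment.sessions + control.sessions);
  const se = Math.sqrt(
    pooled * (1 - pooled) * (1 / treatment.sessions + 1 / control.sessions),
  );
  return se === 0 ? 0 : (p1 - p2) / se; // se is 0 when every session agrees
}
```

For example, 164/200 successes against a control's 156/200 gives z ≈ 1.0, well within noise. After a budget cut, a z below about -1.96 suggests the reduction hurt success.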

Warning Signs

Signal                                      | Action
All work types at 1000+ tokens              | You may be injecting low-quality observations; audit and prune
Success rate flat even as tokens increase   | Observation quality is the bottleneck, not quantity
P95 tokens >> avg (e.g., 500 avg, 2000 P95) | Long tail of expensive sessions; consider a cap or fallback
Pareto frontier is horizontal               | Token budget has no impact; may indicate agents ignore context
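The P95 check is easy to script against per-session token counts. A sketch using the nearest-rank percentile; the function names are hypothetical, and the ratio that counts as "much greater" is an assumption, so pick a threshold that fits your traffic.

```typescript
// Nearest-rank percentile of a non-empty list, for p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no values");
  const sorted = values.slice().sort((a, b) => a - b);
  // Multiply before dividing so p * n / 100 stays exact for whole numbers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Flags a long tail when P95 tokens exceed `ratio` times the mean.
// The ratio is an assumed threshold, not a documented one.
function hasLongTail(tokensPerSession: number[], ratio: number): boolean {
  const mean =
    tokensPerSession.reduce((sum, t) => sum + t, 0) / tokensPerSession.length;
  return percentile(tokensPerSession, 95) > ratio * mean;
}
```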

Combining with Other Views

For a complete memory optimization strategy:

  1. Coverage Heatmap - Where are observations concentrated?
  2. Top Observations - Are you injecting high-weight patterns?
  3. Drift Alerts - Are contradictions driving up token spending without helping?
  4. Feedback Impact - Is your feedback tuning injection helpfully?
  5. Context Budget Pareto - Is your token spend buying you success?

Best Practices

  1. Monitor monthly - the Pareto frontier shifts as agents learn; adjust budgets quarterly
  2. Segment by complexity - High-complexity work types justify higher budgets
  3. Test incrementally - Reduce tokens by 10% at a time, watch for success degradation
  4. Track cost trends - Use monthly reports to show cost savings from optimization
  5. Communicate trade-offs - Share the Pareto chart with stakeholders to justify token/budget decisions

Rate Limits

The context-budget API enforces a 100 req/min quota per organization. Calculations are cached for 1 hour.
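If you poll the endpoint from scripts, a small client-side guard keeps you under the quota and avoids refetching results that the server caches for an hour anyway. This is a sketch with hypothetical class names; it only counts requests made by one process, so the 100 req/min organization quota still applies across all your clients.

```typescript
// Client-side sliding-window limiter: at most `limit` requests in any
// `windowMs` span. Timestamps are passed in so the logic is easy to test.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly timestamps: number[] = [];

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records and allows the request if the window has room; otherwise refuses it.
  tryAcquire(now: number): boolean {
    // Drop requests that have aged out of the window.
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Minimal TTL cache keyed by request URL, mirroring the 1-hour server cache.
class TtlCache<V> {
  private readonly ttlMs: number;
  private readonly entries: Record<string, { value: V; expiresAt: number }> = {};

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number): V | undefined {
    const entry = this.entries[key];
    if (entry === undefined || now >= entry.expiresAt) return undefined;
    return entry.value;
  }

  set(key: string, value: V, now: number): void {
    this.entries[key] = { value, expiresAt: now + this.ttlMs };
  }
}
```

A typical setup is `new SlidingWindowLimiter(100, 60_000)` alongside `new TtlCache(3_600_000)`, checking the cache before spending a request.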
