Context Budget Pareto
Tokens vs success and optimal zone.
The Context Budget Pareto view optimizes memory injection strategy by analyzing the relationship between tokens injected and session success rate across different work types. This helps you find the sweet spot: maximum effectiveness with minimum token spending.
The Trade-Off
Memory injection trades context tokens (cost, latency) for success rate (accuracy, fewer rework loops):
More tokens → Better context → Higher success
But: more tokens → slower inference → higher cost
The goal is to find the Pareto frontier - the optimal balance where you can't improve success without spending significantly more tokens, or reduce tokens without significantly hurting success.
Reading the Chart
The Pareto view shows:
- X-axis: Average tokens injected per session (context budget)
- Y-axis: Success rate (% of sessions that completed without rework)
- One scatter point per work type (research, dev, qa, acceptance, etc.)
- Pareto frontier line: The optimal boundary; work types on it are efficient, while work types below it get less success for the same tokens (or spend more tokens for the same success)
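As a rough illustration of how a frontier like this could be derived from the scatter points, the sketch below keeps only the work types that no cheaper work type matches or beats on success. The `WorkTypePoint` type and `pareto_frontier` function are hypothetical names for this example, not part of the product, and the dashboard may compute its frontier differently.

```python
from typing import NamedTuple


class WorkTypePoint(NamedTuple):
    """One scatter point (hypothetical structure for illustration)."""
    work_type: str
    avg_tokens: float
    success_rate: float  # fraction in [0, 1]


def pareto_frontier(points: list[WorkTypePoint]) -> list[WorkTypePoint]:
    """Return the points that no cheaper-or-equal point matches or beats on success."""
    # Sort by tokens ascending; on ties, put the higher success rate first.
    ordered = sorted(points, key=lambda p: (p.avg_tokens, -p.success_rate))
    frontier: list[WorkTypePoint] = []
    best_success = float("-inf")
    for point in ordered:
        # Keep a point only if it strictly beats every cheaper point on success.
        if point.success_rate > best_success:
            frontier.append(point)
            best_success = point.success_rate
    return frontier


if __name__ == "__main__":
    sample = [
        WorkTypePoint("research", 600, 0.85),
        WorkTypePoint("dev", 450, 0.78),
        WorkTypePoint("qa", 800, 0.92),
        WorkTypePoint("acceptance", 350, 0.70),
        WorkTypePoint("bloated", 700, 0.80),  # made-up dominated point
    ]
    for point in pareto_frontier(sample):
        print(point.work_type, point.avg_tokens, point.success_rate)
```

In this sample the made-up `bloated` point drops out because `research` reaches a higher success rate with fewer tokens.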
Ideal Positioning
success
100% ├─ ✓ (good)      ✓ (good)
     │           ╱╱
 80% ├───╱╱──────────
     │ ✗ (waste)     ✓ (frontier)
 60% ├─────────────
     │        ✗
     └─────────────────── tokens
     0        500       1000
Zones:
- 🟢 Upper left (high success, low tokens) - Efficient work type; keep injection strategy
- 🟡 Right of the frontier (more tokens than needed for the success achieved) - Wasteful; reduce token budget
- 🔴 Below frontier (low success despite tokens) - Check observation quality; may need better curation
- 🟢 On frontier - Optimal; your Pareto reference point
Work Type Patterns
Typical Shapes
| Work Type | Tokens | Success | Interpretation |
|---|---|---|---|
| research | 600 | 85% | Moderate spending; good ROI |
| dev | 450 | 78% | Efficient; context helps but not critical |
| qa | 800 | 92% | Heavy context needed for thoroughness |
| acceptance | 350 | 70% | Low token budget; limited context available |
Outliers to Watch
High tokens, low success (far below frontier):
- Observation quality issue (contradictions, stale patterns)
- Agents can't effectively use the context (prompt/tool issue)
- Work type is fundamentally uncertain (won't improve with more memory)
- Action: Audit recent observations; reduce budget and focus on highest-value patterns
Low tokens, high success (upper left corner):
- Efficient work type; agents need little context to succeed
- Either simple task domain, or agents have strong innate skills
- Action: Invest tokens elsewhere; this work type can handle reductions safely
Bimodal distribution (two clusters):
- Some sessions have high tokens (rare, complex issues)
- Others have low tokens (routine issues)
- Action: Segment by complexity; tune budgets per tier
Cost Implications
Inference cost and latency scale roughly linearly with the number of tokens injected:
- 500 avg tokens ≈ $0.01-0.05 per session (model-dependent)
- 1000 avg tokens ≈ $0.02-0.10 per session
- 2000 avg tokens ≈ $0.04-0.20 per session
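Because the scaling is linear, a monthly estimate reduces to one multiplication. The sketch below is a minimal illustration; the function and parameter names are hypothetical, and it assumes your provider quotes prices per one million tokens.

```python
def monthly_context_cost(
    sessions_per_month: int,
    avg_tokens: float,
    usd_per_million_tokens: float,
) -> float:
    """Estimate monthly spend on injected context tokens, in USD."""
    total_tokens = sessions_per_month * avg_tokens
    return total_tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # 1000 sessions/month, 600 tokens avg, $5 per 1M tokens
    print(f"${monthly_context_cost(1000, 600, 5.0):.2f}/month")
```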
Use this dashboard to estimate:
monthly_cost ≈ sessions_per_month × avg_tokens × (cost_per_1M_tokens / 1,000,000)
Example: 1000 sessions/month, 600 tokens avg, $5 per 1M tokens:
cost ≈ 1000 × 600 × (5 / 1,000,000) = $3/month
Optimization Strategies
Strategy 1: Conservative (Lower Cost)
- Target 400-500 avg tokens
- Accept 70-75% success rate
- Inject only top-10 highest-confidence observations
- Best for: low-stakes work types, tight budgets
Strategy 2: Balanced (Default)
- Target 500-700 avg tokens
- Expect 80-85% success rate
- Inject top-20 observations, ranked by relevance
- Best for: general use; most organizations start here
Strategy 3: Quality-First (Higher Cost)
- Target 800-1000+ avg tokens
- Expect 90%+ success rate
- Inject all relevant observations; let retrieval rank them
- Best for: safety-critical (BFSI, security), high-stakes decisions
Strategy 4: Hybrid (Per-Work-Type)
- Research: Conservative (400 tokens, 70%)
- Dev: Balanced (600 tokens, 82%)
- QA: Quality-First (900 tokens, 92%)
- Acceptance: Conservative (350 tokens, 68%)
Action: Adjust per-work-type budget in Memory Config.
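The strategies above differ mainly in how many observations are injected and how many tokens they may consume. As a minimal sketch of that idea, the code below picks the highest-confidence observations that fit within both a count limit and a token budget. The `Observation` fields and the `select_observations` function are hypothetical; the actual injection logic is not documented here and may rank by relevance or other signals.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical shape of a stored observation."""
    text: str
    confidence: float  # higher means more trusted
    token_count: int   # tokens this observation costs when injected


def select_observations(
    observations: list[Observation],
    max_count: int,
    max_tokens: int,
) -> list[Observation]:
    """Greedily pick the highest-confidence observations within both limits."""
    ranked = sorted(observations, key=lambda o: o.confidence, reverse=True)
    selected: list[Observation] = []
    used_tokens = 0
    for obs in ranked:
        if len(selected) >= max_count:
            break
        if used_tokens + obs.token_count > max_tokens:
            continue  # too large for the remaining budget; a smaller one may still fit
        selected.append(obs)
        used_tokens += obs.token_count
    return selected
```

Under this sketch, a Conservative setup would correspond to something like `select_observations(candidates, max_count=10, max_tokens=500)`, and a Balanced setup to `max_count=20, max_tokens=700`.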
Identifying Optimal Budget
- Locate your work type on the chart
- Find the "knee" - where ROI (success per token) starts declining sharply
- Move slightly right of knee - gives headroom for edge cases
- Compare to adjacent work types - are you aligned with peers?
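The knee-finding steps above can be approximated in code. The sketch below uses a common heuristic, not necessarily the one the dashboard applies: normalize both axes, draw a straight line from the cheapest point to the most expensive one, and take the point that rises furthest above that line. Function names and the 5% default headroom are assumptions for illustration.

```python
def find_knee(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Return the (tokens, success_rate) point furthest above the end-to-end chord."""
    pts = sorted(points)
    if len(pts) < 3:
        raise ValueError("need at least three points to locate a knee")
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    if x1 == x0 or y1 == y0:
        raise ValueError("curve is flat or vertical; no knee to find")

    def gap_above_chord(point: tuple[float, float]) -> float:
        x, y = point
        # Normalized success minus normalized tokens: large when success
        # has mostly arrived while tokens are still low.
        return (y - y0) / (y1 - y0) - (x - x0) / (x1 - x0)

    return max(pts, key=gap_above_chord)


def budget_with_headroom(knee_tokens: float, headroom: float = 0.05) -> int:
    """Move slightly right of the knee to leave room for edge cases."""
    return round(knee_tokens * (1 + headroom))


if __name__ == "__main__":
    scatter = [(350, 0.72), (600, 0.85), (900, 0.88)]
    knee_tokens, knee_success = find_knee(scatter)
    print(knee_tokens, knee_success, budget_with_headroom(knee_tokens))
```

With only a few buckets the result is coarse, so treat it as a starting point for the incremental testing described elsewhere on this page.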
Example optimization:
Current: dev @ 650 tokens, 80% success
Knee at: ~550 tokens, 78% success (only a 2-point drop)
Recommendation: Reduce to 580 tokens, save ~11% on context cost
API Reference
Endpoint: GET /api/memory/analytics/context-budget
Query Parameters:
- projectId (optional) - Filter to a specific project
- since (optional) - ISO-8601 date lower bound (default: 30 days ago)
- until (optional) - ISO-8601 date upper bound (default: now)
- workType (optional) - Narrow to a single work type
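As a hedged sketch of calling this endpoint from Python, the helper below builds the query string from whichever optional parameters are set and sends a bearer token in the `Authorization` header. The host is taken from the curl example on this page; the function names are hypothetical, and error handling is left out for brevity.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.rensei.ai/api/memory/analytics/context-budget"


def build_context_budget_url(
    project_id: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
    work_type: Optional[str] = None,
) -> str:
    """Build the request URL, omitting parameters that were not provided."""
    params = {
        "projectId": project_id,
        "since": since,
        "until": until,
        "workType": work_type,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


def fetch_context_budget(api_key: str, **filters: Optional[str]) -> Any:
    """Perform the GET request and decode the JSON body (makes a network call)."""
    request = Request(
        build_context_budget_url(**filters),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(request) as response:
        return json.load(response)
```

For example, `build_context_budget_url(since="2026-05-01T00:00:00Z", work_type="dev")` percent-encodes the colons in the timestamp and appends `workType=dev`.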
Response:
{
  workTypes: Array<{
    workType: string
    sessionCount: number
    currentBudget: number // configured token budget from DEFAULT_CONTEXT_BUDGET
    scatter: Array<{ // quantile-bucketed scatter points
      tokens: number
      successRate: number
    }>
    optimalZone: { // contiguous bucket range near peak success rate (null if <5 samples)
      minTokens: number
      maxTokens: number
      peakSuccessRate: number
    } | null
    recommendation: { // "increase" | "decrease" | null based on gap from optimal zone
      direction: "increase" | "decrease"
      targetTokens: number
    } | null
  }>
  projectId: string | null
  since: string
  until: string
}
Example:
curl -X GET "https://api.rensei.ai/api/memory/analytics/context-budget?since=2026-05-01T00:00:00Z" \
  -H "Authorization: Bearer rsk_..."
Response:
{
  "workTypes": [
    {
      "workType": "bug_fix",
      "sessionCount": 142,
      "currentBudget": 750,
      "scatter": [
        {"tokens": 350, "successRate": 0.72},
        {"tokens": 600, "successRate": 0.85},
        {"tokens": 900, "successRate": 0.88}
      ],
      "optimalZone": {"minTokens": 500, "maxTokens": 800, "peakSuccessRate": 0.88},
      "recommendation": null
    }
  ],
  "projectId": null,
  "since": "2026-05-01T00:00:00Z",
  "until": "2026-06-01T00:00:00Z"
}
Budget Configuration
Adjust token budgets in the Memory Config panel:
memory:
  injection:
    budgets:
      research:
        max_tokens: 600
        strategy: "balanced"
      dev:
        max_tokens: 500
        strategy: "conservative"
      qa:
        max_tokens: 900
        strategy: "quality_first"
      acceptance:
        max_tokens: 350
        strategy: "conservative"
Changes take effect on next agent invocation. Use A/B testing to validate:
- Change budget for one project
- Monitor success rate for 1-2 weeks
- Compare to control project
- Roll out if successful
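When comparing the test project against the control, it helps to check whether the difference in success rate is larger than chance would explain. This page does not prescribe a statistical method; the sketch below assumes a standard two-proportion z-test as one reasonable option, with hypothetical function and parameter names.

```python
import math


def two_proportion_z(
    control_successes: int,
    control_sessions: int,
    test_successes: int,
    test_sessions: int,
) -> float:
    """Return the z-score for (test success rate - control success rate)."""
    if control_sessions <= 0 or test_sessions <= 0:
        raise ValueError("both groups need at least one session")
    p_control = control_successes / control_sessions
    p_test = test_successes / test_sessions
    # Pooled rate under the null hypothesis that both groups perform equally.
    pooled = (control_successes + test_successes) / (control_sessions + test_sessions)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_sessions + 1 / test_sessions))
    if std_err == 0:
        raise ValueError("success rate is 0% or 100% in both groups; z is undefined")
    return (p_test - p_control) / std_err
```

As a rule of thumb, an absolute z-score above roughly 1.96 corresponds to a two-sided 5% significance level. With small session counts the test has little power, so a longer observation window may be needed before rolling out.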
Warning Signs
| Signal | Action |
|---|---|
| All work types at 1000+ tokens | You may be injecting low-quality observations; audit and prune |
| Success rate flat even as tokens increase | Observation quality is the bottleneck, not quantity |
| P95 tokens >> avg (e.g., 500 avg, 2000 P95) | Long tail of expensive sessions; consider cap or fallback |
| Pareto frontier is horizontal | Token budget has no impact; may indicate agents ignore context |
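The long-tail signal in the table above can be checked from raw per-session token counts, for instance from exported data. The sketch below is a minimal illustration; the function names and the 3x threshold are assumptions, not product defaults.

```python
import statistics


def p95_to_avg_ratio(tokens_per_session: list[int]) -> float:
    """Return P95 tokens divided by average tokens across sessions."""
    if len(tokens_per_session) < 2:
        raise ValueError("need at least two sessions")
    average = statistics.fmean(tokens_per_session)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(tokens_per_session, n=20, method="inclusive")[-1]
    return p95 / average


def has_long_tail(tokens_per_session: list[int], threshold: float = 3.0) -> bool:
    """Flag work types whose P95 is far above the average (threshold is an assumption)."""
    return p95_to_avg_ratio(tokens_per_session) >= threshold
```

The table's own example (500 avg, 2000 P95) is a ratio of 4, which this threshold would flag.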
Combining with Other Views
For a complete memory optimization strategy:
- Coverage Heatmap - Where are observations concentrated?
- Top Observations - Are you injecting high-weight patterns?
- Drift Alerts - Are contradictions driving up token spending without helping?
- Feedback Impact - Is your feedback improving what gets injected?
- Context Budget Pareto - Is your token spend buying you success?
Best Practices
- Monitor monthly - Pareto frontier shifts as agents learn; adjust budgets quarterly
- Segment by complexity - High-complexity work types justify higher budgets
- Test incrementally - Reduce tokens by 10% at a time, watch for success degradation
- Track cost trends - Use monthly reports to show cost savings from optimization
- Communicate trade-offs - Share the Pareto chart with stakeholders to justify token/budget decisions
Related Pages
- Memory Health - Storage costs (not token costs, but related)
- Feedback Impact - How to improve observation quality (reduces tokens needed)
- Top Observations - Quality of injected observations
- Memory Export - Download data for detailed cost analysis
Rate Limits
The context-budget API enforces a 100 req/min quota per organization. Calculations are cached for 1 hour.
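Since results are cached server-side for an hour, polling more often than that is unlikely to return fresher data and only consumes quota. A small client-side cache is one way to avoid that; the sketch below is a hypothetical helper, not part of any official SDK, and assumes a one-hour TTL to mirror the server cache.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache fetched values per key for a fixed time-to-live."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only when it is stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A natural cache key is the full request URL, so that different filter combinations are cached separately.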