V2 is available on Pro, Team, and Enterprise plans. Upgrade your plan to get access.
V1 vs V2 at a glance
| | V1 | V2 |
|---|---|---|
| Retrieval | Keyword + temporal | Keyword + temporal + semantic |
| Serving profiles | — | low_latency, balanced, high_recall |
| Token budget | Fixed | Adaptive — expands when needed |
| Ingest | Single artifact | Batch array in one call |
| Async indexing | — | Per-request async_index flag + job tracking |
| Workspaces | 1 | Pro: 5 · Team: 20 · Enterprise: unlimited |
| Token budget (max) | 4,096 | 16,384 – 131,072 |
| RPM | 30 | 120 – 1,000 |
Serving profiles
V2 lets you pick a retrieval strategy per request instead of applying one strategy to everything.
low_latency
Fastest response. Skips semantic search. Best for real-time chat and quick lookups where a slightly smaller result set is fine.
balanced
The default. Semantic search on with adaptive budget. Covers most applications well.
high_recall
Maximum evidence. Runs multiple retrieval passes with an expanded budget. Best for compliance, legal review, and research.
Select the profile in your /v2/context-pack request.
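A minimal sketch of what such a request body might look like. The profile names and max_tokens come from this page; the field names `query` and `profile`, the default of 2048 tokens, and the helper `build_context_pack_request` are assumptions for illustration, not the documented API.

```python
import json

# Profile names are taken from the section above.
SERVING_PROFILES = ("low_latency", "balanced", "high_recall")


def build_context_pack_request(query, profile="balanced", max_tokens=2048):
    """Build a request body for /v2/context-pack (field names are hypothetical)."""
    if profile not in SERVING_PROFILES:
        raise ValueError(f"unknown serving profile: {profile!r}")
    return {
        "query": query,            # hypothetical field name
        "profile": profile,        # hypothetical field name
        "max_tokens": max_tokens,  # starting budget; the default here is illustrative
    }


body = build_context_pack_request(
    "customer complaints about billing", profile="high_recall"
)
print(json.dumps(body, indent=2))
```

Validating the profile name on the client keeps a typo from silently falling back to a different retrieval strategy.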
Semantic retrieval
V1 retrieval matches on keywords and recency. V2 adds a semantic channel that matches on meaning, so a query like “customer complaints about billing” also surfaces artifacts that talk about “payment disputes” or “invoice errors” even though those exact words don’t appear in your query. Semantic retrieval is on by default in the balanced and high_recall profiles. It’s off in low_latency to keep latency predictable.
Adaptive budget
In V1, your max_tokens is a hard ceiling. In V2, the runtime can expand that budget automatically when evidence is sparse, so you get a meaningful context pack even when your memory doesn’t have dense coverage of the query topic.
The expansion multiplier depends on your serving profile:
| Profile | Adaptive budget |
|---|---|
| low_latency | Off |
| balanced | Up to 1.5× |
| high_recall | Up to 4×, capped at 4,096 tokens |
Any expansion is reported in the token_accounting field of the response, and the budget never silently exceeds your account’s plan limit.
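The arithmetic behind the table can be sketched as follows. The multipliers and the 4,096-token figure are copied from the table. The function `budget_ceiling`, the reading of the cap as a limit on the expanded total (never below the requested budget), and the 16,384-token plan limit used in the examples are assumptions for illustration; the actual runtime may apply these rules differently.

```python
# Multipliers and cap copied from the profile table; the rest is illustrative.
PROFILE_RULES = {
    "low_latency": {"multiplier": 1.0, "cap": None},  # adaptive budget off
    "balanced": {"multiplier": 1.5, "cap": None},
    "high_recall": {"multiplier": 4.0, "cap": 4096},
}


def budget_ceiling(max_tokens, profile, plan_limit):
    """Upper bound on the expanded token budget for one request."""
    rule = PROFILE_RULES[profile]
    expanded = int(max_tokens * rule["multiplier"])
    if rule["cap"] is not None:
        # Assumption: the cap bounds the expanded total but never
        # shrinks the budget below what the caller asked for.
        expanded = max(max_tokens, min(expanded, rule["cap"]))
    # The plan limit always wins.
    return min(expanded, plan_limit)


print(budget_ceiling(2000, "low_latency", 16384))  # 2000
print(budget_ceiling(2000, "balanced", 16384))     # 3000
print(budget_ceiling(2000, "high_recall", 16384))  # 4096
print(budget_ceiling(12000, "balanced", 16384))    # 16384
```

In the last call, 1.5× would give 18,000 tokens, so the plan limit is what bounds the result.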
Batch ingest and async job tracking
V1 ingests one artifact per request. V2 accepts an array, and with async_index: true you get job IDs back immediately, with no waiting for indexing to complete before moving on.
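A minimal sketch of a batch ingest body. Only the async_index flag is named on this page; the `artifacts` and `content` field names, the helper `build_ingest_batch`, and the sample texts are hypothetical, and the shape of the job-tracking response is not shown because it is not described here.

```python
import json


def build_ingest_batch(texts, async_index=True):
    """Wrap several artifacts into one ingest request body."""
    return {
        # Hypothetical field names; one entry per artifact in the batch.
        "artifacts": [{"content": text} for text in texts],
        # Flag named on this page: ask for job IDs instead of waiting for indexing.
        "async_index": async_index,
    }


payload = build_ingest_batch(
    ["Invoice disputed by customer", "Refund issued", "Card declined at checkout"]
)
print(json.dumps(payload, indent=2))
```

Sending the three artifacts in one body replaces three separate V1 calls.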
Multiple workspaces
V1 gives every tenant one workspace. V2 lets you create named workspaces, each fully isolated and each routable to its own storage if you’re on an appropriate plan.
| Plan | Workspaces |
|---|---|
| Pro | 5 |
| Team | 20 |
| Enterprise | Unlimited |
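If you track workspace counts on the client, the limits in the table can be checked before attempting to create a new workspace. This is a sketch only: the limits are copied from the table, while `can_create_workspace` is a hypothetical helper and not part of the API.

```python
# Limits copied from the plan table; None stands for "unlimited".
WORKSPACE_LIMITS = {"Pro": 5, "Team": 20, "Enterprise": None}


def can_create_workspace(plan, existing):
    """Return True if a tenant on `plan` with `existing` workspaces may add one."""
    limit = WORKSPACE_LIMITS[plan]
    return limit is None or existing < limit


print(can_create_workspace("Pro", 4))   # True
print(can_create_workspace("Pro", 5))   # False
print(can_create_workspace("Team", 5))  # True
```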
Upgrade to V2
Pro starts at $9.99/mo. Everything in Free, plus V2 access, semantic retrieval, adaptive budget, and 100K events/month.