The core endpoint. Given a query and token budget, retrieve relevant evidence, pack it optimally, enforce policy, and return ready-to-inject prompt text plus a full audit receipt.

Request

POST /v1/context-pack
query (string, required)
The question or topic to retrieve evidence for.

max_tokens (integer, required)
Token budget (1-128,000). The output will never exceed this.

workspace_id (string)
Scope retrieval to a workspace.

actor_id (string)
Scope retrieval to an actor's permissions.

mode (string, default "relevance")
relevance optimizes for top hits; coverage spreads across more sources.

policy (object)
Access control policy for this request.

max_latency_ms (integer)
Latency constraint (advisory).

max_cost_usd (float)
Cost constraint (advisory).

session_id (string)
Scope retrieval to a specific session.
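Before sending a request, the documented constraints can be checked client-side. A minimal sketch; `validate_request` is a hypothetical helper (not part of any SDK), and the constraint values come from the parameter descriptions above:

```python
# Sketch: assemble and sanity-check a /v1/context-pack request body.
# validate_request is a hypothetical client-side helper; field names
# and constraints are taken from the reference above.

VALID_MODES = {"relevance", "coverage"}

def validate_request(body: dict) -> dict:
    """Raise ValueError if the body violates documented constraints."""
    if not body.get("query"):
        raise ValueError("query is required")
    max_tokens = body.get("max_tokens")
    if not isinstance(max_tokens, int) or not 1 <= max_tokens <= 128_000:
        raise ValueError("max_tokens must be an integer in 1-128,000")
    if body.get("mode", "relevance") not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return body

body = validate_request({
    "query": "What is the current enterprise discount cap for Q2?",
    "max_tokens": 2048,
    "workspace_id": "ws_acme",
    "actor_id": "user_alice",
    "mode": "coverage",
})
```

Catching an invalid budget locally avoids a round trip for a request the server would reject anyway.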

Response

context_text (string)
The full prompt string to inject into your LLM call. Contains prefix (pinned) + working set (query-specific) + conflict warnings.

prefix_text (string)
Just the pinned/authoritative section.

working_set_text (string)
Just the query-specific evidence section.

citations (Citation[])
Provenance-anchored citations for each evidence span.

confidence (float)
Confidence score from 0.0 to 1.0, based on relevance, coverage, and source diversity.

abstain_flag (boolean)
true if the system determined there is insufficient evidence to answer. Your agent should handle this gracefully.

abstain_reason (string | null)
Human-readable reason for abstention (e.g., "Confidence too low", "Insufficient evidence tokens").

receipt_id (string)
UUID pointing to the full decision receipt. Use GET /v1/receipts/ to fetch it.

total_tokens (integer)
Exact token count of context_text.

pack_hash (string)
SHA-256 of the packed output. Same inputs always produce the same hash.
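Because each citation carries its provenance fields, a caller can render a simple audit trail from the response. A sketch using only the fields shown above; `format_citation` is an illustrative helper, not part of any SDK:

```python
# Sketch: turn the citations array from a context-pack response into
# human-readable audit lines. format_citation is an illustrative
# helper; the field names match the Citation object above.

def format_citation(c: dict) -> str:
    return (f"[{c['source_type']}] score={c['relevance_score']:.2f} "
            f"chars {c['start_offset']}-{c['end_offset']}: "
            f"{c['content_preview']}")

citations = [{
    "artifact_id": "a1b2c3d4-...",
    "span_id": "s1e2f3g4-...",
    "start_offset": 0,
    "end_offset": 56,
    "content_hash": "sha256:...",
    "source_type": "chat_turn",
    "relevance_score": 0.94,
    "content_preview": "The Q2 enterprise discount cap has been raised to 18%.",
}]

for line in (format_citation(c) for c in citations):
    print(line)
```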

Example

pack = client.context_pack(
    query="What is the current enterprise discount cap for Q2?",
    workspace_id="ws_acme",
    max_tokens=2048,
    actor_id="user_alice",
)

if pack["abstain_flag"]:
    print(f"Cannot answer: {pack['abstain_reason']}")
else:
    # Inject into your LLM
    print(pack["context_text"])
    print(f"Confidence: {pack['confidence']:.0%}")
    print(f"Tokens used: {pack['total_tokens']}")
Response
{
  "context_text": "=== PINNED CONTEXT (Authoritative) ===\n[cite:1] The Q2 enterprise discount cap has been raised to 18%.\n\n=== TOP EVIDENCE (Most Relevant to Query) ===\n[cite:2] Enterprise pricing guidelines updated for Q2...\n\n=== END CONTEXT ===",
  "prefix_text": "=== PINNED CONTEXT (Authoritative) ===\n[cite:1] ...",
  "working_set_text": "=== TOP EVIDENCE ===\n[cite:2] ...",
  "citations": [
    {
      "artifact_id": "a1b2c3d4-...",
      "span_id": "s1e2f3g4-...",
      "start_offset": 0,
      "end_offset": 56,
      "content_hash": "sha256:...",
      "source_type": "chat_turn",
      "relevance_score": 0.94,
      "content_preview": "The Q2 enterprise discount cap has been raised to 18%."
    }
  ],
  "confidence": 0.87,
  "abstain_flag": false,
  "abstain_reason": null,
  "receipt_id": "r1a2b3c4-...",
  "total_tokens": 347,
  "pack_hash": "sha256:..."
}

Abstention

Always check abstain_flag before using context_text. When the system determines it cannot provide reliable evidence, it sets abstain_flag: true. Your agent should decline to answer rather than use low-quality context.
Abstention triggers when evidence quality or quantity is too low to provide a reliable answer. See Confidence and Abstention for details.

Determinism

Memory Runtime is fully deterministic. Given the same query against the same data, it produces byte-identical output — same evidence, same order, same token count, same pack_hash. You can verify this by comparing pack_hash values across runs.
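A sketch of that verification: run the same request twice and compare the packs. It compares the server-returned pack_hash values rather than recomputing the hash, since this reference does not specify exactly which bytes are hashed:

```python
# Sketch: verify determinism by comparing two packs produced from the
# same query against the same data. Compares server-returned fields
# only; it does not recompute the SHA-256 locally.

def packs_identical(a: dict, b: dict) -> bool:
    return (a["pack_hash"] == b["pack_hash"]
            and a["total_tokens"] == b["total_tokens"]
            and a["context_text"] == b["context_text"])

# With a live client, two identical calls should satisfy:
#   packs_identical(client.context_pack(...), client.context_pack(...))
first = {"pack_hash": "sha256:abc", "total_tokens": 347, "context_text": "..."}
second = dict(first)
print(packs_identical(first, second))
```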