This guide shows the standard pattern for integrating Memory Runtime into any AI agent — LangChain, CrewAI, AutoGen, or custom frameworks.

The Loop

Every agent interaction follows four steps:
1. POST /v1/ingest       → Store artifacts (conversations, docs, tool outputs)
2. POST /v1/context-pack → Get budgeted, policy-safe context
3. Feed context_text     → Inject directly into your LLM's system prompt
4. POST /v1/feedback     → Pin, correct, or mark-private to improve future packs
The agent never sees raw memory. It only sees what passes through the budget + policy + relevance filters. Every decision is logged in a receipt.

Step 1: Ingest Memory

As your agent generates or receives information, ingest it:
import httpx

client = httpx.Client(base_url="https://api.9dlabs.xyz", headers={"X-API-Key": "your-key"})

# Ingest a conversation turn
client.post("/v1/ingest", json={
    "workspace_id": "ws_acme",
    "actor_id": "agent_assistant",
    "artifact_type": "chat_turn",
    "raw_payload": {
        "role": "assistant",
        "content": "The board approved an 18% enterprise discount cap for Q2."
    },
    "idempotency_key": "turn_2026030901",
})

# Ingest a document
client.post("/v1/ingest", json={
    "workspace_id": "ws_acme",
    "actor_id": "user_alice",
    "artifact_type": "document",
    "raw_payload": {
        "title": "Q2 Pricing Policy",
        "content": "... full document text ..."
    },
})
Use idempotency_key to safely retry ingestion without creating duplicates. Same key + workspace = same artifact returned.
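
One simple way to keep retries safe is to derive the idempotency key deterministically from the payload itself, so a retried request always carries the same key. The helper below is a sketch, not part of the API — the key format and hashing scheme are assumptions; any stable string works:

```python
import hashlib
import json

def make_idempotency_key(workspace_id: str, raw_payload: dict) -> str:
    """Derive a stable idempotency key from the payload content.

    json.dumps with sort_keys=True gives a canonical serialization,
    so identical payloads always hash to the same key regardless of
    key ordering in the dict.
    """
    canonical = json.dumps(raw_payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{workspace_id}:{canonical}".encode()).hexdigest()
    return f"ingest_{digest[:16]}"
```

Pass the result as idempotency_key; if a request times out, retrying with the same payload reuses the same key and returns the same artifact instead of creating a duplicate.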

Step 2: Retrieve Context Before Inference

Before every LLM call, ask Memory Runtime for the relevant context:
pack = client.post("/v1/context-pack", json={
    "query": "What is the enterprise discount cap for Q2?",
    "max_tokens": 4096,
    "workspace_id": "ws_acme",
    "actor_id": "user_alice",
}).json()

# Always check for abstention
if pack["abstain_flag"]:
    response = "I don't have enough information to answer that reliably."
else:
    # Use the context in your LLM call
    messages = [
        {
            "role": "system",
            "content": (
                "Answer using ONLY the provided context. "
                "Cite evidence using [cite:N] format.\n\n"
                + pack["context_text"]
            ),
        },
        {"role": "user", "content": "What is the enterprise discount cap for Q2?"},
    ]
    llm_response = call_your_llm(messages)
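
Since the system prompt asks for [cite:N] markers, it can be worth checking that the markers in the model's answer actually point at evidence in the pack. A minimal sketch, assuming citation indices are 1-based (the actual numbering convention may differ):

```python
import re

def dangling_citations(llm_response: str, citations: list) -> list[int]:
    """Return citation indices that appear in the response but have no
    matching entry in the pack's citations list."""
    cited = {int(n) for n in re.findall(r"\[cite:(\d+)\]", llm_response)}
    valid = set(range(1, len(citations) + 1))  # assumes 1-based indices
    return sorted(cited - valid)
```

A nonempty result from dangling_citations(llm_response, pack["citations"]) suggests the model invented a citation; you might retry the call or flag the answer for review.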

Step 3: Surface Receipts

Every context pack comes with a receipt_id. Use it to provide transparency to users and operators:
receipt = client.get(f"/v1/receipts/{pack['receipt_id']}").json()

# Show what evidence was used
for citation in pack["citations"]:
    print(f"  Source: {citation['source_type']} (relevance: {citation['relevance_score']:.2f})")
    print(f"  Preview: {citation['content_preview']}")

# Log any conflicts
for conflict in receipt["conflicts"]:
    print(f"  Conflict: {conflict['description']}")

# Token budget breakdown
acct = receipt["token_accounting"]
print(f"Budget: {acct['budget']}, Used: {acct['total_used']}, Remaining: {acct['remaining']}")
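
For end users, the same citation data can be rendered as a compact sources footer appended to the agent's answer. A sketch using the citation fields shown above (source_type, relevance_score, content_preview); the layout is just one possible format:

```python
def sources_footer(citations: list[dict]) -> str:
    """Format pack citations as a numbered, user-facing sources list."""
    if not citations:
        return ""
    lines = ["", "Sources:"]
    for i, c in enumerate(citations, start=1):
        preview = c["content_preview"][:80]  # keep previews short for display
        lines.append(f"  [{i}] {c['source_type']} ({c['relevance_score']:.2f}): {preview}")
    return "\n".join(lines)
```

Appending sources_footer(pack["citations"]) to the answer lets users see exactly which evidence backed the response.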

Step 4: Apply Feedback

When users correct the agent, feed it back to improve future context packs:
# User says a piece of evidence is wrong
client.post("/v1/feedback", json={
    "action": "mark_wrong",
    "artifact_id": "...",
    "span_id": "...",
    "actor_id": "user_alice",
    "workspace_id": "ws_acme",
    "reason": "This pricing info is outdated",
})

# Pin authoritative information
client.post("/v1/feedback", json={
    "action": "pin",
    "artifact_id": "...",
    "span_id": "...",
    "actor_id": "user_alice",
    "workspace_id": "ws_acme",
    "reason": "This is the current Q2 pricing policy",
})
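
The loop overview also mentions a mark-private action. Assuming it takes the same request shape as pin and mark_wrong (the "mark_private" action name itself is an assumption here, inferred from the overview), a small helper can build and sanity-check feedback payloads before sending:

```python
# "pin" and "mark_wrong" are shown above; "mark_private" is assumed
# from the loop overview and may be named differently in the API.
FEEDBACK_ACTIONS = {"pin", "mark_wrong", "mark_private"}

def feedback_payload(action: str, artifact_id: str, span_id: str,
                     actor_id: str, workspace_id: str, reason: str) -> dict:
    """Build a /v1/feedback request body, rejecting unknown actions early."""
    if action not in FEEDBACK_ACTIONS:
        raise ValueError(f"unknown feedback action: {action!r}")
    return {
        "action": action,
        "artifact_id": artifact_id,
        "span_id": span_id,
        "actor_id": actor_id,
        "workspace_id": workspace_id,
        "reason": reason,
    }
```

Then each call becomes client.post("/v1/feedback", json=feedback_payload(...)), and a typo in the action name fails locally instead of at the API.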

Full Async Example

A complete async agent loop using httpx and the OpenAI Python SDK:
import asyncio
import httpx
from openai import AsyncOpenAI

BASE = "https://api.9dlabs.xyz"
HEADERS = {"X-API-Key": "your-key"}

async def agent_turn(user_message: str, workspace_id: str) -> str:
    oai = AsyncOpenAI()

    async with httpx.AsyncClient(timeout=30, headers=HEADERS) as client:
        # Store the user message
        await client.post(f"{BASE}/v1/ingest", json={
            "workspace_id": workspace_id,
            "actor_id": "user",
            "artifact_type": "chat_turn",
            "raw_payload": {"role": "user", "content": user_message},
        })

        # Get budgeted context
        resp = await client.post(f"{BASE}/v1/context-pack", json={
            "query": user_message,
            "max_tokens": 4096,
            "workspace_id": workspace_id,
        })
        pack = resp.json()

        if pack["abstain_flag"]:
            return "I don't have enough information to answer that."

        # Call LLM with context
        completion = await oai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Answer using ONLY the provided context. "
                        "Cite sources using [cite:N] format.\n\n"
                        + pack["context_text"]
                    ),
                },
                {"role": "user", "content": user_message},
            ],
            temperature=0,
        )
        answer = completion.choices[0].message.content

        # Store the assistant response
        await client.post(f"{BASE}/v1/ingest", json={
            "workspace_id": workspace_id,
            "actor_id": "assistant",
            "artifact_type": "chat_turn",
            "raw_payload": {"role": "assistant", "content": answer},
        })

        return answer

print(asyncio.run(agent_turn("What did we decide about Q2 pricing?", "ws_sales")))