Results
Evaluated on LongMemEval (500 questions) and LoCoMo (200 questions):

| Metric | Value |
|---|---|
| Evidence Recall (LongMemEval 500Q) | 99.2% |
| Answer Containment (gold answer in pack) | 80.0% |
| End-to-End LLM Accuracy (GPT-4o-mini) | 48.9% |
| Evidence Recall (LoCoMo 200Q) | 92.2% |
| Average Latency | 294 ms |
| P95 Latency | 452 ms |
| Policy Violations | 0 |
| Contract Tests Passing | 16/16 |
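
The table does not spell out how each metric is computed, so the following is only a rough sketch of one plausible reading, not the project's actual evaluation code. It assumes that a question counts toward evidence recall when all of its gold evidence IDs appear in the retrieved pack, that answer containment is a case-insensitive substring match of the gold answer against the pack text, and that P95 latency uses the nearest-rank method. The `QuestionResult` type and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Hypothetical per-question record; field names are assumptions."""
    gold_evidence_ids: set
    retrieved_ids: set
    gold_answer: str
    pack_text: str
    latency_ms: float


def evidence_recall(results):
    # Assumed definition: a hit requires every gold evidence ID in the pack.
    hits = sum(1 for r in results if r.gold_evidence_ids <= r.retrieved_ids)
    return hits / len(results)


def answer_containment(results):
    # Assumed definition: case-insensitive substring match on the pack text.
    hits = sum(
        1 for r in results if r.gold_answer.lower() in r.pack_text.lower()
    )
    return hits / len(results)


def percentile_nearest_rank(values, pct):
    # Nearest-rank percentile: smallest value with at least pct% of the
    # samples at or below it. pct is an integer in 1..100.
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(results):
    latencies = [r.latency_ms for r in results]
    return {
        "evidence_recall": evidence_recall(results),
        "answer_containment": answer_containment(results),
        "avg_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": percentile_nearest_rank(latencies, 95),
    }
```

Under these assumptions, answer containment can sit well below evidence recall, since retrieving the right evidence does not guarantee the pack contains the answer verbatim.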