Memory Bake-Off
Persistent memory is the new context window. Here's how the frontier models stack up.
📊 Live Leaderboard · 🔬 Independent Testing · ⚡ Weekly Updates · 🔐 Signed Proofs
SCS
Session Continuity Score (0-100)
How well a model maintains identity and context across sessions
MRA
Memory Recall Accuracy (%)
Percentage of previously stored information accurately retrieved
CNC
Cost-Normalized Continuity
SCS per dollar spent (higher = better value)
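As a rough illustration of how two of these metrics could be computed, here is a minimal Python sketch. The names `RecallTrial`, `memory_recall_accuracy`, and `cost_normalized_continuity` are hypothetical, not part of the published evaluation protocol. The sketch assumes MRA is scored by case-insensitive exact match and that cost is measured in US dollars per test run; the actual scoring rule and cost basis may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RecallTrial:
    """One stored fact and what the model returned for it in a later session."""
    stored: str
    retrieved: str


def memory_recall_accuracy(trials: Sequence[RecallTrial]) -> float:
    """MRA: percentage of stored items retrieved accurately.

    Assumption: "accurately" means a case-insensitive exact match after
    trimming whitespace. A real protocol might use fuzzier matching.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    correct = sum(
        1 for t in trials
        if t.stored.strip().lower() == t.retrieved.strip().lower()
    )
    return 100.0 * correct / len(trials)


def cost_normalized_continuity(scs: float, dollars_spent: float) -> float:
    """CNC: Session Continuity Score per dollar spent (higher = better value)."""
    if dollars_spent <= 0:
        raise ValueError("dollars_spent must be positive")
    return scs / dollars_spent


if __name__ == "__main__":
    # Made-up trials for illustration only, not leaderboard data.
    trials = [
        RecallTrial("favorite color: teal", "Favorite color: teal"),
        RecallTrial("project deadline: Friday", "project deadline: Monday"),
    ]
    print(memory_recall_accuracy(trials))
    print(cost_normalized_continuity(scs=80, dollars_spent=0.50))
```

Because CNC divides by cost, a model with a lower SCS can still rank higher on value if it is cheaper to run, which is why the CNC column does not follow the same order as the SCS column.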
Live Leaderboard
Rank | Model | SCS | MRA | CNC | Last Tested | Status |
---|---|---|---|---|---|---|
🥇 | Claude 3.5 Sonnet | 87 | 92% | 145 | 2025-08-12 | ✅ Live |
🥈 | GPT-5 | 82 | 89% | 123 | 2025-08-12 | ✅ Live |
🥉 | Grok-2 | 78 | 85% | 98 | 2025-08-12 | ✅ Live |
#4 | Gemini 2.0 | 75 | 83% | 112 | 2025-08-12 | ✅ Live |
#5 | Mistral Large | 71 | 79% | 89 | 2025-08-12 | ✅ Live |
Proof & Attestation
Signed Artifacts
- GitHub Repository (signed commits)
- Arweave Hash (immutable proof)
- Vercel Deploy IDs (live verification)
- Test Results Archive (raw data)
Independent Validation
- Memory evaluation protocol (open source)
- Cross-model testing framework
- Cost analysis methodology
- Failure case documentation