RAG vs. Long-Context: When to Retrieve vs. Stuff.
the path
Read. Master the vocabulary. Fire two hot-takes. Then write the pitch and draw the system. End-state: you can talk about this like a native.
The brief
With 1M+ token context windows, the instinct is to dump everything in and let the model sort it. In practice, retrieval still wins on cost, latency, freshness, and attention-dilution for large corpora — but long-context wins for small, dense, high-coherence tasks.
1. Long-context avoids retrieval infrastructure, but cost scales roughly linearly with tokens and quality degrades in the presence of distractors.
2. RAG needs a reranker to survive realistic noise; top-k similarity alone is usually not enough.
3. Hybrid retrieval (BM25 + dense) beats pure dense on acronyms and rare terms.
4. Citations add trust but constrain generation; agents sometimes cheat by paraphrasing.
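The hybrid point deserves a concrete mechanism. One common way to merge a BM25 ranking with a dense ranking is reciprocal rank fusion (RRF); a minimal sketch, assuming you already have the two ranked lists of document IDs (the function name and `k` default are illustrative, not from any specific library):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked doc-ID lists (e.g. [bm25_ids, dense_ids]).
    Documents that appear high in either list float to the top; k damps the
    influence of any single list's top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25 = ["a", "b"]      # BM25 catches the rare acronym
dense = ["b", "c"]     # dense retrieval catches the paraphrase
fused = rrf_fuse([bm25, dense])  # "b" wins: it appears in both lists
```

RRF is rank-based, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.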
“Long-context is a scalpel for focused tasks; RAG is the library card for everything else.”
The system
Vocabulary gym
Chunking
Splitting documents into retrievable units; typically 200–800 tokens with overlap.
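A minimal chunker makes the definition concrete. This sketch splits on words as a rough stand-in for tokens (real pipelines use the model's tokenizer); the sizes mirror the 200–800-token range above, and the function name is illustrative:

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into ~chunk_size-word chunks, each overlapping the previous
    one by `overlap` words so facts straddling a boundary survive retrieval."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap trades a little index bloat for robustness: a sentence cut in half at a chunk boundary appears whole in at least one chunk.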
Hot-takes
Two hot-takes. One sentence each. No hedging, no lists — just the sharpest answer you can land. The coach replies in seconds with a score and a tighter rewrite.
When does long-context actually outperform RAG?
What's the role of a reranker and when can you skip it?
The drill
Write a 400-word memo to your CTO recommending RAG, long-context, or a hybrid for a legal-research product over a 2M-document corpus. Justify the choice with concrete trade-offs on cost, latency, accuracy, and update cadence.