Retrieval Degradation in Multi-Turn Agentic RAG Systems
Status: arXiv submission pending endorsement.
This paper examines how retrieval quality degrades as reasoning turns accumulate in a production agentic RAG deployment. Drawing on 150 sessions and 550 turn-level observations instrumented from Math Professor AI, the study documents a statistically significant 6.6% decline in retrieval quality (p < 0.0001) and identifies context length as the dominant predictor (Pearson r = −0.283). Three contributing mechanisms are characterized (context length interference, keyword drift, and attention head saturation), and a lightweight turn-aware confidence estimator is proposed (AUROC = 0.634). The findings carry direct implications for how agentic RAG systems should be evaluated and designed.
- Empirical study of retrieval quality degradation across reasoning turns in a production agentic RAG deployment (150 sessions, 550 turn-level observations instrumented from Math Professor AI)
- Documented a statistically significant decline in retrieval quality (6.6%, p < 0.0001) and identified context length as the dominant predictor (Pearson r = −0.283); proposed a lightweight turn-aware confidence estimator (AUROC = 0.634)
- Characterized three contributing mechanisms (context length interference, keyword drift, attention head saturation) with implications for agentic RAG evaluation and system design
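
The two summary statistics quoted above, Pearson r and AUROC, could be computed from turn-level logs roughly as follows. This is a minimal sketch, not the study's analysis code: it assumes each observation records a context length, a retrieval-quality score, and a binary "degraded" label, and the function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Ties count as half a win. Quadratic in the number of examples,
    which is acceptable for a few hundred observations."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only; these are NOT data from the study.
toy_context_lengths = [500, 1200, 2500, 4000]   # tokens in context at each turn
toy_quality_scores = [0.90, 0.85, 0.80, 0.70]   # retrieval quality per turn
toy_risk_scores = [0.9, 0.8, 0.3, 0.1]          # estimator output per turn
toy_degraded = [1, 0, 1, 0]                     # 1 = retrieval judged degraded
```

Here `pearson_r(toy_context_lengths, toy_quality_scores)` measures how strongly quality moves with context length, and `auroc(toy_risk_scores, toy_degraded)` scores how well an estimator ranks degraded turns above healthy ones. Library routines such as `scipy.stats.pearsonr` and `sklearn.metrics.roc_auc_score` compute the same quantities.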