August 1, 2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
KV-efficient language models: MLA and sliding window attention