MiniMax M3: sparse attention architecture and 15.6× faster decoding
MiniMax has published a technical report on its M2 series and announced M3, a model with a new sparse attention mechanism (MSA) that decodes 15.6 times faster than M2 at one-million-token context lengths. According to the company, it is the first sub-quadratic architecture to preserve multi-hop reasoning without compromise.