DeML OS Daily: Latest Frontier Analysis
Explore Frontier
03.31
2026
Tue
📄
Paper
Route Experts by Sequence, not by Token https://arxiv.org/abs/2511.06494
Tiansheng Wen · Routing · Sparsity

Notes

DeML OS Q & A
Deep Dive 💬
What is the main improvement of SeqTopK?
SeqTopK shifts the expert-selection budget from the token level to the sequence level. Complex tokens can be allocated more experts and easy tokens fewer, while the total budget for the sequence stays constant.
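To make the difference concrete, here is a minimal pure-Python sketch of the two selection rules. It assumes the sequence budget is T*K (token, expert) slots for T tokens, which is one reading of "keeping the total budget constant". The function names `token_topk` and `seq_topk` and the score values are hypothetical, chosen for illustration, and are not taken from the paper's code.

```python
from typing import List, Set, Tuple

Scores = List[List[float]]  # scores[t][e]: router score of expert e for token t


def token_topk(scores: Scores, k: int) -> List[Set[int]]:
    """Standard TopK: every token keeps its own k highest-scoring experts."""
    chosen = []
    for row in scores:
        ranked = sorted(range(len(row)), key=lambda e: row[e], reverse=True)
        chosen.append(set(ranked[:k]))
    return chosen


def seq_topk(scores: Scores, k: int) -> List[Set[int]]:
    """Sequence-level TopK (assumed form): keep the T*k highest
    (token, expert) scores across the whole sequence."""
    num_tokens = len(scores)
    pairs: List[Tuple[float, int, int]] = [
        (s, t, e) for t, row in enumerate(scores) for e, s in enumerate(row)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    chosen: List[Set[int]] = [set() for _ in range(num_tokens)]
    for _, t, e in pairs[: num_tokens * k]:
        chosen[t].add(e)
    return chosen


# Toy router scores for 3 tokens and 4 experts (made-up numbers).
scores = [
    [0.70, 0.10, 0.10, 0.10],  # "easy" token: one dominant expert
    [0.30, 0.28, 0.27, 0.15],  # "hard" token: several comparable experts
    [0.55, 0.25, 0.12, 0.08],
]

per_token = token_topk(scores, k=2)
per_sequence = seq_topk(scores, k=2)
print([len(s) for s in per_token])     # [2, 2, 2]
print([len(s) for s in per_sequence])  # [1, 3, 2]
```

Both rules activate six experts in total, but the sequence-level rule moves one slot from the easy token to the hard one. This sketch does not guard against a token receiving no experts at all. Whether and how the paper handles that case is not covered in these notes.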
What advantages does SeqTopK have over traditional TopK routing?
SeqTopK dynamically allocates expert resources based on token difficulty, improving computational efficiency and model performance. It's simple to implement, adds negligible overhead, is compatible with pretrained models without retraining, and shows especially large gains under high sparsity.
Why does SeqTopK show larger performance gains under higher sparsity?
Under higher sparsity, each token may activate fewer experts (K is smaller), so the resource constraint is tighter. SeqTopK's sequence-level flexibility then becomes critical: it can direct the scarce expert slots to the tokens that need them most (the hardest ones) and avoid the misallocation caused by a fixed per-token assignment, making the most of the limited compute budget.
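The argument can be illustrated with a toy calculation. The sketch below compares how much total router score each rule retains at several values of K, assuming a sequence budget of T*K slots. The "retained score" metric, the helper names, and the score values are all assumptions made for illustration. This is not an experiment or a result from the paper.

```python
from typing import Dict, List

Scores = List[List[float]]  # scores[t][e]: router score of expert e for token t


def token_level_mass(scores: Scores, k: int) -> float:
    """Total score kept when each token keeps exactly its top k experts."""
    return sum(sum(sorted(row, reverse=True)[:k]) for row in scores)


def sequence_level_mass(scores: Scores, k: int) -> float:
    """Total score kept when the top T*k entries are chosen sequence-wide."""
    flat = sorted((s for row in scores for s in row), reverse=True)
    return sum(flat[: len(scores) * k])


# Made-up, unnormalized affinity scores: 2 tokens, 4 experts.
scores = [
    [0.3, 0.1, 0.1, 0.1],  # "easy" token: no expert matters much
    [0.9, 0.8, 0.7, 0.2],  # "hard" token: three strongly relevant experts
]

relative_gain: Dict[int, float] = {}
for k in (1, 2, 3):
    tok = token_level_mass(scores, k)
    seq = sequence_level_mass(scores, k)
    relative_gain[k] = (seq - tok) / tok
    print(f"K={k}: token-level={tok:.2f} sequence-level={seq:.2f} "
          f"relative gain={100 * relative_gain[k]:.1f}%")
```

The sequence-level total can never be lower than the token-level total, because the per-token choice is just one way of picking T*K entries. In this made-up example the relative gap is largest at K=1 (about 42%) and shrinks to about 3% at K=3, which matches the intuition above. Real router scores may behave differently.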