DeML OS Daily — Latest Frontier Analysis
Explore Frontier
04.01
2026
Wed
📄
Paper
MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning https://arxiv.org/abs/2603.24044
Andrea Manzoni MoE LoRA

Notes

DeML OS Q & A
Deep Dive 💬
04.01
2026
Wed
😇
How does MoE-Sieve decide which experts to apply LoRA to?
It profiles the routing frequency (how many tokens each expert handles) on a small calibration set, then selects the top-k most frequently routed experts per layer for LoRA fine-tuning.
😎
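The selection step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the `routing_records` shape (layer index mapped to one expert id per routed token from the calibration set), and the per-layer top-k policy are assumptions for the sake of the example.

```python
from collections import Counter

def select_lora_experts(routing_records, k):
    """Pick the top-k most frequently routed experts per layer.

    routing_records: dict mapping layer index -> list of expert ids,
    one entry per (token, chosen expert) routing decision observed
    on a small calibration set.
    """
    selected = {}
    for layer, expert_ids in routing_records.items():
        counts = Counter(expert_ids)
        # Experts that handle the most calibration tokens get LoRA adapters;
        # the rest are left frozen.
        selected[layer] = [e for e, _ in counts.most_common(k)]
    return selected
```

Only the experts returned here would receive LoRA adapters; all other expert weights stay frozen during fine-tuning.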
😊
Why does random expert selection perform worse than MoE-Sieve?
Random selection ignores the routing signal and can pick rarely activated 'cold' experts. Adapting these experts injects gradient noise into training without a corresponding gain in task accuracy.
😎
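A small simulation makes the contrast concrete. Everything here is a hypothetical illustration (the skewed weights, expert count, and sample size are invented, not from the paper): under a skewed routing distribution, a frequency-guided pick recovers the hot experts, while a routing-agnostic random pick of 2 out of 8 experts lands on both hot experts only 1/28 of the time.

```python
import random
from collections import Counter

# Simulated routing decisions for one layer: a skewed distribution where
# experts 0 and 1 are 'hot' and the remaining six are rarely routed 'cold'.
rng = random.Random(0)
num_experts, k = 8, 2
weights = [10, 8, 1, 1, 1, 1, 1, 1]  # hot experts dominate token traffic
tokens = rng.choices(range(num_experts), weights=weights, k=1000)

# Frequency-guided selection: count routed tokens, keep the top-k experts.
hot = {e for e, _ in Counter(tokens).most_common(k)}

# Routing-agnostic baseline: sample k experts uniformly at random.
random_pick = set(rng.sample(range(num_experts), k))
```

With these weights the two hot experts together receive roughly 75% of the tokens, so the frequency-guided pick is essentially always `{0, 1}`, whereas the random baseline usually spends at least one adapter on a cold expert.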
🤓
What is the implication of the non-monotonic relationship between expert routing skew and seed-to-seed variance mentioned in the paper?
It suggests that adapting 'cold' experts when routing skew is high increases seed-to-seed variance, i.e. training instability, supporting the hypothesis that tuning cold experts introduces harmful gradient noise. This is consistent with MoE-Sieve remaining stable while reducing the number of trainable parameters.
😎