DeML OS Daily: Latest Frontier Analysis
04.02
2026
Thu
📄
Paper
MoEless: Efficient MoE LLM Serving via Serverless Computing https://arxiv.org/abs/2603.06350
Hanfei Yu Serverless Inference

Notes

DeML OS Q & A
Deep Dive 💬
😇
What core problem does MoEless solve in MoE serving?
MoEless addresses 'expert load imbalance' during inference. When tokens are distributed unevenly across experts, the overloaded experts become 'stragglers' that hold up the rest of the layer. MoEless mitigates these bottlenecks to reduce latency and improve resource efficiency.
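The straggler effect can be illustrated with a minimal sketch. It is not taken from the paper: the function `layer_latency`, the one-expert-per-GPU mapping, and the fixed cost of 1 ms per token are all assumptions chosen for illustration. The sketch assumes a layer finishes only when its most heavily loaded GPU finishes.

```python
from collections import Counter

def layer_latency(token_to_expert, expert_to_gpu, ms_per_token=1.0):
    """Return (tokens per GPU, layer latency in ms) for one MoE layer.

    Assumption: each GPU processes its tokens serially, and the layer
    completes only when the slowest (most loaded) GPU completes.
    """
    load = Counter(expert_to_gpu[e] for e in token_to_expert)
    per_gpu = {gpu: load.get(gpu, 0) for gpu in set(expert_to_gpu.values())}
    return per_gpu, max(per_gpu.values()) * ms_per_token

# Hypothetical static placement: one expert per GPU.
expert_to_gpu = {0: "gpu0", 1: "gpu1", 2: "gpu2", 3: "gpu3"}

balanced = [0, 1, 2, 3] * 4          # 16 tokens, 4 per expert
skewed = [0] * 10 + [1, 2, 3] * 2    # 16 tokens, 10 of them on expert 0

print(layer_latency(balanced, expert_to_gpu))
print(layer_latency(skewed, expert_to_gpu))
```

Both batches contain 16 tokens, but under these assumptions the skewed batch takes 10 ms instead of 4 ms, because gpu0 becomes the straggler while the other three GPUs sit mostly idle.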
😎
😊
How does the 'layer-aware predictor' work and what is its role?
It is a lightweight model that predicts, layer by layer, which experts will be activated and how much load each will receive. Its role is 'proactive scheduling': it identifies bottlenecks early so that elastic scaling and placement decisions can be made before the load arrives, avoiding runtime blocking.
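How a load prediction could drive elastic scaling can be sketched as follows. This is an illustrative assumption, not the paper's algorithm: `plan_replicas`, the per-instance capacity of 256 tokens, and the rule of keeping at least one warm instance per expert are all hypothetical, and the predicted loads are made-up numbers standing in for the predictor's output.

```python
import math

def plan_replicas(predicted_tokens, capacity_per_instance):
    """Map each expert to the number of function instances to provision.

    Assumption: one instance handles at most `capacity_per_instance`
    tokens per step, and every expert keeps at least one warm instance.
    """
    return {
        expert: max(1, math.ceil(tokens / capacity_per_instance))
        for expert, tokens in predicted_tokens.items()
    }

# Hypothetical predictor output for the next layer: tokens per expert.
predicted = {"e0": 900, "e1": 120, "e2": 0, "e3": 260}

plan = plan_replicas(predicted, capacity_per_instance=256)
print(plan)
```

With these numbers the hot expert `e0` is scaled out to four instances before its tokens arrive, while the lightly loaded experts stay at one instance each.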
😎
🤓
What challenges does MoEless face compared to static EP, and how are they mitigated?
Compared with static expert parallelism (EP), the main challenges are cold-start latency and communication overhead. Mitigations: 1. Proactive warming, guided by the predictor, to reduce cold starts. 2. Placement optimized for 'function locality', which keeps heavy communication on the GPUs' high-speed interconnect.
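The idea behind 'function locality' can be sketched by scoring a placement by how much traffic has to leave the node. Everything here is hypothetical: the function names, the traffic volumes (in MB), and the two-node layout are invented for illustration, and the sketch only evaluates placements rather than searching for the best one.

```python
def cross_node_traffic(traffic, placement):
    """Sum the traffic between function pairs placed on different nodes.

    `traffic` maps (function_a, function_b) to a volume in MB;
    `placement` maps each function name to a node name.
    """
    return sum(
        volume
        for (a, b), volume in traffic.items()
        if placement[a] != placement[b]
    )

# Hypothetical per-step traffic between an attention function and experts.
traffic = {("attn", "e0"): 80, ("attn", "e1"): 10, ("attn", "e2"): 10}

# Two candidate placements, each putting three functions on node0.
spread = {"attn": "node0", "e0": "node1", "e1": "node0", "e2": "node0"}
local = {"attn": "node0", "e0": "node0", "e1": "node0", "e2": "node1"}

print(cross_node_traffic(traffic, spread))
print(cross_node_traffic(traffic, local))
```

Co-locating the heaviest communicating pair (`attn` and `e0`) cuts cross-node traffic from 80 MB to 10 MB in this toy setup, which is the effect a locality-aware placement aims for.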
😎