Mon Dec 12 2022
Sun Dec 11 2022

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Neural networks
AI for model optimization
Improving performance of sparsely activated models on SuperGLUE and ImageNet

Proposes a method to recoup sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint; the upcycled model outperforms continued dense training on SuperGLUE and ImageNet.

Can significantly reduce the cost of training sparsely activated models by reusing the compute already invested in dense checkpoints, leading to improved performance on downstream tasks such as SuperGLUE and ImageNet.
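
A minimal NumPy sketch of the upcycling step described above, assuming a single dense feed-forward block: each expert in the new MoE layer is initialized as a copy of the dense weights, and only the router is newly initialized. The function and variable names (upcycle_dense_ffn, w_in, w_out) are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of sparse upcycling; names are not from the paper.
def upcycle_dense_ffn(w_in, w_out, num_experts, rng):
    """Initialize a MoE layer from a dense FFN checkpoint.

    Every expert starts as an exact copy of the dense FFN weights
    (w_in: [d_model, d_ff], w_out: [d_ff, d_model]); only the router
    is a new, randomly initialized parameter.
    """
    experts_in = np.stack([w_in.copy() for _ in range(num_experts)])
    experts_out = np.stack([w_out.copy() for _ in range(num_experts)])
    # Router maps each token representation to per-expert logits.
    router = rng.normal(scale=0.02, size=(w_in.shape[0], num_experts))
    return experts_in, experts_out, router

# Example: upcycle a toy dense FFN into 4 experts.
rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 8, 32, 4
w_in = rng.normal(size=(d_model, d_ff))
w_out = rng.normal(size=(d_ff, d_model))
e_in, e_out, router = upcycle_dense_ffn(w_in, w_out, num_experts, rng)
print(e_in.shape, e_out.shape, router.shape)  # (4, 8, 32) (4, 32, 8) (8, 4)
```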

Heterogeneous Mixture-of-Experts

Neural networks
AI for model optimization
Improving training convergence time and fine-tuning performance of mixture-of-experts models

Proposes a novel expert-choice routing method that prevents load imbalance in Mixture-of-Experts models, resulting in faster training convergence and higher performance on fine-tuning tasks.

Can improve the performance and training speed of Mixture-of-Experts models by having each expert select its top-k tokens rather than each token selecting its top-k experts.
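
A minimal NumPy sketch of expert-choice routing as summarized above: each expert picks its top-`capacity` tokens by router affinity, so every expert processes the same number of tokens and load is balanced by construction. The names and the fixed capacity value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch of expert-choice routing; names are not from the paper.
def expert_choice_routing(tokens, router, capacity):
    """Each expert selects its top-`capacity` tokens.

    tokens: [n_tokens, d_model], router: [d_model, n_experts].
    Because every expert takes exactly `capacity` tokens, load is
    balanced by construction, without an auxiliary balancing loss.
    """
    scores = tokens @ router                          # [n_tokens, n_experts]
    # Softmax over experts gives per-token routing probabilities.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Each expert (column) picks the `capacity` tokens with highest probability.
    chosen = np.argsort(-probs, axis=0)[:capacity]    # [capacity, n_experts]
    weights = np.take_along_axis(probs, chosen, axis=0)
    return chosen, weights

# Example: 4 experts each pick 2 of 16 tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
router = rng.normal(size=(8, 4))
chosen, weights = expert_choice_routing(tokens, router, capacity=2)
print(chosen.shape, weights.shape)  # (2, 4) (2, 4)
```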
