Sun Jan 08 2023

Does compressing activations help model parallel training?

Machine learning
Model parallelism
Improve training speed of large-scale Transformer models

Presents the first empirical study of the effectiveness of compression methods for reducing the communication overhead of model parallelism.

Provides insights into the effectiveness of compression methods for model parallelism, which can potentially improve the training speed of large-scale Transformer models. The study evaluates three common classes of compression algorithms and analyzes how their effectiveness changes as the model is scaled up. Future work on compression algorithms for model parallelism can build on these insights.
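
To make the idea concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of one common quantization-based scheme for compressing activations at a model-parallel boundary: the sending stage quantizes fp32 activations to int8 and the receiving stage dequantizes them, trading a small approximation error for roughly 4x less communicated data. The helper names `quantize_int8`/`dequantize_int8` and the per-tensor int8 scheme are assumptions for illustration; a real setup would wrap pipeline- or tensor-parallel send/recv calls around these steps.

```python
# Illustrative sketch of quantization-based activation compression for model
# parallelism (hypothetical helpers, not the paper's method).
import torch

def quantize_int8(x: torch.Tensor):
    """Uniform per-tensor quantization of fp32 activations to int8.

    Returns the int8 payload plus the scale needed to dequantize on the
    receiving stage. Sending int8 instead of fp32 cuts communicated bytes ~4x.
    """
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximate fp32 activation on the next stage."""
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    # Pretend this is the activation leaving one model-parallel stage.
    activation = torch.randn(4, 1024)           # fp32: 4 * 1024 * 4 bytes
    payload, scale = quantize_int8(activation)  # int8: 4 * 1024 * 1 byte (+ scale)

    # On the receiving stage, dequantize before continuing the forward pass.
    restored = dequantize_int8(payload, scale)

    rel_err = (activation - restored).norm() / activation.norm()
    print(f"bytes before: {activation.numel() * 4}, after: {payload.numel()}")
    print(f"relative error introduced by compression: {rel_err:.4f}")
```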
