SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Proposes a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices, enabling large models to be trained on cheap preemptible instances or on resources pooled across multiple regions. Demonstrates training a large Transformer language model with 1B shared parameters on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth.
Implement SWARM parallelism as an alternative setup for training large models to reduce communication requirements and cost.
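A minimal sketch (not the authors' implementation) of the core SWARM idea, stochastic wiring of a pipeline over unreliable peers: each microbatch picks a random live peer for every stage and is re-routed if a peer is preempted. All names below (Peer, route_microbatch, ...) are hypothetical.

import random

class Peer:
    def __init__(self, peer_id, stage_fn):
        self.peer_id = peer_id
        self.stage_fn = stage_fn   # forward pass for this pipeline stage
        self.alive = True          # preemptible instance may disappear

    def forward(self, activations):
        if not self.alive:
            raise ConnectionError(f"peer {self.peer_id} was preempted")
        return self.stage_fn(activations)

def route_microbatch(microbatch, stage_pools):
    """Push one microbatch through the pipeline, choosing a random live
    peer for every stage and re-routing on failure (stochastic wiring)."""
    activations = microbatch
    for pool in stage_pools:
        while True:
            candidates = [p for p in pool if p.alive]
            if not candidates:
                raise RuntimeError("no live peers left for this stage")
            peer = random.choice(candidates)   # cheap load balancing
            try:
                activations = peer.forward(activations)
                break
            except ConnectionError:
                peer.alive = False             # drop the peer and retry elsewhere
    return activations

# Toy usage: 3 pipeline stages, each replicated on 2 unreliable peers.
stages = [
    [Peer(f"s{s}p{p}", lambda x, s=s: x + s) for p in range(2)]
    for s in range(3)
]
stages[1][0].alive = False                     # simulate a preemption
print(route_microbatch(0, stages))             # -> 3 (0 + 0 + 1 + 2)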
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Develops a cascading two-stage latent diffusion approach that generates high-quality stereo music at 48 kHz from textual descriptions. Provides open-source libraries to facilitate future work in the field.
Use Moûsai's cascading latent diffusion approach for text-to-music generation.
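A rough sketch of the two-stage inference cascade, using placeholder networks rather than the released models: a text-conditioned latent diffusion stage samples a compressed music latent, and a decoder stage maps it back to a stereo waveform. All module names, dimensions, and the simplified sampling loop below are illustrative assumptions.

import torch
import torch.nn as nn

LATENT_DIM, LATENT_LEN = 32, 256        # compressed music representation (assumed sizes)
TEXT_DIM = 512                          # stand-in for a frozen text embedding

class LatentDenoiser(nn.Module):
    """Predicts noise in the latent, conditioned on text and timestep (placeholder net)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + TEXT_DIM + 1, 256), nn.GELU(),
            nn.Linear(256, LATENT_DIM))
    def forward(self, z, text_emb, t):
        t_feat = t.expand(z.shape[0], z.shape[1], 1)
        cond = text_emb.unsqueeze(1).expand(-1, z.shape[1], -1)
        return self.net(torch.cat([z, cond, t_feat], dim=-1))

@torch.no_grad()
def sample_latent(denoiser, text_emb, steps=50):
    """Very simplified denoising loop: start from noise, iteratively refine the latent."""
    z = torch.randn(text_emb.shape[0], LATENT_LEN, LATENT_DIM)
    for i in reversed(range(steps)):
        t = torch.full((1, 1, 1), i / steps)
        z = z - denoiser(z, text_emb, t) / steps
    return z

class WaveformDecoder(nn.Module):
    """Stage-1 stand-in: upsample the latent ~64x back to stereo audio."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(LATENT_DIM, 2 * 64)    # 2 channels * 64x upsampling
    def forward(self, z):
        audio = self.proj(z)                          # (B, LATENT_LEN, 128)
        return audio.view(z.shape[0], -1, 2).transpose(1, 2)   # (B, 2, T)

text_emb = torch.randn(1, TEXT_DIM)                   # stand-in for an encoded prompt
z = sample_latent(LatentDenoiser(), text_emb)
waveform = WaveformDecoder()(z)
print(waveform.shape)                                 # torch.Size([1, 2, 16384])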
Leveraging the Third Dimension in Contrastive Learning
Shows that incorporating a depth channel into SSL methods improves downstream classification accuracy on the Imagenette and ImageNet-C datasets.
Incorporate depth signals into SSL methods to improve robustness and generalization.
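A minimal sketch, assuming a SimCLR-style setup and an off-the-shelf monocular depth estimator as the depth source: the estimated depth map is concatenated to RGB as a fourth input channel and the encoder's first convolution is widened to accept it. estimate_depth below is a hypothetical stand-in, not the paper's pipeline.

import torch
import torch.nn as nn
from torchvision.models import resnet18

def estimate_depth(rgb):
    """Placeholder for a monocular depth model: returns a single-channel
    depth map with the same spatial size as the input (dummy values here)."""
    return rgb.mean(dim=1, keepdim=True)             # (B, 1, H, W)

def make_rgbd_encoder(feature_dim=128):
    """ResNet-18 backbone whose first conv accepts RGB + depth (4 channels)."""
    backbone = resnet18(weights=None)
    backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
    backbone.fc = nn.Linear(backbone.fc.in_features, feature_dim)   # projection head
    return backbone

rgb = torch.rand(8, 3, 224, 224)                     # a batch of augmented views
rgbd = torch.cat([rgb, estimate_depth(rgb)], dim=1)  # (8, 4, 224, 224)
z = make_rgbd_encoder()(rgbd)                        # embeddings for the contrastive loss
print(z.shape)                                       # torch.Size([8, 128])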