Dropout Reduces Underfitting
Models trained with early dropout reach a lower final training loss than their counterparts without dropout.
Applying dropout only during the early phase of training can mitigate underfitting, improving the performance of models that would otherwise underfit.
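The core idea reduces to a schedule that keeps dropout active only for an initial window of training steps. A minimal sketch (the function name, step-based cutoff, and default rate are illustrative assumptions, not the paper's exact schedule):

```python
def early_dropout_rate(step, warmup_steps, base_rate=0.1):
    """Return the dropout rate to use at a given training step.

    Early dropout keeps dropout active only for the first `warmup_steps`
    steps, then turns it off so the model can fit the training data
    more closely (reducing underfitting).
    """
    return base_rate if step < warmup_steps else 0.0
```

In practice this rate would be written into the model's dropout modules at each step of the training loop.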
Consistency Models
Proposes consistency models, a new family of generative models that achieves high sample quality without adversarial training.
Consistency models can be trained either to distill pre-trained diffusion models or as standalone generative models, achieving high sample quality with fast one-step generation.
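One-step generation hinges on the model's parameterization: a skip connection guarantees the boundary condition f(x, ε) = x at the smallest noise level. A sketch of the skip/output scalings described in the paper (σ_data = 0.5, ε = 0.002; `F` stands in for the learned network):

```python
import numpy as np

def c_skip(t, sigma_data=0.5, eps=0.002):
    # Weight on the input; equals 1 exactly at t = eps.
    return sigma_data**2 / ((t - eps)**2 + sigma_data**2)

def c_out(t, sigma_data=0.5, eps=0.002):
    # Weight on the network output; equals 0 exactly at t = eps.
    return sigma_data * (t - eps) / np.sqrt(sigma_data**2 + t**2)

def consistency_fn(F, x, t):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t).

    At t = eps the network output is zeroed out and f(x, eps) = x,
    enforcing the boundary condition without constraining F.
    """
    return c_skip(t) * x + c_out(t) * F(x, t)
```

With this form, sampling is a single call: draw noise at the largest time step and apply `consistency_fn` once.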
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Pre-trains a single model on a large unlabeled multilingual dataset of 12M hours of speech spanning over 300 languages, then fine-tunes it on a smaller labeled dataset.
The Universal Speech Model (USM) can perform automatic speech recognition (ASR) across 100+ languages, achieving state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks.
Human Motion Diffusion as a Generative Prior
This paper shows that the gap in available motion-generation data can be mitigated by using a pre-trained diffusion-based model as a generative prior. The authors demonstrate that the prior is effective for fine-tuning, as well as in few-shot and even zero-shot settings. They introduce DoubleTake, an inference-time method that generates animations up to 10 minutes long, composed of prompted intervals with meaningful, controlled transitions between them.
This paper provides AI-based solutions for generating complex and long human motions from a small dataset, including few-shot and zero-shot settings. The proposed method can help businesses that rely on motion generation, such as gaming or animation companies, to improve their workflow and generate high-quality motions with limited data and resources.
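The general idea of composing short prompted intervals into one long animation can be illustrated with a toy overlap-and-blend stitcher (this is a hypothetical sketch of interval composition, not DoubleTake itself, which refines transitions with the diffusion prior rather than linear blending):

```python
import numpy as np

def blend_intervals(intervals, overlap):
    """Stitch motion intervals (frames x joints arrays) into one long
    sequence by linearly blending neighbors over an overlapping window.

    Toy illustration of building long animations from short prompted
    intervals with smooth transitions between them.
    """
    out = intervals[0]
    for nxt in intervals[1:]:
        w = np.linspace(0.0, 1.0, overlap)[:, None]   # blend weights 0 -> 1
        blended = (1 - w) * out[-overlap:] + w * nxt[:overlap]
        out = np.concatenate([out[:-overlap], blended, nxt[overlap:]])
    return out
```

Each transition region is shared by two intervals, so the stitched length is the sum of interval lengths minus the overlaps.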
Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control
This paper proposes a guided decoding strategy that constructs an action sequence that is both likely under the language model and realizable according to grounded models of the environment. The authors demonstrate that this strategy can solve complex, long-horizon embodied tasks in a robotic setting by leveraging the knowledge of both models.
This paper provides AI-based solutions for improving robotic performance by combining language models with grounded models of the environment. The proposed method can help businesses that rely on robotics, such as manufacturing or logistics companies, to improve their operational efficiency and automate complicated tasks in a real-world setting.
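At its core, decoding that satisfies both models amounts to scoring each candidate token jointly. A minimal sketch of that joint scoring (function and argument names are illustrative assumptions, not the paper's API):

```python
def grounded_decode(lm_logprobs, grounding_logprobs, vocab):
    """Choose the next action token by combining the language model's
    log-probability with a grounded model's feasibility log-probability.

    Multiplying the two probabilities is a sum in log space, so the
    selected token is both likely under the LM and judged realizable
    in the environment.
    """
    return max(vocab, key=lambda tok: lm_logprobs[tok] + grounding_logprobs[tok])
```

For example, an LM may prefer a fluent but infeasible action ("fly to the shelf") that a grounded affordance model scores near zero, so the combined score selects a feasible alternative ("pick up the cup") instead.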