Thu Mar 02 2023

Dropout Reduces Underfitting

Neural networks
Deep learning, Regularization
Improving generalization accuracy in neural network training

Models equipped with early dropout achieve lower final training loss compared to their counterparts without dropout

Applying dropout only during the early phase of training and then disabling it mitigates underfitting, improving the performance of models that would otherwise underfit
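The early-dropout schedule can be sketched in a few lines: dropout is active only for the first part of training and switched off afterwards. The cutoff step and rate below are illustrative placeholders, not the paper's tuned hyperparameters.

```python
import random

def early_dropout_rate(step, cutoff_step, p=0.1):
    """Early dropout: return the dropout rate p during the first
    `cutoff_step` training steps, and 0.0 (dropout disabled) after.
    (Illustrative schedule; the paper tunes the cutoff per model.)"""
    return p if step < cutoff_step else 0.0

def apply_dropout(activations, p, rng):
    """Standard inverted dropout on a list of activation values:
    zero each unit with probability p, scale survivors by 1/(1-p)."""
    if p == 0.0:
        return list(activations)
    scale = 1.0 / (1.0 - p)
    return [a * scale if rng.random() >= p else 0.0 for a in activations]
```

In a training loop one would call `early_dropout_rate(step, cutoff_step)` each step and feed the result to `apply_dropout`, so regularization noise is present only while the model is still underfitting.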

Consistency Models

Diffusion models
Generative models, Image generation
Fast one-step generation, Zero-shot data editing

Proposes consistency models, a new family of generative models that achieve high sample quality without adversarial training

Consistency models can be trained either by distilling pre-trained diffusion models or as standalone generative models, achieving high sample quality with fast one-step generation
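The key object is a consistency function f(x, t) that maps any noisy input directly to the data endpoint, with the boundary condition f(x, ε) = x. The paper enforces this via a skip-connection parameterization, f(x, t) = c_skip(t)·x + c_out(t)·F(x, t); the sketch below implements those coefficients with the paper's stated constants (σ_data = 0.5, ε = 0.002), treating the neural network F as an arbitrary callable.

```python
import math

SIGMA_DATA = 0.5   # data standard deviation used in the paper's parameterization
EPS = 0.002        # minimum noise level epsilon (boundary of the time interval)

def c_skip(t):
    """Skip-connection weight; equals 1 at t = EPS."""
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    """Network-output weight; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)

def consistency_fn(x, t, network):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t).
    At t = EPS this reduces to the identity, so the boundary
    condition f(x, EPS) = x holds for any network F."""
    return c_skip(t) * x + c_out(t) * network(x, t)
```

One-step generation then amounts to a single call f(x_T, T) on pure noise; the scalar `x` here stands in for an image tensor in the real model.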

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Speech recognition
Automatic speech recognition, Multilingual models
Multilingual ASR, Speech-to-text translation

Pre-trains a single model on a large unlabeled multilingual dataset of 12M hours spanning over 300 languages, then fine-tunes it on a smaller labeled dataset

The Universal Speech Model (USM) can perform automatic speech recognition (ASR) across 100+ languages, achieving state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks

Human Motion Diffusion as a Generative Prior

Computer graphics
Motion generation
Human motion generation for gaming and animation
Long sequence generation
Few-shot and zero-shot settings

This paper shows that the data-scarcity gap in human motion generation can be mitigated by using a pre-trained diffusion model as a generative prior. The authors demonstrate that the prior is effective for fine-tuning in few-shot and even zero-shot settings. They also introduce DoubleTake, an inference-time method that produces animations up to ten minutes long, composed of prompted intervals with meaningful, controlled transitions between them.

This paper provides AI-based solutions for generating long, complex human motions from small datasets, including in few-shot and zero-shot settings. The proposed method can help businesses that rely on motion generation, such as gaming or animation companies, improve their workflows and produce high-quality motions with limited data and resources.

Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control

Artificial intelligence
Machine learning
Robotics
Natural language processing
Robotic control
Language-conditioned robotic policies
Embodied agents

This paper proposes a guided decoding strategy that constructs an action sequence that is both likely under the language model and realizable according to grounded models of the environment. The authors demonstrate that this strategy can solve complex, long-horizon embodied tasks in a robotic setting by leveraging the knowledge of both models.

This paper provides AI-based solutions for improving robotic performance by combining language models with grounded models of the environment. The proposed method can help businesses that rely on robotics, such as manufacturing or logistics companies, to improve their operational efficiency and automate complicated tasks in a real-world setting.
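The core decoding idea, combining a language model's token distribution with a grounding function's feasibility scores, can be sketched as a product of probabilities followed by renormalization. The toy dictionaries below stand in for real language and grounded models; the token names are hypothetical.

```python
def grounded_decode_step(lm_probs, grounded_probs):
    """One step of grounded decoding: weight each candidate token's
    language-model probability by a grounding score reflecting whether
    the action is realizable in the environment, renormalize, and pick
    the highest-scoring token. Returns (chosen_token, distribution)."""
    combined = {tok: lm_probs[tok] * grounded_probs.get(tok, 0.0)
                for tok in lm_probs}
    total = sum(combined.values())
    if total == 0:
        raise ValueError("no candidate token is both likely and feasible")
    return max(combined, key=combined.get), {t: v / total
                                             for t, v in combined.items()}
```

For example, if the language model prefers "cup" but the grounded model reports the cup is out of reach, the combined score can shift the choice to a feasible alternative such as "sponge"; the real system applies this token by token over full action sequences.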
