VideoDex: Learning Dexterity from Internet Videos
Proposes using internet videos of humans using their hands as real-world experience to guide robot behavior. Shows strong results on several dexterous manipulation tasks, outperforming state-of-the-art baselines.
Implement the VideoDex algorithm to leverage visual, action, and physical priors from human video datasets to guide robot behavior across a variety of manipulation tasks.
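A core idea is to regularize the robot's policy toward action priors distilled from retargeted human hand trajectories. A minimal sketch of that idea (not the VideoDex implementation; `blend_with_action_prior`, `blend_weight`, and the 3-DoF actions are illustrative assumptions):

```python
# Illustrative sketch: convexly combine a learned policy's action with an
# action prior distilled from retargeted human-video trajectories.
# All names and values here are hypothetical, not from the paper's code.

def blend_with_action_prior(policy_action, prior_action, blend_weight=0.5):
    """Return blend_weight * policy + (1 - blend_weight) * prior, elementwise."""
    assert len(policy_action) == len(prior_action)
    w = blend_weight
    return [w * p + (1.0 - w) * q for p, q in zip(policy_action, prior_action)]

# Hypothetical 3-DoF wrist action from the robot policy and from the prior.
policy_action = [0.2, -0.4, 0.1]
prior_action = [0.0, -0.2, 0.3]
blended = blend_with_action_prior(policy_action, prior_action, blend_weight=0.7)
```

In practice the prior would come from a model fit to human-video trajectories rather than a fixed vector; the blend weight trades off imitation of the prior against the learned policy.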
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
Introduces MoFusion, a denoising-diffusion-based framework for high-quality conditional human motion synthesis that generates long, temporally plausible, and semantically accurate motions from a range of conditioning contexts. Supports several interactive motion-editing applications, providing crucial abilities for virtual character animation and robotics.
Use the MoFusion framework to generate high-quality, semantically accurate human motion for virtual character animation and robotics, and apply it to interactive motion-editing applications such as in-betweening, seed conditioning, and text-based editing.
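At the heart of any denoising-diffusion model is the closed-form forward process q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I). A minimal 1-D sketch of that forward process (illustrative only, not MoFusion's network or schedule; the linear beta schedule and T = 100 are assumptions):

```python
import math
import random

# Illustrative 1-D diffusion sketch. In MoFusion the denoiser is a learned
# network over motion sequences; here we only show the forward noising.
T = 100
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule (assumed)
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= (1.0 - b)
    alpha_bars.append(prod)  # cumulative product of (1 - beta)

def q_sample(x0, t, eps):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t))."""
    a = alpha_bars[t]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps

random.seed(0)
x0 = 1.0  # a single (hypothetical) motion coordinate
x_noisy = q_sample(x0, T - 1, random.gauss(0.0, 1.0))
```

Training would regress the noise eps from x_t (and the conditioning signal); sampling then runs the learned reverse chain from pure noise back to a motion sequence.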
Learning Video Representations from Large Language Models
Introduces LaViLa, an approach to learning video-language representations by leveraging Large Language Models (LLMs). Repurposes pre-trained LLMs to be conditioned on visual input and fine-tunes them into automatic video narrators. Outperforms the previous state of the art on multiple first-person and third-person video tasks.
Use LaViLa to create automatic video narrators that yield stronger video-text embeddings, and apply those embeddings to first-person and third-person video tasks.
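The narrator-generated captions are typically consumed by a dual-encoder model trained with a symmetric InfoNCE contrastive loss over matched video/narration pairs. A minimal sketch of that loss (illustrative, not LaViLa's code; the toy 2-D embeddings and `temperature` value are assumptions):

```python
import math

# Illustrative sketch: symmetric InfoNCE over paired video and narration
# embeddings, the standard objective for dual-encoder video-text training.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(video_embs, text_embs, temperature=0.07):
    """Average of video->text and text->video cross-entropy over matched pairs."""
    n = len(video_embs)
    loss = 0.0
    for i in range(n):
        logits_v = [dot(video_embs[i], t) / temperature for t in text_embs]
        logits_t = [dot(text_embs[i], v) / temperature for v in video_embs]
        for logits in (logits_v, logits_t):
            m = max(logits)  # stabilize the log-sum-exp
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            loss += log_z - logits[i]  # -log softmax at the matched index
    return loss / (2 * n)

# Toy 2-D embeddings: each video matches its same-index narration.
videos = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
loss = info_nce(videos, texts)
```

With perfectly aligned pairs the loss is near zero; permuting the narrations relative to the videos drives it up, which is what pushes matched embeddings together during training.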