Thu Feb 02 2023 - Top Trending AI Papers

Dreamix: Video Diffusion Models are General Video Editors

Image and video processing

Computer Vision

Machine Learning

Video editing

Image animation

Subject-driven video generation

Proposes a diffusion-based method for text-based motion and appearance editing of general videos. It improves motion editability, introduces a new framework for image animation, and outperforms baseline methods.

Can improve video editing capabilities and help generate subject-driven videos, potentially enhancing marketing and advertising efforts.

https://dreamix-video-editing.github.io/

https://arxiv.org/pdf/2302.01329.pdf

https://arxiv.org/abs/2302.01329

https://twitter.com/arankomatsuzaki/status/1621321639240531970/video/1

Multimodal Chain-of-Thought Reasoning in Language Models

Language and vision processing

Artificial Intelligence

Machine Learning

Natural language processing

Multimodal information processing

Image or video captioning

Proposes Multimodal-CoT, a two-stage framework that incorporates language and vision modalities for complex reasoning. It outperforms the previous state-of-the-art large language model (GPT-3.5) on the ScienceQA benchmark and even surpasses human performance.

Can improve natural language processing capabilities, particularly in tasks requiring multimodal information, such as image or video captioning.

https://arxiv.org/pdf/2302.00923.pdf

https://arxiv.org/abs/2302.00923

https://github.com/amazon-science/mm-cot

https://twitter.com/arankomatsuzaki/status/1621323705686134788/photo/1
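
The two-stage idea behind Multimodal-CoT can be sketched roughly as follows. This is a minimal illustration, assuming (as we understand the paper) that the first stage generates a rationale and the second stage infers the answer with that rationale appended to the text input. The names multimodal_cot, generate_rationale, and infer_answer are hypothetical placeholders, not the API of the mm-cot repository.

```python
def multimodal_cot(question, image_features, generate_rationale, infer_answer):
    """Two-stage sketch: produce a rationale, then answer conditioned on it.

    generate_rationale and infer_answer are hypothetical callables standing in
    for the two model passes; they are not the API of the mm-cot repository.
    """
    # Stage 1: rationale generation from the text and vision inputs.
    rationale = generate_rationale(question, image_features)
    # Stage 2: answer inference, with the rationale appended to the text input.
    augmented = f"{question}\nRationale: {rationale}"
    answer = infer_answer(augmented, image_features)
    return rationale, answer
```

Both stages receive the same vision features, so the rationale and the final answer can each draw on the image rather than on the text alone.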

Training Language Models with Language Feedback

Language processing

Artificial Intelligence

Machine Learning

Language model training

Natural language processing

Summarization

Proposes learning from natural language feedback to address pretrained language models that fail to perform tasks in line with human preferences, and demonstrates that a three-step learning algorithm improves GPT-3's summarization ability.

Can help improve the performance of language models in various tasks, leading to more accurate and useful outputs.

https://arxiv.org/pdf/2204.14146.pdf

https://arxiv.org/abs/2204.14146

https://twitter.com/arankomatsuzaki/status/1621248532035379201/photo/1
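
A rough sketch of the three-step loop in "Training Language Models with Language Feedback", as we understand it from the paper: (1) generate several refinements of an initial output conditioned on the feedback, (2) keep the refinement most similar to the feedback, (3) fine-tune on the chosen refinement. Only the selection step and the construction of a fine-tuning example are shown. The word-overlap similarity and all function names here are assumptions made for illustration, not the authors' implementation.

```python
def token_overlap(a, b):
    """Toy similarity: Jaccard overlap of lowercase word sets (an assumption;
    the similarity measure used in the paper is not stated in this digest)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_refinement(refinements, feedback, similarity=token_overlap):
    """Step 2: keep the refinement that is most similar to the feedback."""
    return max(refinements, key=lambda r: similarity(r, feedback))


def build_finetuning_example(source_text, initial_output, feedback,
                             generate_refinements, n=4):
    """Steps 1-2, producing an (input, target) pair for step 3 (fine-tuning).

    generate_refinements is a hypothetical callable that asks a language model
    for n refinements of initial_output given the feedback.
    """
    candidates = generate_refinements(source_text, initial_output, feedback, n)
    best = select_refinement(candidates, feedback)
    return {"input": source_text, "target": best}
```

The resulting pairs would then be used as supervised fine-tuning data, so the model learns to produce the refined output directly from the input.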

Accelerating Large Language Model Decoding with Speculative Sampling

Distributed computing

Machine Learning

Artificial Intelligence

Natural language processing

Language-based workflows

Language model applications

Proposes speculative sampling, an algorithm that accelerates transformer decoding by generating multiple tokens from each transformer call, achieving a 2-2.5x decoding speedup in a distributed setup without compromising sample quality or modifying the model itself.

Can speed up natural language processing tasks that rely on large language models and make language-based workflows more efficient.

https://arxiv.org/pdf/2302.01318.pdf

https://arxiv.org/abs/2302.01318

https://twitter.com/arankomatsuzaki/status/1621320648021737474/photo/1
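
To make the "multiple tokens per transformer call" idea in the speculative sampling entry concrete, here is a minimal sketch of one decoding step. It assumes the draft-then-verify scheme as we understand it from the paper: a small draft model proposes k tokens, the large target model scores them, and each proposal is accepted with probability min(1, p/q) or replaced by a sample from the normalized residual max(0, p - q). The callables draft_probs and target_probs are hypothetical stand-ins for the two models, and the target scoring is written as a loop rather than the single batched call a real system would use.

```python
import random


def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(prefix, draft_probs, target_probs, k, rng):
    """Run one speculative sampling step and return the newly emitted tokens.

    draft_probs(seq) and target_probs(seq) are hypothetical callables that
    return the next-token distribution (a list of floats) for a token sequence.
    """
    # 1. Draft k tokens autoregressively with the small model.
    drafted, q_dists = [], []
    seq = list(prefix)
    for _ in range(k):
        q = draft_probs(seq)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        seq.append(tok)

    # 2. Score all k + 1 positions with the target model (one batched
    #    forward pass in a real system; a plain loop in this sketch).
    p_dists = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept or reject each drafted token in order.
    emitted = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q), then stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        emitted.append(sample(residual, rng))
        return emitted

    # 4. Every draft was accepted: take one bonus token from the target model.
    emitted.append(sample(p_dists[k], rng))
    return emitted
```

Each step emits between 1 and k + 1 tokens. When the draft and target distributions agree exactly, every proposal is accepted and the step returns k + 1 tokens. The accept/reject rule is what keeps the emitted tokens distributed according to the target model, which is why sample quality is preserved.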

Self-critiquing models for assisting human evaluators

Behavioral cloning

Machine Learning

Artificial Intelligence

Human-computer interaction

Machine learning system supervision

Natural language processing

AI-assisted human feedback

Finetunes large language models to write natural language critiques using behavioral cloning, enabling AI-assisted human feedback to scale the supervision of ML systems.

Can make human supervision of machine learning systems more scalable, improving the accuracy and efficiency of model development and implementation.

https://arxiv.org/pdf/2206.05802.pdf

https://arxiv.org/abs/2206.05802

https://twitter.com/arankomatsuzaki/status/1621247898661883905/photo/1