Thu Mar 16 2023

ART: Automatic multi-step reasoning and tool-use for large language models

Large Language Models
AI for Natural Language Processing
AI for Automation
Natural Language Processing
Task Automation
Reasoning

The Automatic Reasoning and Tool-use (ART) framework uses a frozen LLM to automatically generate programs as intermediate reasoning steps. ART selects demonstrations of multi-step reasoning and tool use from a task library to solve new tasks at test time, achieving substantial improvements over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks.

Can improve performance on unseen benchmark tasks, matches the performance of hand-crafted CoT prompts on a majority of these tasks, and is extensible: humans can improve performance by incorporating new tools or correcting errors in task-specific programs.
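The demonstration-selection step can be sketched as follows; the lexical similarity measure, library format, and tool names in brackets are illustrative stand-ins, not ART's actual implementation.

```python
def similarity(a, b):
    """Toy lexical (Jaccard) similarity, standing in for ART's
    task-similarity-based demonstration selection."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def build_prompt(new_task, task_library, k=2):
    """Pick the k library tasks most similar to the new task and
    concatenate their multi-step reasoning/tool-use demonstrations
    as the few-shot prompt for the frozen LLM."""
    ranked = sorted(task_library,
                    key=lambda t: similarity(new_task, t["description"]),
                    reverse=True)
    demos = "\n\n".join(t["demo"] for t in ranked[:k])
    return f"{demos}\n\nTask: {new_task}\n"

# Hypothetical two-entry task library for illustration.
library = [
    {"description": "arithmetic word problems",
     "demo": "Q: ... [calculator] ..."},
    {"description": "translate english to french",
     "demo": "Q: ... [translator] ..."},
]
prompt = build_prompt("solve this arithmetic problem", library, k=1)
```

The selected demonstrations show the frozen LLM the format for interleaving reasoning steps with tool calls; parsing those calls and invoking the tools happens outside the model.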

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

Pre-trained Models
Embeddings
AI for Data Quality Assurance
AI for Machine Learning Efficiency
Data Quality Assurance
Data Processing
Machine Learning Efficiency

SemDeDup leverages embeddings from pre-trained models to identify and remove semantic duplicates from large uncurated web-scale datasets, effectively halving training time with minimal performance loss. It provides an example of how quality embeddings can be used to make models learn faster with less data.

Can remove 50% of the data with minimal performance loss, effectively halving training time; improves performance out of distribution; and provides efficiency gains when training language models on partially curated datasets.
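The core idea can be sketched in a few lines: embed each example, then keep only one representative from each group of embeddings whose cosine similarity exceeds a threshold. For scalability the paper first k-means-clusters the embeddings and deduplicates within clusters; this toy version compares all pairs directly.

```python
import numpy as np

def semdedup(embeddings, threshold=0.95):
    """Keep one item per group of embeddings whose pairwise cosine
    similarity exceeds `threshold`; return the kept indices.
    (SemDeDup clusters first for web scale; this is the toy O(n^2) form.)"""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    keep = []
    removed = np.zeros(len(x), dtype=bool)
    for i in range(len(x)):
        if removed[i]:
            continue
        keep.append(i)                    # i represents its duplicate group
        removed |= sim[i] > threshold     # drop everything too similar to i
    return keep

# Three near-duplicates of one direction plus one distinct vector.
emb = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [0.998, 0.02]])
print(semdedup(emb))  # keeps indices [0, 2]
```

The threshold controls the trade-off the summary describes: a looser threshold removes more semantically redundant data for larger training-time savings, at some risk to performance.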

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

Diffusion Models
Attention Mechanisms
AI for Video Editing
AI for Creative Industries
Video Editing
Content Creation
Creative AI

FateZero is the first framework for zero-shot, text-driven video editing with pre-trained diffusion models, requiring no training. It captures intermediate attention maps during inversion, which effectively retain both structural and motion information, and fuses them into the editing process rather than regenerating them during denoising. It also introduces a spatial-temporal attention mechanism to ensure frame consistency.

Can edit videos consistently without per-prompt training or user-specific masks; demonstrates zero-shot text-driven video style and local attribute editing from a trained text-to-image model; and shows better zero-shot shape-aware editing based on a text-to-video model.
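At a very high level, the cross-attention fusion can be illustrated as below; the array shapes and the `edited_tokens` index list are assumptions for illustration, not FateZero's actual interface.

```python
import numpy as np

def fuse_cross_attention(src_attn, edit_attn, edited_tokens):
    """Sketch of FateZero-style attention fusion: reuse the
    inversion-time (source) cross-attention for tokens the prompt
    edit did not change, preserving source structure and motion,
    and keep the denoising-time attention only for edited tokens.
    src_attn, edit_attn: [n_pixels, n_tokens] attention maps."""
    fused = src_attn.copy()
    fused[:, edited_tokens] = edit_attn[:, edited_tokens]
    return fused

src = np.zeros((4, 3))   # attention captured while inverting the source video
edit = np.ones((4, 3))   # attention produced while denoising the edited prompt
fused = fuse_cross_attention(src, edit, edited_tokens=[2])
```

The unchanged tokens keep the source video's attention (here column values 0), so the original layout and motion carry over, while the edited token (column 2) takes its attention from the denoising pass.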

A Picture is Worth a Thousand Words: Language Models Plan from Pixels

Planning and Decision Making
Artificial Intelligence
Natural Language Processing
Embodied Visual Environments
Plan Sequences
Pre-trained Language Models

Explores the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments, showing that PLMs can plan accurately even when observations are directly encoded as input prompts for the PLM.

Uses pre-trained language models for planning in embodied visual environments to improve the performance and accuracy of the planning process.
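A minimal sketch of the idea, with a hypothetical prompt template and action vocabulary (the paper's actual observation encoding is task-specific, not this literal template):

```python
def plan_prompt(goal, observation):
    """Encode the observation directly into the PLM's input prompt;
    this textual template is an illustrative assumption."""
    return f"Observation: {observation}\nGoal: {goal}\nPlan:"

def parse_plan(model_output):
    """Read a newline-separated action list from the PLM's output."""
    return [line.strip() for line in model_output.splitlines() if line.strip()]

prompt = plan_prompt("put the mug in the sink",
                     "mug on table; sink to the left")
# Hypothetical completion a PLM might return for this prompt:
actions = parse_plan("go left\npick up mug\nplace mug in sink\n")
```

The point of the sketch is only the interface: the environment state enters the frozen PLM as part of its prompt, and the plan comes back as ordinary text to be parsed and executed.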

LERF: Language Embedded Radiance Fields

Vision-Language Models
Artificial Intelligence
Natural Language Processing
Robotics
Language Embeddings
3D CLIP Embeddings
Open-ended Language Queries in 3D

Proposes Language Embedded Radiance Fields (LERF), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enables open-ended language queries in 3D. LERF can extract 3D relevancy maps for a broad range of language prompts interactively in real time, supporting long-tail open-vocabulary queries hierarchically across the volume.

Enables pixel-aligned, zero-shot queries on the distilled 3D CLIP embeddings without relying on region proposals or masks.
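The relevancy-map scoring can be sketched as follows: a rendered language embedding is scored against the query by taking, over a set of canonical negative phrases (the paper uses phrases like "object" and "things"), the minimum pairwise softmax probability assigned to the query. The vectors and temperature below are illustrative, not CLIP outputs.

```python
import numpy as np

def relevancy(lang_embed, query_embed, canon_embeds, temp=0.1):
    """LERF-style relevancy: min over canonical negatives of the
    pairwise softmax probability assigned to the query embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    q = cos(lang_embed, query_embed) / temp
    scores = []
    for c in canon_embeds:
        n = cos(lang_embed, c) / temp
        m = max(q, n)  # subtract the max for numerical stability
        scores.append(np.exp(q - m) / (np.exp(q - m) + np.exp(n - m)))
    return min(scores)

query = np.array([1.0, 0.0, 0.0])      # stand-in for the prompt's text embedding
negative = np.array([0.0, 1.0, 0.0])   # stand-in for a canonical phrase embedding
on_target = np.array([1.0, 0.1, 0.0])  # rendered embedding near the query
off_target = np.array([0.1, 1.0, 0.0])
```

Evaluating this score at every rendered pixel yields the 3D relevancy map: regions whose distilled embedding aligns with the query score near 1, everything else near 0.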
