Sun Apr 16 2023 - Top Trending AI Papers

Inpaint Anything: Segment Anything Meets Image Inpainting

Image inpainting

Computer vision

Natural language processing

Image editing

Object removal

Content generation

Users select any object in an image by clicking on it. Building on vision models such as SAM, LaMa, and Stable Diffusion (SD), Inpaint Anything can remove the object cleanly (Remove Anything). Given a text prompt from the user, it can also fill the object's region with new content (Fill Anything) or replace the object's background (Replace Anything).
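
The click-then-remove flow can be sketched in a few lines. This is a toy illustration only, not the project's code: `select_region` is a hypothetical flood-fill stand-in for SAM's click-prompted segmentation, and `remove_region` is a hypothetical neighbour-averaging stand-in for LaMa's inpainting. Both function names and the tiny grayscale grid are assumptions made for the example.

```python
from collections import deque

def select_region(image, click):
    """Toy stand-in for SAM: flood-fill the connected pixels that share the clicked value."""
    rows, cols = len(image), len(image[0])
    r0, c0 = click
    target = image[r0][c0]
    mask = [[False] * cols for _ in range(rows)]
    mask[r0][c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and image[nr][nc] == target):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask

def remove_region(image, mask):
    """Toy stand-in for LaMa: fill masked pixels from the mean of already-known neighbours."""
    rows, cols = len(image), len(image[0])
    out = [[float(v) for v in row] for row in image]
    unknown = {(r, c) for r in range(rows) for c in range(cols) if mask[r][c]}
    while unknown:
        filled = {}
        for r, c in unknown:
            known = [out[nr][nc]
                     for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                     if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in unknown]
            if known:
                filled[(r, c)] = sum(known) / len(known)
        if not filled:
            break  # nothing known to borrow from (the whole image was masked)
        for (r, c), value in filled.items():
            out[r][c] = value
        unknown.difference_update(filled)
    return out

# A 2x2 "object" (value 9) sitting on a uniform background (value 1).
image = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
mask = select_region(image, click=(1, 1))   # one click selects the whole object
result = remove_region(image, mask)         # the object is replaced by background
```

In the real system the mask would come from a segmentation model and the fill from a learned inpainting model; the point here is only that a single click yields a mask, and the mask drives the fill.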

Inpaint Anything supports mask-free image inpainting, since a click takes the place of a hand-drawn mask, and offers a user-friendly interface for inpainting tasks. Businesses could use it to streamline image editing workflows.

https://arxiv.org/pdf/2304.06790.pdf

https://arxiv.org/abs/2304.06790

https://github.com/geekyutao/Inpaint-Anything

https://twitter.com/_akhaliq/status/1647771483962236928/photo/1

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Web corpora

Machine learning

Data mining

Vision and language models

Image analysis

Natural language processing

Multimodal C4 is a corpus of 103M documents containing 585M images interleaved with 43B English tokens. It spans everyday topics such as cooking, travel, and technology, and can support in-context vision and language models like Flamingo.
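
To give a feel for the scale and the interleaved layout, the sketch below computes per-document averages from the figures above and builds one interleaved sequence. The `interleave` helper, the `(image_name, sentence_index)` pairs, and the choice to place each image just before its matched sentence are assumptions for illustration, not the dataset's actual schema or loader.

```python
DOCS = 103_000_000
IMAGES = 585_000_000
TOKENS = 43_000_000_000

# Averages implied by the headline numbers.
images_per_doc = IMAGES / DOCS   # between 5 and 6 images per document
tokens_per_doc = TOKENS / DOCS   # a little over 400 tokens per document

def interleave(sentences, images):
    """Place each image directly before the sentence it is matched to.

    `images` is a list of (image_name, sentence_index) pairs (hypothetical format).
    """
    by_index = {}
    for name, idx in images:
        by_index.setdefault(idx, []).append(name)
    sequence = []
    for i, sentence in enumerate(sentences):
        for name in by_index.get(i, []):
            sequence.append(("image", name))
        sequence.append(("text", sentence))
    return sequence

doc = interleave(
    ["Preheat the oven.", "Mix the batter.", "Bake for 20 minutes."],
    [("batter.jpg", 1), ("cake.jpg", 2)],
)
```

A model trained on such sequences sees text and images in reading order, which is what makes in-context prompting with mixed examples possible.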

Businesses can use Multimodal C4 to train and evaluate in-context vision and language models for tasks such as image and text analysis, natural language processing, and recommendation systems.

https://arxiv.org/pdf/2304.06939.pdf

https://arxiv.org/abs/2304.06939

https://github.com/allenai/mmc4

https://twitter.com/_akhaliq/status/1647804315833122816/photo/1

Delta Denoising Score

Text-to-image diffusion models

Machine learning

Natural language processing

Image editing

Text-based image-to-image translation

Delta Denoising Score (DDS) is a scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS adapts the Score Distillation Sampling (SDS) mechanism to image editing and can be used as a loss term in an optimization problem that steers an image in the direction given by a text prompt. The authors report that DDS outperforms existing methods in stability and quality.
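
As a rough sketch of the idea, and not a substitute for the paper's own derivation, the two gradients can be written side by side. The notation is an assumption for this digest: z is the image being optimized (parameterized by θ), ẑ is the source image, y and ŷ are the target and source prompts, ε_φ is the diffusion model's noise prediction, and ε is the noise actually added at timestep t. Timestep weights and scaling constants are omitted.

```latex
% SDS: compare the predicted noise with the noise that was added.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \bigl(\epsilon_\phi(z_t, y, t) - \epsilon\bigr)\,
    \frac{\partial z}{\partial \theta}

% DDS: compare the prediction for the edited image and target prompt
% with the prediction for the source image and source prompt,
% using the same noise and timestep for both branches.
\nabla_\theta \mathcal{L}_{\mathrm{DDS}}
  = \bigl(\epsilon_\phi(z_t, y, t) - \epsilon_\phi(\hat{z}_t, \hat{y}, t)\bigr)\,
    \frac{\partial z}{\partial \theta}
```

Subtracting the source-branch prediction is what the "delta" refers to: the components of the score that the two branches share cancel, leaving a direction that reflects the change in prompt.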

Because DDS makes only the changes the target prompt requires, it suits text-based editing where the rest of the image should stay intact. Businesses could use it to streamline image editing workflows.

https://arxiv.org/pdf/2304.07090.pdf

https://arxiv.org/abs/2304.07090

https://delta-denoising-score.github.io/

https://twitter.com/_akhaliq/status/1647777718270849024/video/1

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

Transformer-based Models

Machine Learning

Artificial Intelligence

Natural Language Processing

Text Generation

Language Models

Presents RETRO++, which significantly outperforms retrieval-augmented GPT across different model sizes.

Pretraining autoregressive LMs with retrieval can lead to better text generation quality and downstream task accuracy.
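
For readers new to retrieval augmentation, here is a minimal sketch of the retrieval step alone. It is not the RETRO or RETRO++ architecture: the bag-of-words `embed` function is a toy stand-in for a learned text encoder, and `build_context` simply prepends the retrieved chunks to the query. All names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned text encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_context(query, chunks, k=2):
    """Prepend retrieved neighbours to the query before handing it to a language model."""
    return "\n".join(retrieve(query, chunks, k) + [query])

chunks = [
    "the eiffel tower is in paris",
    "bananas are rich in potassium",
    "paris is the capital of france",
]
query = "where is the eiffel tower"
top = retrieve(query, chunks, k=1)
context = build_context(query, chunks, k=2)
```

The study's question is whether a model should see retrieved text like this during pretraining, rather than only having it attached at inference time.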

https://github.com/NVIDIA/Megatron-LM#retro

https://arxiv.org/pdf/2304.06762.pdf

https://arxiv.org/abs/2304.06762

https://twitter.com/arankomatsuzaki/status/1647763953295020033/photo/1

Soundini: Sound-Guided Diffusion for Natural Video Editing

Probabilistic Models

Machine Learning

Artificial Intelligence

Video Editing

Natural Video Editing

Computer Vision

Proposes a method for adding sound-guided visual effects to specific regions of videos in a zero-shot setting.

Sound-guided natural video editing, built on denoising diffusion probabilistic models and an audio latent representation, is a promising direction for creating more realistic visual effects. Optical flow-based guidance keeps adjacent frames temporally consistent.
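
The flow-based consistency idea can be illustrated with a toy penalty: warp the previous frame along the flow and measure how far it lands from the current frame. The sketch below is an assumption-laden illustration, not the paper's guidance term. Frames are small grayscale grids, the flow holds integer `(dy, dx)` displacements per pixel, and both `warp` and `temporal_consistency_loss` are hypothetical names.

```python
def warp(prev_frame, flow):
    """Backward warp: output pixel (r, c) samples prev_frame at (r - dy, c - dx)."""
    rows, cols = len(prev_frame), len(prev_frame[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dy, dx = flow[r][c]
            src_r = min(max(r - dy, 0), rows - 1)  # clamp to the image border
            src_c = min(max(c - dx, 0), cols - 1)
            out[r][c] = float(prev_frame[src_r][src_c])
    return out

def temporal_consistency_loss(prev_frame, cur_frame, flow):
    """Mean squared difference between the warped previous frame and the current frame."""
    warped = warp(prev_frame, flow)
    diffs = [(w - c) ** 2
             for warped_row, cur_row in zip(warped, cur_frame)
             for w, c in zip(warped_row, cur_row)]
    return sum(diffs) / len(diffs)

# A bright pixel moves one step to the right between frames.
prev_frame = [[0, 1, 0, 0]]
cur_frame = [[0, 0, 1, 0]]
right_flow = [[(0, 1)] * 4]   # every pixel moved one column right
zero_flow = [[(0, 0)] * 4]    # flow that ignores the motion

loss_with_flow = temporal_consistency_loss(prev_frame, cur_frame, right_flow)
loss_without_flow = temporal_consistency_loss(prev_frame, cur_frame, zero_flow)
```

When the flow matches the true motion the penalty vanishes, so an edit that follows the flow is not penalized, while an edit that flickers between frames is.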

https://arxiv.org/pdf/2304.06818.pdf

https://arxiv.org/abs/2304.06818

https://kuai-lab.github.io/soundini-gallery/

https://twitter.com/_akhaliq/status/1647772025715412993/video/1