Mon Apr 17 2023
Sun Apr 16 2023

Inpaint Anything: Segment Anything Meets Image Inpainting

Image inpainting
Computer vision
Natural language processing
Image editing
Object removal
Content generation

Users can select any object in an image by clicking on it. Using vision models such as SAM, LaMa, and Stable Diffusion (SD), Inpaint Anything can remove the selected object smoothly (Remove Anything). Given a user's text prompt, it can also fill the object region with any desired content (Fill Anything) or replace the object's background arbitrarily (Replace Anything).

Inpaint Anything can be used for mask-free image inpainting and provides a user-friendly interface for solving inpainting-related problems. Businesses can leverage this technology to improve their image editing processes and workflows.
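The click-then-remove flow described above can be sketched on a toy grayscale grid. This is only an illustration under loud assumptions: `mask_from_click` is a hypothetical flood fill standing in for a promptable segmenter such as SAM, and `remove_object` is a hypothetical mean fill standing in for an inpainting model such as LaMa. Neither reflects the real models or their APIs.

```python
from collections import deque


def mask_from_click(image, click, tol=10):
    """Hypothetical stand-in for a click-prompted segmenter (not SAM itself).

    Flood-fills from the clicked pixel over 4-connected neighbours whose
    intensity is within `tol` of the clicked pixel's intensity.
    """
    h, w = len(image), len(image[0])
    r0, c0 = click
    seed = image[r0][c0]
    mask = [[False] * w for _ in range(h)]
    mask[r0][c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and not mask[nr][nc] and abs(image[nr][nc] - seed) <= tol:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def remove_object(image, mask):
    """Hypothetical stand-in for an inpainting model (not LaMa itself).

    Replaces every masked pixel with the rounded mean of the unmasked pixels.
    """
    known = [px for row, mrow in zip(image, mask)
             for px, m in zip(row, mrow) if not m]
    fill = round(sum(known) / len(known)) if known else 0
    return [[fill if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


# A bright 2x2 "object" on a dark background; the user clicks inside it.
image = [
    [20, 20, 20, 20],
    [20, 200, 200, 20],
    [20, 200, 200, 20],
    [20, 20, 20, 20],
]
mask = mask_from_click(image, click=(1, 1))
cleaned = remove_object(image, mask)
```

The point of the sketch is the division of labor: the click produces a mask, and the mask alone tells the second stage which pixels to regenerate, so the user never draws a mask by hand.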

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Web corpora
Machine learning
Data mining
Vision and language models
Image analysis
Natural language processing

Multimodal C4 is a corpus of 103M documents containing 585M images interleaved with 43B English tokens. It spans everyday topics like cooking, travel, technology, etc., and can support in-context vision and language models like Flamingo. Businesses can leverage this dataset for training and evaluating their vision and language models.

Multimodal C4 provides a large-scale dataset for training and evaluating in-context vision and language models. Such models can support business operations including image and text analysis, natural language processing, and recommendation systems.
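To make "images interleaved with text" concrete, here is a minimal sketch of how one such document might be represented and flattened into a single sequence. The names `ImageRef`, `sentence_index`, and `interleave` are hypothetical and are not the released corpus schema. Placing each image immediately before its associated sentence is also an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImageRef:
    url: str             # where the image lives (illustrative field)
    sentence_index: int  # index of the sentence the image is associated with


def interleave(sentences, images):
    """Flatten one document into an ordered list of ("image" | "text", value) items."""
    by_sentence = {}
    for img in images:
        by_sentence.setdefault(img.sentence_index, []).append(img)

    sequence = []
    for i, sentence in enumerate(sentences):
        # Assumption: an image is emitted right before its associated sentence.
        for img in by_sentence.get(i, []):
            sequence.append(("image", img.url))
        sequence.append(("text", sentence))
    return sequence


sentences = ["Preheat the oven.", "Mix the batter.", "Bake for 30 minutes."]
images = [ImageRef("https://example.com/batter.jpg", sentence_index=1)]
sequence = interleave(sentences, images)
```

A model that consumes such a sequence sees images and text in reading order, which is the setting that in-context vision and language models are trained for.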

Delta Denoising Score

Text-to-image diffusion models
Machine learning
Natural language processing
Image editing
Text-based image-to-image translation

Delta Denoising Score (DDS) is a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS applies the Score Distillation Sampling (SDS) mechanism to image editing and can be used as a loss term in an optimization problem that steers an image in the direction given by a text prompt. DDS is reported to outperform existing methods in stability and quality, which highlights its potential for real-world text-based image editing.

DDS can be used for text-based image editing, guiding minimal modifications of an input image in the direction given by a text prompt. Businesses can leverage this technology to improve their image editing processes and workflows.
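The relationship between SDS and DDS as loss gradients can be written out as follows. This is a sketch of the commonly cited form, not a quotation from the paper. The notation is an assumption introduced here: $z$ is the edited image (or latent) with parameters $\theta$, $\hat{z}$ is the input image, $y$ and $\hat{y}$ are the target and source prompts, $z_t$ and $\hat{z}_t$ are both noised with the same noise $\epsilon$ at the same timestep $t$, $\epsilon_\phi^{\omega}$ is the diffusion model's noise prediction with guidance scale $\omega$, and timestep weighting is omitted.

```latex
\begin{align*}
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  &= \bigl(\epsilon_\phi^{\omega}(z_t, y, t) - \epsilon\bigr)\,
     \frac{\partial z}{\partial \theta} \\
\nabla_\theta \mathcal{L}_{\mathrm{DDS}}
  &= \bigl(\epsilon_\phi^{\omega}(z_t, y, t)
         - \epsilon_\phi^{\omega}(\hat{z}_t, \hat{y}, t)\bigr)\,
     \frac{\partial z}{\partial \theta}
\end{align*}
```

Read this way, DDS is the difference of two SDS-style terms: the second term, computed on the unedited image with a prompt that matches it, estimates the part of the SDS gradient that is unrelated to the edit, and subtracting it leaves a direction that changes only what the target prompt asks for.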

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

Transformer-based Models
Machine Learning
Artificial Intelligence
Natural Language Processing
Text Generation
Language Models

Presents RETRO++, which significantly outperforms retrieval-augmented GPT across different model sizes.

Pretraining autoregressive LMs with retrieval can lead to better text generation quality and downstream task accuracy. RETRO++ outperforms retrieval-augmented GPT across different model sizes.
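As a rough illustration of what "retrieval" contributes, the sketch below ranks text chunks against a query and prepends the best matches to the model input. Everything here is an assumption for illustration: `tokens`, `retrieve`, and `build_context` are hypothetical helpers, word-overlap (Jaccard) similarity stands in for a learned nearest-neighbour search, and prepending neighbours to the prompt is only the simplest way to consume them. How RETRO++ itself consumes retrieved text is not described in the summary above.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (toy tokenizer)."""
    return set(text.lower().split())


def retrieve(query, chunks, k=2):
    """Return the k chunks with the highest Jaccard word overlap with the query."""
    q = tokens(query)

    def score(chunk):
        c = tokens(chunk)
        union = q | c
        return len(q & c) / len(union) if union else 0.0

    return sorted(chunks, key=score, reverse=True)[:k]


def build_context(query, chunks, k=2):
    """Prepend the retrieved chunks to the query, one item per line."""
    return "\n".join(retrieve(query, chunks, k) + [query])


chunks = [
    "the eiffel tower is in paris",
    "bananas are rich in potassium",
    "paris is the capital of france",
]
query = "where is the eiffel tower"
neighbours = retrieve(query, chunks, k=2)
context = build_context(query, chunks, k=2)
```

In this toy run the two Paris-related chunks are retrieved and the unrelated one is dropped, so the language model would condition on relevant evidence before continuing the query.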

Soundini: Sound-Guided Diffusion for Natural Video Editing

Probabilistic Models
Machine Learning
Artificial Intelligence
Video Editing
Natural Video Editing
Computer Vision

Proposes a method for adding sound-guided visual effects to specific regions of videos in a zero-shot setting.

Sound-guided natural video editing using denoising diffusion probabilistic models and audio latent representation is a promising direction for creating more realistic visual effects. Optical flow-based guidance ensures temporal consistency between adjacent frames.
