Thu Mar 09 2023

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

Chatbots
Natural Language Processing
Computer Vision
Customer service communication
Collaborative content creation
Visual editing and proofreading

Visual ChatGPT lets users interact with ChatGPT by sending and receiving not only text but also images: they can pose complex visual questions or visual editing instructions that require multiple AI models collaborating over multiple steps, and can give feedback to request corrected results.

Visual ChatGPT can improve customer experience by enabling a co-creative communication process that combines visual elements with multiple AI models, potentially improving the effectiveness and accuracy of the interaction.
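The core idea behind such a system is a dispatcher that routes each request either to the language model or through a visual foundation model first. A minimal sketch of that routing loop, where the tool functions and the route() heuristic are illustrative stand-ins rather than the paper's actual API:

```python
# Toy sketch of tool routing between a language model and a visual
# foundation model. caption_image() and chat() are hypothetical
# stand-ins; a real system would call actual models.
def caption_image(path: str) -> str:
    return f"[caption for {path}]"          # stand-in vision model

def chat(text: str) -> str:
    return f"[reply to: {text}]"            # stand-in language model

def route(request: dict) -> str:
    # Requests carrying an image go to a visual model first; the
    # caption is folded back into the language-model prompt.
    if "image" in request:
        caption = caption_image(request["image"])
        return chat(f"{request['text']} (image shows {caption})")
    return chat(request["text"])

out = route({"text": "What is in this picture?", "image": "cat.png"})
print(out)
```

In the actual system a prompt manager plays this role, chaining several visual models over multiple steps rather than a single captioning call.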

Magnushammer: A Transformer-based Approach to Premise Selection

Theorem proving systems
Natural Language Processing
Automated theorem proving
Mathematics research
Computer science research

Magnushammer is a transformer-based neural approach that can outperform traditional symbolic systems on the fundamental problem of automated theorem proving, achieving a 59.5% proof rate versus 38.3% for Sledgehammer, the most mature and popular symbolic solver.

Magnushammer can increase the efficiency and accuracy of automated theorem proving tasks, potentially saving time and resources in fields such as mathematics and computer science.
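Premise selection of this kind can be framed as dense retrieval: embed the current proof state and all candidate premises, then rank premises by similarity. A minimal sketch of that idea, where embed() is a deterministic stand-in for a trained transformer encoder and the premise names are illustrative:

```python
# Toy sketch of premise selection as dense retrieval. embed() is a
# hypothetical stand-in for a transformer encoder; a real system
# would learn these embeddings (e.g. contrastively).
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Pseudo-embedding derived from the text hash, normalized to
    # unit length so the dot product is cosine similarity.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def select_premises(proof_state: str, premises: list[str], k: int = 2) -> list[str]:
    q = embed(proof_state)
    scores = [(float(q @ embed(p)), p) for p in premises]
    scores.sort(reverse=True)               # highest similarity first
    return [p for _, p in scores[:k]]

premises = ["add_comm", "mul_assoc", "le_trans", "add_zero"]
top = select_premises("a + b = b + a", premises, k=2)
print(top)  # the two highest-scoring premise names
```

The retrieved premises are then handed to the prover, replacing the symbolic heuristics a tool like Sledgehammer uses to pick relevant facts.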

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Image segmentation systems
Computer Vision
Natural Language Processing
E-commerce product image segmentation
Advertising visual content creation
Medical image analysis

ODISE is a system that unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation, achieving significant improvements over the previous state of the art in both open-vocabulary panoptic and semantic segmentation tasks.

ODISE can improve the accuracy and quality of image segmentation tasks, potentially benefiting industries such as e-commerce and advertising that rely on visual content.

TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Computer Vision
AI model training
Generative sampling

Improves the FID of single-step diffusion by up to 2.4x and achieves a new single-step DDIM state-of-the-art FID of 7.4 on ImageNet64.

TRACT (TRAnsitive Closure Time-distillation) is a new method that improves generative sampling with denoising diffusion models. It extends binary time-distillation (BTD) and improves the FID of single-step diffusion by up to 2.4x on the same architecture, making it possible to generate good samples with far fewer iterations in processes that require generative sampling. A PyTorch implementation will be released soon.
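The underlying idea of step distillation is that a student is trained so that one student step reproduces the result of several teacher steps, shrinking sampling cost. A toy illustration of that objective, with a deliberately trivial linear "denoiser" standing in for a diffusion model (this is not the TRACT training procedure itself):

```python
# Toy illustration of step distillation: fit a one-step student to
# match n teacher steps. The linear teacher_step() is a stand-in for
# a real denoising network.
import numpy as np

def teacher_step(x: np.ndarray) -> np.ndarray:
    # Toy "denoiser": each teacher step halves the distance to the
    # clean target (the origin, in this toy).
    return 0.5 * x

def teacher_multi_step(x: np.ndarray, n: int) -> np.ndarray:
    for _ in range(n):
        x = teacher_step(x)
    return x

# Distillation target: one student step should equal n teacher
# steps. For this linear toy the optimal student scale is 0.5**n,
# recovered here by least squares over sampled noise.
rng = np.random.default_rng(0)
xs = rng.standard_normal((1000, 4))
ys = np.array([teacher_multi_step(x, 4) for x in xs])
scale = float((xs * ys).sum() / (xs * xs).sum())
print(round(scale, 4))  # 0.0625 == 0.5**4
```

TRACT's contribution is in how the distillation is scheduled across timesteps (transitively, rather than halving step counts as in BTD), which this toy does not capture.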

X-Avatar: Expressive Human Avatars

Computer Vision
Computer Graphics
telepresence, AR/VR

A new avatar model that captures the full expressiveness of digital humans to bring about life-like experiences in telepresence, AR/VR and beyond.

X-Avatar is a new avatar model that captures the full expressiveness of digital humans to create life-like experiences in telepresence, AR/VR, and other areas. It can be learned from either full 3D scans or RGB-D data and models bodies, hands, facial expressions, and appearance in a holistic fashion. X-Avatar outperforms strong baselines in both data domains, quantitatively and qualitatively, on the animation task. To facilitate research on expressive avatars, the authors also contribute a new dataset called X-Humans, containing 233 sequences of high-quality textured scans from 20 participants, totalling 35,500 data frames.
