Thu Mar 09 2023

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

Chatbots
Natural Language Processing
Computer Vision
Customer service communication
Collaborative content creation
Visual editing and proofreading

Visual ChatGPT lets users interact with ChatGPT by sending and receiving not only text but also images: they can pose complex visual questions or visual editing instructions that require multiple AI models collaborating over multiple steps, and can give feedback to request corrected results.

Visual ChatGPT can improve customer experience by enabling a co-creative communication process that combines visual elements with multiple AI models, potentially improving the effectiveness and accuracy of the interaction.
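The core idea behind such a system is a dispatcher that routes each request either to the language model or through a visual foundation model first. A minimal sketch of that routing loop, where the tool functions and the route() heuristic are illustrative stand-ins rather than the paper's actual API:

```python
# Toy sketch of tool routing between a language model and a visual
# foundation model. caption_image() and chat() are hypothetical
# stand-ins; a real system would call actual models.
def caption_image(path: str) -> str:
    return f"[caption for {path}]"          # stand-in vision model

def chat(text: str) -> str:
    return f"[reply to: {text}]"            # stand-in language model

def route(request: dict) -> str:
    # Requests carrying an image go to a visual model first; the
    # caption is folded back into the language-model prompt.
    if "image" in request:
        caption = caption_image(request["image"])
        return chat(f"{request['text']} (image shows {caption})")
    return chat(request["text"])

out = route({"text": "What is in this picture?", "image": "cat.png"})
print(out)
```

In the actual system a prompt manager plays this role, chaining several visual models over multiple steps rather than a single captioning call.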

Magnushammer: A Transformer-based Approach to Premise Selection

Theorem proving systems
Natural Language Processing
Automated theorem proving
Mathematics research
Computer science research

Magnushammer is a transformer-based neural approach that can outperform traditional symbolic systems on the fundamental problem of automated theorem proving, achieving a 59.5% proof rate versus 38.3% for Sledgehammer, the most mature and popular symbolic solver.

Magnushammer can increase the efficiency and accuracy of automated theorem proving tasks, potentially saving time and resources in fields such as mathematics and computer science.
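Premise selection of this kind can be framed as dense retrieval: embed the current proof state and all candidate premises, then rank premises by similarity. A minimal sketch of that idea, where embed() is a deterministic stand-in for a trained transformer encoder and the premise names are illustrative:

```python
# Toy sketch of premise selection as dense retrieval. embed() is a
# hypothetical stand-in for a transformer encoder; a real system
# would learn these embeddings (e.g. contrastively).
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Pseudo-embedding derived from the text hash, normalized to
    # unit length so the dot product is cosine similarity.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def select_premises(proof_state: str, premises: list[str], k: int = 2) -> list[str]:
    q = embed(proof_state)
    scores = [(float(q @ embed(p)), p) for p in premises]
    scores.sort(reverse=True)               # highest similarity first
    return [p for _, p in scores[:k]]

premises = ["add_comm", "mul_assoc", "le_trans", "add_zero"]
top = select_premises("a + b = b + a", premises, k=2)
print(top)  # the two highest-scoring premise names
```

The retrieved premises are then handed to the prover, replacing the symbolic heuristics a tool like Sledgehammer uses to pick relevant facts.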

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Image segmentation systems
Computer Vision
Natural Language Processing
E-commerce product image segmentation
Advertising visual content creation
Medical image analysis

ODISE is a system that unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation, achieving significant improvements over the previous state of the art in both open-vocabulary panoptic and semantic segmentation tasks.

ODISE can improve the accuracy and quality of image segmentation tasks, potentially benefiting industries such as e-commerce and advertising that rely on visual content.

TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Computer Vision
AI model training
Generative sampling

Improves the FID of single-step diffusion by up to 2.4x and achieves a new single-step DDIM state-of-the-art FID of 7.4 on ImageNet64.

TRACT (TRAnsitive Closure Time-distillation) is a new method that improves generative sampling with denoising diffusion models. It extends binary time-distillation (BTD) and improves the FID of single-step diffusion by up to 2.4x on the same architecture, making it possible to generate good samples with far fewer iterations in processes that require generative sampling. A PyTorch implementation will be released soon.
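The underlying idea of step distillation is that a student is trained so that one student step reproduces the result of several teacher steps, shrinking sampling cost. A toy illustration of that objective, with a deliberately trivial linear "denoiser" standing in for a diffusion model (this is not the TRACT training procedure itself):

```python
# Toy illustration of step distillation: fit a one-step student to
# match n teacher steps. The linear teacher_step() is a stand-in for
# a real denoising network.
import numpy as np

def teacher_step(x: np.ndarray) -> np.ndarray:
    # Toy "denoiser": each teacher step halves the distance to the
    # clean target (the origin, in this toy).
    return 0.5 * x

def teacher_multi_step(x: np.ndarray, n: int) -> np.ndarray:
    for _ in range(n):
        x = teacher_step(x)
    return x

# Distillation target: one student step should equal n teacher
# steps. For this linear toy the optimal student scale is 0.5**n,
# recovered here by least squares over sampled noise.
rng = np.random.default_rng(0)
xs = rng.standard_normal((1000, 4))
ys = np.array([teacher_multi_step(x, 4) for x in xs])
scale = float((xs * ys).sum() / (xs * xs).sum())
print(round(scale, 4))  # 0.0625 == 0.5**4
```

TRACT's contribution is in how the distillation is scheduled across timesteps (transitively, rather than halving step counts as in BTD), which this toy does not capture.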

X-Avatar: Expressive Human Avatars

Computer Vision
Computer Graphics
telepresence, AR/VR

A new avatar model that captures the full expressiveness of digital humans to bring about life-like experiences in telepresence, AR/VR and beyond.

X-Avatar is a new avatar model that captures the full expressiveness of digital humans to create life-like experiences in telepresence, AR/VR, and other areas. It can be learned from either full 3D scans or RGB-D data and models bodies, hands, facial expressions, and appearance in a holistic fashion. X-Avatar outperforms strong baselines in both data domains, quantitatively and qualitatively, on the animation task. To facilitate research on expressive avatars, the authors also contribute a new dataset called X-Humans, containing 233 sequences of high-quality textured scans from 20 participants, totalling 35,500 data frames.
