Wed Apr 12 2023 - Top Trending AI Papers

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

Fashion technology

Computer vision

Image-to-video synthesis

Fashion marketing and advertising

Virtual try-ons

Online shopping avatars

Generates animated fashion videos from still images by adapting a pretrained text-to-image model (Stable Diffusion) with a novel finetuning strategy and architectural changes. Reports state-of-the-art results on fashion video animation.

Can improve marketing and advertising campaigns by creating more engaging and realistic fashion videos. Can be used to showcase products and designs in a more interactive way. Can also be used for virtual try-ons or creating realistic avatars for online shopping.

https://arxiv.org/pdf/2304.06025.pdf

https://arxiv.org/abs/2304.06025

https://grail.cs.washington.edu/projects/dreampose/

https://github.com/johannakarras/DreamPose

https://twitter.com/_akhaliq/status/1646335862391681026/video/1

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

Machine learning

Neural networks

Artificial intelligence

Text-to-image models

Product and service customization

Recommendation systems

Image classification

Proposes C-LoRA, a continually self-regularized low-rank adaptation applied to the cross-attention layers of text-to-image diffusion models, to prevent catastrophic forgetting as new concepts are added. Outperforms baselines for continual customization and sets a new state of the art for rehearsal-free continual learning in image classification.

Can improve customization and personalization of products and services, such as recommending new products to customers based on their previous purchases or preferences. Can also be used for image classification tasks that require continual learning or adaptation to new concepts.
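As a rough illustration of the low-rank adaptation idea named in the summary above, the sketch below keeps a base weight matrix frozen and learns only two small factors whose product is added to it. The helper names (`matmul`, `adapted_weight`, `overlap_penalty`, `count_params`), the toy shapes, and the form of the penalty are assumptions made for illustration; they are not taken from the paper or its code.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def adapted_weight(w, a, b):
    """Return W + A @ B, leaving the frozen base weight W unchanged."""
    delta = matmul(a, b)
    return [[wij + dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def overlap_penalty(past_delta, new_delta):
    """Illustrative penalty (assumed form, not the paper's loss): it grows when
    a new update edits entries that earlier updates already changed."""
    return sum((abs(p) * n) ** 2
               for p_row, n_row in zip(past_delta, new_delta)
               for p, n in zip(p_row, n_row))


def count_params(d, k, r):
    """Trainable parameters: full d x k matrix versus rank-r factors."""
    return d * k, d * r + r * k


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2 x 2 base weight
    a = [[1.0], [2.0]]             # 2 x 1 factor
    b = [[0.5, 0.0]]               # 1 x 2 factor
    print(adapted_weight(w, a, b))
    print(count_params(768, 768, 4))
```

The parameter count is the main point: a rank-4 update to a 768 x 768 layer trains a small fraction of the weights a full update would, which is why a separate adapter per concept stays cheap.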

https://arxiv.org/pdf/2304.06027.pdf

https://arxiv.org/abs/2304.06027

https://jamessealesmith.github.io/continual-diffusion/

https://twitter.com/_akhaliq/status/1646336099113893894/photo/1

ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

Artificial intelligence

Machine learning

Natural language processing

Large language models

Multilingual NLP applications

Language model selection and customization

Evaluates ChatGPT, a large language model, on 7 tasks across 37 diverse languages with high, medium, low, and extremely low resources. Results show that ChatGPT performs worse than previous models on a range of NLP tasks and languages, highlighting the need for further research in multilingual learning.

Can inform the development of multilingual NLP applications and technologies, and improve the understanding of the limitations and challenges in this field. Can also guide the selection and customization of language models for specific tasks and languages.

https://arxiv.org/pdf/2304.05613.pdf

https://arxiv.org/abs/2304.05613

https://twitter.com/_akhaliq/status/1646349179541659648/photo/1

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Generative models

Artificial intelligence

Text-to-image generation

ImageReward is a general-purpose text-to-image human preference reward model that aligns generative models with human values and preferences. It outperforms existing methods in terms of understanding human preferences in text-to-image synthesis, making it a promising automatic metric for evaluating and improving text-to-image synthesis.

Businesses can use ImageReward to improve their text-to-image generation processes by better aligning them with human preferences and values.
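One common way to train a preference reward model is a pairwise ranking loss: for two images generated from the same prompt, the model is pushed to score the human-preferred image higher. The sketch below shows that loss in plain Python. That it matches the paper's exact objective is an assumption, and the function names are hypothetical rather than part of the ImageReward package.

```python
import math


def pairwise_preference_loss(score_preferred, score_rejected):
    """Return -log(sigmoid(score_preferred - score_rejected)), computed stably."""
    diff = score_preferred - score_rejected
    if diff >= 0:
        # log(1 + exp(-diff)) is safe here because exp(-diff) <= 1
        return math.log1p(math.exp(-diff))
    # Same quantity rewritten so the exponent stays non-positive
    return -diff + math.log1p(math.exp(diff))


def mean_loss(pairs):
    """Average the loss over (preferred_score, rejected_score) pairs."""
    return sum(pairwise_preference_loss(p, r) for p, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three annotated image pairs
    print(mean_loss([(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]))
```

The loss is small when the preferred image already scores well above the rejected one and grows roughly linearly when the order is wrong, so training effort concentrates on the pairs the model currently ranks incorrectly.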

https://arxiv.org/pdf/2304.05977.pdf

https://arxiv.org/abs/2304.05977

https://github.com/THUDM/ImageReward

https://twitter.com/_akhaliq/status/1646365315977158656/photo/1

Training Large Language Models Efficiently with Sparsity and Dataflow

Sparsity

Dataflow

Artificial intelligence

Language models

Text generation

This paper demonstrates an end-to-end training flow for a large language model, a 13-billion-parameter GPT, using sparsity and dataflow. The dataflow execution enables efficient on-chip irregular memory accesses, native kernel fusion, and pipelined parallelism. The resulting model matches the quality of the dense GPT 13B model while delivering an end-to-end speedup of 4.5x over a dense A100 baseline.

Businesses can use sparsity and dataflow to train large language models more efficiently, reducing the compute power required and making it easier to train larger models.
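To make the sparsity part concrete, the sketch below zeroes out the smallest-magnitude weights of a toy matrix and counts how many multiply-accumulate operations a matrix-vector product then needs. Magnitude-based masking is an assumption chosen for illustration; the summary above does not say how the paper selects its sparsity pattern, and the function names are hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of entries
    with the smallest absolute value."""
    cells = sorted((abs(w), i, j)
                   for i, row in enumerate(weights)
                   for j, w in enumerate(row))
    n_drop = int(len(cells) * sparsity)
    mask = [[1] * len(row) for row in weights]
    for _, i, j in cells[:n_drop]:
        mask[i][j] = 0
    return mask


def sparse_matvec(weights, mask, x):
    """Multiply the masked matrix by x, skipping masked-out entries.
    Returns the output vector and the number of multiply-accumulates used."""
    out, macs = [], 0
    for w_row, m_row in zip(weights, mask):
        acc = 0.0
        for w, m, xj in zip(w_row, m_row, x):
            if m:
                acc += w * xj
                macs += 1
        out.append(acc)
    return out, macs


if __name__ == "__main__":
    w = [[0.1, -2.0, 0.05, 3.0],
         [1.5, 0.02, -0.3, 0.01]]
    mask = magnitude_mask(w, 0.5)
    print(mask)
    print(sparse_matvec(w, mask, [1.0, 1.0, 1.0, 1.0]))
```

Skipping zeros this way makes memory access irregular, which is the access pattern the summary says dataflow execution handles efficiently. The sketch only counts arithmetic and says nothing about real hardware speed.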

https://arxiv.org/pdf/2304.05511.pdf

https://arxiv.org/abs/2304.05511

https://twitter.com/_akhaliq/status/1646349743725879297/photo/1