DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
Presents a method for generating animated fashion videos from still images by finetuning a pretrained text-to-image diffusion model with a novel finetuning strategy and architectural changes, achieving state-of-the-art fashion video animation.
Can improve marketing and advertising campaigns by creating more engaging and realistic fashion videos. Can be used to showcase products and designs in a more interactive way. Can also be used for virtual try-ons or creating realistic avatars for online shopping.
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
Proposes C-LoRA, a continually self-regularized low-rank adaptation of the cross-attention layers of text-to-image diffusion models, which prevents catastrophic forgetting as new concepts are added. It outperforms baselines for continual customization and sets a new state of the art for rehearsal-free continual learning in image classification.
Can improve customization and personalization of products and services, such as recommending new products to customers based on their previous purchases or preferences. Can also be used for image classification tasks that require continual learning or adaptation to new concepts.
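The low-rank adaptation that C-LoRA builds on can be sketched in a few lines: instead of finetuning a full cross-attention weight matrix, a small low-rank residual is trained on top of the frozen weights. The numpy sketch below is illustrative only, not the paper's implementation; all names and dimensions are made up.

```python
import numpy as np

# Minimal sketch of low-rank adaptation (LoRA): keep the pretrained weight
# W frozen and learn a low-rank update B @ A with rank r << d.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))      # frozen pretrained projection
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init to 0

def adapted_forward(x):
    """Cross-attention projection with the low-rank residual added."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer starts identical to the
# pretrained one, so customization begins from the original model.
assert np.allclose(adapted_forward(x), W @ x)
```

C-LoRA's contribution on top of this building block is the self-regularization: a penalty that discourages each new concept's low-rank update from overwriting the directions used by previously learned concepts, which is what prevents catastrophic forgetting.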
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
Evaluates ChatGPT, a large language model, on 7 different tasks across 37 diverse languages spanning high, medium, low, and extremely low resource levels. Results show that ChatGPT performs worse than prior state-of-the-art models across a range of NLP tasks and languages, particularly lower-resource ones, highlighting the need for further research in multilingual learning.
Can inform the development of multilingual NLP applications and technologies, and improve the understanding of the limitations and challenges in this field. Can also guide the selection and customization of language models for specific tasks and languages.
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
ImageReward is a general-purpose reward model that captures human preferences for text-to-image generation, aligning generative models with human values. It outperforms existing methods at predicting human preferences, making it a promising automatic metric for evaluating and improving text-to-image synthesis.
Businesses can use ImageReward to improve their text-to-image generation processes by better aligning them with human preferences and values.
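One common way to apply a preference reward model like ImageReward is best-of-n reranking: generate several candidate images per prompt, score each, and keep the highest-scoring one. The sketch below illustrates the pattern only; `reward_score` is a stand-in toy function, not the actual ImageReward API.

```python
def reward_score(prompt, image):
    # Stand-in scorer: a real reward model would return a learned
    # preference score for (prompt, image); here we just count tag overlap.
    return len(set(prompt.split()) & set(image["tags"]))

def best_of_n(prompt, candidates):
    """Return the candidate the reward model prefers for this prompt."""
    return max(candidates, key=lambda img: reward_score(prompt, img))

candidates = [
    {"id": "a", "tags": {"cat", "sofa"}},
    {"id": "b", "tags": {"cat", "red", "sofa"}},
]
best = best_of_n("red cat on a sofa", candidates)
# candidate "b" matches more of the prompt, so it scores higher
```

The same scoring signal can also be used as a training objective, finetuning the generator to produce images the reward model prefers.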
Training Large Language Models Efficiently with Sparsity and Dataflow
This paper demonstrates an end-to-end training flow for a large language model, a 13-billion-parameter GPT, using sparsity and dataflow, which enables efficient on-chip handling of irregular memory accesses, native kernel fusion, and pipelined parallelism. The resulting model matches the quality of the dense GPT 13B model while achieving an end-to-end speedup of 4.5x over a dense A100 baseline.
Businesses can use sparsity and dataflow to train large language models more efficiently, reducing the compute required and making it practical to train larger models.
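The core idea behind weight sparsity is simple to illustrate: zero out most of a weight matrix so that hardware able to exploit sparsity (such as a dataflow accelerator) can skip those multiplications entirely. The numpy sketch below uses simple magnitude pruning with illustrative numbers; it is not the paper's training recipe.

```python
import numpy as np

# Illustrative sketch of weight sparsity via magnitude pruning.
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 8))
sparsity = 0.75  # zero out 75% of weights, keep the top 25% by magnitude

k = int(W.size * (1 - sparsity))                    # number of weights to keep
threshold = np.sort(np.abs(W).ravel())[::-1][k - 1]  # k-th largest magnitude
mask = (np.abs(W) >= threshold).astype(W.dtype)
W_sparse = W * mask

dense_flops = W.size              # multiply-accumulates for the dense layer
sparse_flops = int(mask.sum())    # only nonzero weights contribute
# At 75% sparsity, 4x fewer multiply-accumulates per matrix-vector product.
assert sparse_flops == dense_flops // 4
```

In practice the speedup realized depends on the hardware: GPUs need structured sparsity patterns to benefit, which is part of why the paper pairs sparsity with a dataflow architecture that handles irregular memory access natively.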