Thu Apr 06 2023
Wed Apr 05 2023

Generative Novel View Synthesis with 3D-Aware Diffusion Models

3D Modeling
Computer Vision
Generative Models
Virtual product prototyping

A diffusion-based model is presented for 3D-aware generative novel view synthesis from a single input image. It uses a 3D feature volume as a latent feature field, which improves the generation of view-consistent novel renderings and lets the model synthesize 3D-consistent sequences.

Could generate diverse and plausible novel views from limited input, and may aid tasks such as virtual product prototyping.

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

Image Generation
Generative Models
Computer Vision
E-commerce
Retail

This paper presents a framework in which an encoder captures high-level identifiable semantics of an object and produces an object-specific embedding for image generation. The embedding is passed to a text-to-image synthesis model, with a joint training scheme used to preserve object identity. The proposed method generates diverse content and styles, conditioned on both text and objects, without test-time optimization.

Could improve product customization and personalization in e-commerce and retail.

HNeRV: A Hybrid Neural Representation for Videos

Video Regression
Computer Vision
Deep Learning
Video processing
Video compression
Video editing

HNeRV is a hybrid neural representation for videos that uses a learnable encoder to generate content-adaptive embeddings for decoding, which gives it better internal generalization and regression capacity. The model outperforms implicit methods on video regression tasks and shows decoding advantages in speed, flexibility, and deployment over traditional codecs and learning-based compression methods. The effectiveness of HNeRV is also explored on downstream tasks such as video compression and video inpainting.

Could improve business video processing tasks such as compression and editing.

Evaluation of Large Language Models in Arithmetic Tasks

Mathematics
Natural Language Processing
Automated math problem solving

Large language models have shown the ability to solve math word problems, but no existing work focuses on evaluating their arithmetic ability. In this paper, the authors propose an arithmetic dataset to test the latest large language models and provide a detailed analysis of their arithmetic ability.

The paper provides insight into how well large language models solve arithmetic problems, which can inform the development of AI-powered solutions for automated arithmetic tasks.
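
The evaluation idea summarized above can be illustrated with a minimal sketch. It is not the authors' dataset or protocol: the operator set, the operand range, the prompt format, the exact-match scoring rule, and the names `make_problems`, `exact_match_accuracy`, and `oracle_model` are all assumptions made for illustration. A real run would replace `oracle_model` with a call to an actual language model.

```python
import operator
import random

# Hypothetical operator table; the paper's dataset may cover other operations.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_problems(n, max_operand=999, seed=0):
    """Generate n (prompt, answer) pairs such as ("12 + 7 =", "19")."""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        symbol = rng.choice(sorted(OPS))
        problems.append((f"{a} {symbol} {b} =", str(OPS[symbol](a, b))))
    return problems


def exact_match_accuracy(model, problems):
    """Fraction of problems where the model's stripped reply equals the answer."""
    if not problems:
        return 0.0
    correct = sum(
        1 for prompt, answer in problems if model(prompt).strip() == answer
    )
    return correct / len(problems)


def oracle_model(prompt):
    """Stand-in for a language model: parses the prompt and computes the result."""
    a, symbol, b, _ = prompt.split()
    return str(OPS[symbol](int(a), int(b)))
```

Calling `exact_match_accuracy(oracle_model, make_problems(100))` scores the stand-in model on 100 generated problems. Exact string match is a deliberately strict choice here; a more forgiving scorer might parse the first number in the reply instead.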

Bimodality Driven 3D Dance Generation via Music-Text Integration

Dance motion generation
Computer Vision
Natural Language Processing
Entertainment industry

The paper proposes a novel task: generating 3D dance movements that incorporate both text and music modalities. The authors use a 3D human motion VQ-VAE to project the motions of the two datasets into a latent space of quantized vectors, and a cross-modal transformer to integrate text instructions into the motion generation architecture. The paper also introduces two novel metrics that measure the coherence and the freezing percentage of the generated motion.

Generating 3D dance movements from both text and music can be useful in developing AI-powered solutions for the entertainment industry.
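
The "latent space of quantized vectors" mentioned above rests on the generic vector quantization step of a VQ-VAE: each continuous feature vector is replaced by the index of its nearest codebook entry. The sketch below shows only that lookup in plain Python, assuming a small hand-written codebook and squared Euclidean distance. The names `quantize` and `dequantize` are illustrative, and nothing here reproduces the paper's encoder, codebook training, or transformer.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for feature in features:
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(feature, codebook[k]),
        )
        indices.append(nearest)
    return indices


def dequantize(indices, codebook):
    """Recover the quantized vectors from their codebook indices."""
    return [codebook[k] for k in indices]


# Toy example: three 2D codebook entries and three encoded motion features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.9]]
tokens = quantize(features, codebook)  # [0, 1, 2]
```

Because motions from both datasets end up as sequences of indices into one shared codebook, a sequence model can treat them as a common token vocabulary, which is the property the summary relies on.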
