Wed Apr 05 2023 - Top Trending AI Papers

Generative Novel View Synthesis with 3D-Aware Diffusion Models

3D Modeling

Computer Vision

Generative Models

Virtual product prototyping

This paper presents a diffusion-based model for 3D-aware generative novel view synthesis from a single input image. A 3D feature volume serves as a latent feature field, which improves the view consistency of the generated renderings and lets the model synthesize 3D-consistent sequences.

The model could generate diverse, plausible novel views from limited input, which may aid tasks such as virtual product prototyping.

https://arxiv.org/pdf/2304.02602.pdf

https://arxiv.org/abs/2304.02602

https://nvlabs.github.io/genvs/

https://twitter.com/_akhaliq/status/1643790003779059715/video/1

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

Image Generation

Generative Models

Computer Vision

E-commerce

Retail

This paper presents a framework in which an encoder captures the high-level identifiable semantics of an object and produces an object-specific embedding. The embedding is passed to a text-to-image synthesis model, and a joint training scheme is used to preserve object identity. The method generates diverse content and styles conditioned on both text and objects, without test-time optimization.

This could improve product customization and personalization in e-commerce and retail.

https://arxiv.org/pdf/2304.02642.pdf

https://arxiv.org/abs/2304.02642

https://twitter.com/_akhaliq/status/1643790485306081280/photo/1

HNeRV: A Hybrid Neural Representation for Videos

Video Regression

Computer Vision

Deep Learning

Video processing

Video compression

Video editing

HNeRV is a hybrid neural representation for videos that uses a learnable encoder to generate content-adaptive embeddings for decoding, allowing for better internal generalization and regression capacity. The model outperforms implicit methods in video regression tasks and shows decoding advantages in speed, flexibility, and deployment compared with traditional codecs and learning-based compression methods. The paper also explores HNeRV on downstream tasks such as video compression and video inpainting.

This could improve business video processing tasks such as compression and editing.

https://arxiv.org/pdf/2304.02633.pdf

https://arxiv.org/abs/2304.02633

https://haochen-rye.github.io/HNeRV/

https://github.com/haochen-rye/HNeRV

https://twitter.com/_akhaliq/status/1643815982283083777/photo/1

Evaluation of Large Language Models in Arithmetic Tasks

Mathematics

Natural Language Processing

Automated math problem solving

Large language models have shown the ability to solve math word problems, but according to the authors no prior work focuses on evaluating their arithmetic ability. The authors propose an arithmetic dataset to test the latest large language models and provide a detailed analysis of their arithmetic ability.

The paper offers insight into how well large language models solve arithmetic problems, which can be useful when developing AI-powered solutions for automated arithmetic tasks.
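As a rough illustration of this kind of evaluation, the sketch below builds a small synthetic arithmetic set and scores a model by exact-match accuracy. It is not the paper's dataset or protocol: the operator table, the question format, the function names, and `toy_model` (a stand-in for a real LLM call) are all assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical operator table; the paper's dataset may cover other operations.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def make_problems(n: int, max_value: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Generate n (question, answer) pairs with non-negative integer operands."""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        a = rng.randint(0, max_value)
        b = rng.randint(0, max_value)
        op = rng.choice(sorted(OPS))
        problems.append((f"{a} {op} {b} =", OPS[op](a, b)))
    return problems


def exact_match_accuracy(
    problems: List[Tuple[str, int]], answer_fn: Callable[[str], str]
) -> float:
    """Fraction of problems whose reply parses to the expected integer."""
    if not problems:
        return 0.0
    correct = 0
    for question, expected in problems:
        reply = answer_fn(question).strip()
        try:
            correct += int(reply) == expected
        except ValueError:
            pass  # unparseable replies count as wrong
    return correct / len(problems)


def toy_model(question: str) -> str:
    """Stand-in for an LLM call: always adds, whatever the operator is."""
    a, _op, b, _eq = question.split()
    return str(int(a) + int(b))


if __name__ == "__main__":
    problems = make_problems(n=100, max_value=99)
    print(f"toy model accuracy: {exact_match_accuracy(problems, toy_model):.2f}")
```

Swapping `toy_model` for a function that queries a real model, and grouping problems by operator or operand size, would give the per-category breakdown that an analysis like the paper's relies on.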

https://arxiv.org/pdf/2304.02015.pdf

https://arxiv.org/abs/2304.02015

https://twitter.com/_akhaliq/status/1643790826672103425/photo/1

Bimodality Driven 3D Dance Generation via Music-Text Integration

Dance motion generation

Computer Vision

Natural Language Processing

Entertainment industry

The paper proposes a novel task: generating 3D dance movements conditioned on both text and music. Because the motion data comes from two datasets (one paired with music, one with text), the authors use a 3D human motion VQ-VAE to project the motions of both into a shared latent space of quantized vectors. A cross-modal transformer then integrates text instructions into the motion generation architecture. The paper also introduces two novel metrics that measure the coherence and the freezing percentage of the generated motion.

Generating 3D dance from both text and music could be useful in developing AI-powered solutions for the entertainment industry.
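The quantization step at the heart of a VQ-VAE can be sketched in a few lines: each continuous latent vector is replaced by the index of its nearest codebook entry, so motions from different sources end up as tokens over the same vocabulary. This is a generic nearest-neighbor lookup under assumed toy values, not the paper's model; the function names, the two-dimensional codebook, and the sample latents are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look up the codebook vector for each token index."""
    return [list(codebook[t]) for t in tokens]


# Toy values for illustration only; a trained codebook would be learned.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

tokens = quantize(latents, codebook)
print(tokens)  # [0, 1, 2]
```

In a trained model the codebook is learned jointly with the encoder and decoder, and a sequence model such as a transformer then operates on the resulting token sequences rather than on raw motion.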

https://arxiv.org/pdf/2304.02419.pdf

https://arxiv.org/abs/2304.02419

https://garfield-kh.github.io/TM2D/

https://twitter.com/_akhaliq/status/1643814801364201474/photo/1