Mon Jan 23 2023 - Top Trending AI Papers

Zorro: the masked multimodal transformer

Computer vision

Multimodal processing

Deep learning

Audio-visual self-supervised learning

Evaluating audio-visual models at inference time on audio-only or video-only benchmarks

Achieves state-of-the-art results on most relevant benchmarks for multimodal tasks (AudioSet and VGGSound) by using masks to control how inputs from each modality are routed inside Transformers.

Provides a multimodal processing technique that keeps part of the representation modality-pure, so the same model can run unimodal inference on audio or video benchmarks as well as multimodal inference.
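The masking idea can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes tokens are ordered audio, then video, then fusion, and that (as the description of "modality-pure" routing implies) audio and video tokens attend only within their own modality while fusion tokens attend to every token. The function names are hypothetical.

```python
import math
from typing import List


def zorro_attention_mask(n_audio: int, n_video: int, n_fusion: int) -> List[List[bool]]:
    """Return mask[i][j] == True if query token i may attend to key token j.

    Assumed token order (hypothetical): audio tokens, then video tokens, then fusion tokens.
    """
    streams = ["audio"] * n_audio + ["video"] * n_video + ["fusion"] * n_fusion
    mask: List[List[bool]] = []
    for query in streams:
        if query == "fusion":
            # Fusion tokens may attend to every token, so they mix both modalities.
            mask.append([True] * len(streams))
        else:
            # Audio and video tokens attend only within their own modality,
            # which keeps their representations modality-pure.
            mask.append([key == query for key in streams])
    return mask


def masked_attention_weights(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax over attention scores, giving zero weight to masked-out keys."""
    weights: List[List[float]] = []
    for score_row, mask_row in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, m in zip(score_row, mask_row) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Example: 2 audio tokens, 2 video tokens, 1 fusion token, with uniform scores.
mask = zorro_attention_mask(2, 2, 1)
weights = masked_attention_weights([[0.0] * 5 for _ in range(5)], mask)
```

With uniform scores, each audio token splits its attention evenly between the two audio tokens, while the fusion token spreads it over all five. Because audio outputs never depend on video inputs under this mask, the audio stream can still be run when video is missing, which is what makes unimodal inference possible.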

https://arxiv.org/pdf/2301.09595.pdf

https://arxiv.org/abs/2301.09595

https://twitter.com/arankomatsuzaki/status/1617704907041312768/photo/1

InfiniCity: Infinite-Scale City Synthesis

Computer graphics

3D modeling

Neural rendering

3D city environment synthesis

Proposes a novel framework, InfiniCity, which constructs and renders an arbitrarily large, 3D-grounded environment from random noise.

Provides a framework for synthesizing arbitrary-scale, traversable 3D city environments that users can edit flexibly and interactively.

https://hubert0527.github.io/infinicity/

https://arxiv.org/pdf/2301.09637.pdf

https://arxiv.org/abs/2301.09637

https://twitter.com/arankomatsuzaki/status/1617701509810253826/video/1

Is ChatGPT A Good Translator? A Preliminary Study

Language translation

Natural language processing

Deep learning

Machine translation

Translation robustness

ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags significantly behind on low-resource or distant languages.

Provides a preliminary evaluation of ChatGPT for machine translation. To narrow the gap on distant languages, the authors propose "pivot prompting", which asks ChatGPT to translate the source sentence into a high-resource pivot language before translating it into the target language.
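The pivot idea can be sketched as two chained translation steps. This is a minimal sketch under stated assumptions: `translate` is a hypothetical stand-in for a ChatGPT request (no specific API is assumed), the prompt template is hypothetical rather than the paper's exact wording, and English is assumed as the default pivot. The paper frames pivoting as a prompting strategy; splitting it into two explicit calls here is a simplification.

```python
from typing import Callable

# Hypothetical translator interface: translate(text, source_lang, target_lang) -> str.
# In practice this might wrap a chat-model request; no specific API is assumed here.
Translator = Callable[[str, str, str], str]


def make_translation_prompt(text: str, source_lang: str, target_lang: str) -> str:
    """Hypothetical prompt template; not the paper's exact wording."""
    return f"Translate the following {source_lang} text into {target_lang}:\n{text}"


def pivot_translate(text: str, source_lang: str, target_lang: str,
                    translate: Translator, pivot_lang: str = "English") -> str:
    """Translate source -> pivot -> target, skipping the pivot when it is redundant."""
    if pivot_lang in (source_lang, target_lang):
        # One side is already the high-resource pivot, so translate directly.
        return translate(text, source_lang, target_lang)
    # First hop: distant source language into the high-resource pivot.
    intermediate = translate(text, source_lang, pivot_lang)
    # Second hop: pivot language into the requested target.
    return translate(intermediate, pivot_lang, target_lang)
```

Keeping `translate` as an injected function makes the routing logic testable with a fake translator and independent of any particular model endpoint.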

https://arxiv.org/pdf/2301.08745.pdf

https://arxiv.org/abs/2301.08745

https://twitter.com/arankomatsuzaki/status/1617708239906549761/photo/1

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Generative adversarial networks

Computer vision

Natural language processing

Image generation from text for advertising or product visualization

Automated report generation with accompanying images

Generating illustrative images for employee learning and training materials

Proposes StyleGAN-T, a model that significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed for large-scale text-to-image synthesis.

Because a GAN generates an image in a single forward pass rather than through many iterative denoising steps, StyleGAN-T can give businesses a fast way to produce high-quality images from text.

https://sites.google.com/view/stylegan-t/

https://arxiv.org/pdf/2301.09515.pdf

https://arxiv.org/abs/2301.09515

https://twitter.com/arankomatsuzaki/status/1617702831150239746/photo/1

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Pre-trained large language models

Natural language processing

Reasoning and decision-making

Automated reasoning for fraud detection in finance

Crisis management decision-making with automated reasoning

Automated legal reasoning for compliance and contract review

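Introduces self-consistency, a decoding strategy that replaces greedy decoding in chain-of-thought prompting: it samples a diverse set of reasoning paths and then selects the most consistent final answer by marginalizing out the sampled paths. This significantly boosts performance on complex reasoning benchmarks, including arithmetic and commonsense reasoning.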

Self-consistency can improve the accuracy of automated reasoning in business decision-making and problem-solving, at the cost of sampling several reasoning paths per question instead of one.
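The decoding strategy can be sketched as "sample several chains of thought, then take a majority vote over their final answers", which is the unweighted form of marginalizing out the reasoning paths. This is a minimal sketch, not the authors' code: `sample_reasoning` is a hypothetical stand-in for sampling one chain of thought from a language model (e.g. with nonzero temperature), and `extract_last_number` is a hypothetical answer parser suited only to numeric answers.

```python
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_last_number(chain: str) -> Optional[str]:
    """Hypothetical parser: treat the last number in the chain as its final answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", chain)
    return matches[-1] if matches else None


def self_consistency(question: str,
                     sample_reasoning: Callable[[str], str],
                     extract_answer: Callable[[str], Optional[str]] = extract_last_number,
                     num_samples: int = 10) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent final answer."""
    answers: List[str] = []
    for _ in range(num_samples):
        # Each call should return a different sampled chain of thought.
        chain = sample_reasoning(question)
        answer = extract_answer(chain)
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    # Majority vote; ties go to the answer seen first (Counter keeps insertion order).
    return Counter(answers).most_common(1)[0][0]
```

The cost trade-off noted above is visible here: accuracy comes from `num_samples` model calls per question rather than a single greedy decode.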

https://arxiv.org/pdf/2203.11171.pdf

https://arxiv.org/abs/2203.11171

https://twitter.com/arankomatsuzaki/status/1617545566774255616/photo/1