Tue Jan 24 2023
Mon Jan 23 2023

Zorro: the masked multimodal transformer

Computer vision
Multimodal processing
Deep learning
Audio-visual self-supervised learning
Inference evaluation of audio-visual models on audio or video benchmarks

Achieves state-of-the-art results on most of the relevant multimodal benchmarks (AudioSet and VGGSound) by using masks to control how inputs from each modality are routed inside Transformers.

Provides a technique for multimodal processing that keeps some parts of the representation modality-pure, leading to state-of-the-art results on most relevant benchmarks for multimodal tasks.
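The routing idea can be illustrated with a small attention-mask sketch. The summary above only says that masks keep part of the representation modality-pure. The token layout used here (audio tokens, then video tokens, then a group of "fusion" tokens that may attend to everything) and the function name are assumptions made for illustration, not the paper's code.

```python
def build_modality_mask(n_audio, n_video, n_fusion):
    """Return mask[q][k] == True when query token q may attend to key token k.

    Assumed layout (hypothetical): audio tokens, then video tokens, then fusion tokens.
    Audio and video queries only see their own modality, so their outputs stay
    modality-pure. Fusion queries see every token.
    """
    kinds = ["audio"] * n_audio + ["video"] * n_video + ["fusion"] * n_fusion
    mask = []
    for query_kind in kinds:
        row = [query_kind == "fusion" or query_kind == key_kind for key_kind in kinds]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask as rows of 1/0 for a tiny example: 2 audio, 2 video, 1 fusion token.
    for row in build_modality_mask(2, 2, 1):
        print(" ".join("1" if allowed else "0" for allowed in row))
```

In a real Transformer such a mask would typically be applied to the attention logits before the softmax, so that disallowed positions receive zero weight.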

InfiniCity: Infinite-Scale City Synthesis

Computer graphics
3D modeling
Neural rendering
3D city environment synthesis

Proposes a novel framework, InfiniCity, which constructs and renders an unconstrainedly large, 3D-grounded environment from random noise.

Provides a framework for synthesizing arbitrary-scale, traversable 3D city environments, allowing flexible and interactive editing by users.

Is ChatGPT A Good Translator? A Preliminary Study

Language translation
Natural language processing
Deep learning
Machine translation
Translation robustness

ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags behind significantly on low-resource or distant languages.

Provides a preliminary evaluation of ChatGPT for machine translation, showing that it performs well on high-resource European languages but struggles on low-resource or distant languages. To address this, the authors propose a strategy named 'pivot prompting', in which the source text is first translated into a high-resource pivot language and then into the target language.
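A minimal sketch of the pivot strategy, assuming it means translating through an intermediate high-resource language before reaching the target. The `translate` callable, the toy lookup table, and the language codes are hypothetical stand-ins for whatever model or service is actually used.

```python
# Hypothetical toy "translator": a lookup table keyed by (source, target) language codes.
TOY_TABLE = {
    ("de", "en"): {"Katze": "cat"},
    ("en", "zh"): {"cat": "猫"},
}


def toy_translate(text, source, target):
    """Stand-in for a real translation call; only knows the entries in TOY_TABLE."""
    return TOY_TABLE[(source, target)][text]


def pivot_translate(text, source, target, pivot, translate):
    """Translate source -> pivot -> target instead of source -> target directly."""
    intermediate = translate(text, source, pivot)
    return translate(intermediate, pivot, target)


if __name__ == "__main__":
    print(pivot_translate("Katze", "de", "zh", pivot="en", translate=toy_translate))
```

The design choice is to pass `translate` in as a parameter, so the same two-step routine works with any backend that accepts text, a source language, and a target language.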

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Generative Adversarial Networks
Computer Vision
Natural Language Processing
Image generation from text for advertising or product visualization
Automated report generation with accompanying images
Image-based learning and training for employees

Proposes StyleGAN-T, a model that significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed for large-scale text-to-image synthesis.

StyleGAN-T can provide businesses with a faster and more efficient method for generating high-quality images from text.

Improving Performance of Chain-of-Thought Prompting via Self-Consistency Decoding Strategy

Pre-trained large language models
Natural Language Processing
Reasoning and Decision Making
Automated reasoning for fraud detection in finance
Crisis management decision-making with automated reasoning
Automated legal reasoning for compliance and contract review

Introduces self-consistency, a decoding strategy that significantly boosts the performance of chain-of-thought prompting on complex reasoning tasks such as arithmetic and commonsense reasoning benchmarks.

Self-consistency can improve the accuracy of automated reasoning, at the cost of sampling several reasoning paths per question, which can be useful in decision-making processes and problem-solving for businesses.
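The decoding strategy can be sketched as sampling several chain-of-thought completions and taking a majority vote over their final answers. This is a simplified illustration, not the authors' implementation: `sample_fn` is a hypothetical callable that returns only the final answer of one sampled reasoning path, and the canned sampler below stands in for a language model.

```python
from collections import Counter


def self_consistent_answer(sample_fn, prompt, n_samples=5):
    """Sample n_samples answers for the prompt and return (majority answer, vote share)."""
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples


def make_canned_sampler(canned_answers):
    """Hypothetical stand-in for a model: returns the canned answers one by one."""
    iterator = iter(canned_answers)
    return lambda prompt: next(iterator)


if __name__ == "__main__":
    sampler = make_canned_sampler(["18", "18", "26", "18", "17"])
    answer, share = self_consistent_answer(sampler, "How many eggs are left?", n_samples=5)
    print(answer, share)
```

With a real model, each call to `sample_fn` would decode one reasoning path with sampling enabled (for example, a non-zero temperature) and extract the final answer from it.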

Thu Jan 19 2023
Wed Jan 18 2023
Tue Jan 17 2023
Mon Jan 16 2023