Mon Apr 03 2023

A Survey of Large Language Models

Neural networks
Natural Language Processing
Language understanding and generation

This paper reviews recent advances in Large Language Models (LLMs), introducing their background, key findings, and mainstream techniques. In particular, it focuses on four major aspects of LLMs: pre-training, adaptation tuning, utilization, and capacity evaluation.

LLMs could revolutionize the way we develop and use AI algorithms.
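
Of the four aspects above, "utilization" typically covers prompting strategies such as in-context learning. As a rough, hypothetical illustration (the sentiment task, the examples, and the prompt wording below are not from the survey), a few-shot prompt steers an LLM without any weight updates:

    # Illustrative sketch of LLM "utilization" via few-shot in-context learning.
    # The task and examples are hypothetical; the resulting prompt would be
    # sent to any LLM completion API of your choice.
    few_shot_examples = [
        ("The movie was a delight.", "positive"),
        ("I want my money back.", "negative"),
    ]

    def build_prompt(query: str) -> str:
        lines = ["Classify the sentiment of each review."]
        for text, label in few_shot_examples:
            lines.append(f"Review: {text}\nSentiment: {label}")
        lines.append(f"Review: {query}\nSentiment:")
        return "\n\n".join(lines)

    print(build_prompt("The plot dragged on forever."))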

3D-aware Image Generation using 2D Diffusion Models

Generative models
Computer Vision
Multiview image set generation

This paper introduces a novel 3D-aware image generation method that leverages 2D diffusion models. The method generates high-quality images that significantly outperform prior approaches, even at large viewing angles, and it scales to a large dataset (ImageNet).

This approach could expand businesses' generative modeling capabilities to 3D-aware imagery.

Self-Refine: Iterative Refinement with Self-Feedback

Neural networks
Natural Language Processing
Text generation

This paper presents a novel approach that allows Large Language Models (LLMs) to iteratively refine outputs and incorporate feedback along multiple dimensions to improve performance on diverse tasks. It does not require supervised training data or reinforcement learning and works with a single LLM.

Self-Refine could improve the quality of text generated by businesses for tasks ranging from review rewriting to math reasoning, without the need for additional training data or models.
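
A minimal sketch of the iterative loop described above may help; everything below, including the call_llm placeholder, the prompt wording, and the stopping heuristic, is a hypothetical stand-in rather than the paper's actual prompts or algorithm:

    # Sketch of a Self-Refine-style loop: a single model drafts an answer,
    # critiques it, and revises it. `call_llm` is a placeholder for any
    # text-completion API; prompts and stopping rule are illustrative only.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire up an LLM API of your choice here")

    def self_refine(task: str, max_iters: int = 3) -> str:
        output = call_llm(f"Task: {task}\nWrite an initial answer.")
        for _ in range(max_iters):
            feedback = call_llm(
                f"Task: {task}\nAnswer: {output}\n"
                "Critique this answer for correctness, clarity, and style."
            )
            if "no issues" in feedback.lower():  # naive convergence check
                break
            output = call_llm(
                f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\n"
                "Rewrite the answer to address the feedback."
            )
        return output

The key property mirrored here is that generation, feedback, and refinement all go through the same model, which is why no supervised training data, reinforcement learning, or second model is needed.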

GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently

OCR techniques
Image generation
Improve text coherence in generated images

Recent breakthroughs in language-guided image generation have enabled the creation of high-quality, diverse images from user instructions. Despite this impressive synthesis performance, one significant limitation of current image generation models is their insufficient ability to render coherent text within images, particularly for complex glyph structures such as Chinese characters.

This research introduces GlyphDraw, a general learning framework that endows image generation models with the capacity to generate images embedded with coherent text, specifically Chinese characters. The model is open-domain, and extensive qualitative and quantitative experiments demonstrate that it accurately renders the Chinese characters specified in prompts and naturally blends the generated text into the background.

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Masked Auto-encoding (MAE)
Pre-trained visual representations (PVRs)
Advance embodied artificial intelligence (EAI) tasks

This paper presents the largest and most comprehensive empirical study to date of pre-trained visual representations (PVRs), or visual foundation models, for Embodied AI. The authors curate CortexBench, a benchmark of 17 tasks spanning locomotion, navigation, and dexterous and mobile manipulation, and systematically evaluate existing PVRs, finding that none is universally dominant.

The paper introduces VC-1, their largest model, which outperforms all prior PVRs on average but does not universally dominate either. Task- or domain-specific adaptation of VC-1 yields substantial gains, with the adapted VC-1 achieving performance competitive with or superior to the best-known results on every benchmark in CortexBench. These models required over 10,000 GPU-hours to train and are available on the project website for the benefit of the research community.
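
A minimal sketch of the frozen-PVR-plus-task-head pattern this kind of study evaluates is shown below; note that a generic timm ViT stands in for VC-1 (the actual VC-1 loading API is not shown), and the head sizes and action dimension are made up:

    # Frozen pre-trained visual representation (PVR) feeding a small trainable
    # task head. A generic timm ViT stands in for VC-1; the real VC-1
    # weights/API are not used, and head/action sizes are hypothetical.
    import torch
    import torch.nn as nn
    import timm

    encoder = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
    for p in encoder.parameters():      # frozen PVR: only the head is trained
        p.requires_grad = False

    policy_head = nn.Sequential(
        nn.Linear(encoder.num_features, 256),
        nn.ReLU(),
        nn.Linear(256, 7),              # e.g. a 7-DoF manipulation action
    )

    obs = torch.randn(1, 3, 224, 224)   # a single RGB observation
    with torch.no_grad():
        feats = encoder(obs)            # pooled embedding, shape (1, 768)
    action = policy_head(feats)

"Adapting" VC-1 in the paper's sense corresponds to also fine-tuning the encoder per task or domain, rather than training the head alone.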
