Mon Apr 03 2023

A Survey of Large Language Models

Neural networks
Natural Language Processing
Language understanding and generation

This paper reviews recent advances in Large Language Models (LLMs), introducing their background, key findings, and mainstream techniques. In particular, it focuses on four major aspects of LLMs: pre-training, adaptation tuning, utilization, and capacity evaluation.

LLMs could revolutionize the way we develop and use AI algorithms.
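
Of the four aspects above, "utilization" typically covers prompting strategies such as in-context learning. As a rough, hypothetical illustration (the sentiment task, the examples, and the prompt wording below are not from the survey), a few-shot prompt steers an LLM without any weight updates:

    # Illustrative sketch of LLM "utilization" via few-shot in-context learning.
    # The task and examples are hypothetical; the resulting prompt would be
    # sent to any LLM completion API of your choice.
    few_shot_examples = [
        ("The movie was a delight.", "positive"),
        ("I want my money back.", "negative"),
    ]

    def build_prompt(query: str) -> str:
        lines = ["Classify the sentiment of each review."]
        for text, label in few_shot_examples:
            lines.append(f"Review: {text}\nSentiment: {label}")
        lines.append(f"Review: {query}\nSentiment:")
        return "\n\n".join(lines)

    print(build_prompt("The plot dragged on forever."))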

3D-aware Image Generation using 2D Diffusion Models

Generative models
Computer Vision
Multiview image set generation

This paper introduces a novel 3D-aware image generation method that leverages 2D diffusion models. The method generates high-quality images that significantly outperform prior approaches, even at large viewing angles, and it scales to a large dataset (ImageNet).

This approach could expand businesses' generative modeling capabilities to 3D-aware imagery.

Self-Refine: Iterative Refinement with Self-Feedback

Neural networks
Natural Language Processing
Text generation

This paper presents a novel approach that allows Large Language Models (LLMs) to iteratively refine outputs and incorporate feedback along multiple dimensions to improve performance on diverse tasks. It does not require supervised training data or reinforcement learning and works with a single LLM.

Self-Refine could improve the quality of text generated by businesses for tasks ranging from review rewriting to math reasoning, without the need for additional training data or models.
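
A minimal sketch of the iterative loop described above may help; everything below, including the call_llm placeholder, the prompt wording, and the stopping heuristic, is a hypothetical stand-in rather than the paper's actual prompts or algorithm:

    # Sketch of a Self-Refine-style loop: a single model drafts an answer,
    # critiques it, and revises it. `call_llm` is a placeholder for any
    # text-completion API; prompts and stopping rule are illustrative only.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire up an LLM API of your choice here")

    def self_refine(task: str, max_iters: int = 3) -> str:
        output = call_llm(f"Task: {task}\nWrite an initial answer.")
        for _ in range(max_iters):
            feedback = call_llm(
                f"Task: {task}\nAnswer: {output}\n"
                "Critique this answer for correctness, clarity, and style."
            )
            if "no issues" in feedback.lower():  # naive convergence check
                break
            output = call_llm(
                f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\n"
                "Rewrite the answer to address the feedback."
            )
        return output

The key property mirrored here is that generation, feedback, and refinement all go through the same model, which is why no supervised training data, reinforcement learning, or second model is needed.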

GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently

OCR techniques
Image generation
Improve text coherence in generated images

Recent breakthroughs in language-guided image generation have enabled the creation of high-quality, diverse images from user instructions. Despite this impressive synthesis performance, one significant limitation of current image generation models is their insufficient ability to render coherent text within images, particularly for complex glyph structures such as Chinese characters.

This research introduces GlyphDraw, a general learning framework that endows image generation models with the capacity to generate images embedded with coherent text, specifically Chinese characters. The model is open-domain, and extensive qualitative and quantitative experiments demonstrate that it accurately renders the Chinese characters specified in prompts and naturally blends the generated text into the background.

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Masked Auto-encoding (MAE)
Pre-trained visual representations (PVRs)
Advance embodied artificial intelligence (EAI) tasks

This paper presents the largest and most comprehensive empirical study to date of pre-trained visual representations (PVRs), or visual foundation models, for Embodied AI. The authors curate CortexBench, a benchmark of 17 tasks spanning locomotion, navigation, and dexterous and mobile manipulation, and systematically evaluate existing PVRs, finding that none is universally dominant.

The paper introduces VC-1, their largest model, which outperforms all prior PVRs on average but does not universally dominate either. Task- or domain-specific adaptation of VC-1 yields substantial gains, with the adapted VC-1 achieving performance competitive with or superior to the best-known results on every benchmark in CortexBench. These models required over 10,000 GPU-hours to train and are available on the project website for the benefit of the research community.
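
A minimal sketch of the frozen-PVR-plus-task-head pattern this kind of study evaluates is shown below; note that a generic timm ViT stands in for VC-1 (the actual VC-1 loading API is not shown), and the head sizes and action dimension are made up:

    # Frozen pre-trained visual representation (PVR) feeding a small trainable
    # task head. A generic timm ViT stands in for VC-1; the real VC-1
    # weights/API are not used, and head/action sizes are hypothetical.
    import torch
    import torch.nn as nn
    import timm

    encoder = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
    for p in encoder.parameters():      # frozen PVR: only the head is trained
        p.requires_grad = False

    policy_head = nn.Sequential(
        nn.Linear(encoder.num_features, 256),
        nn.ReLU(),
        nn.Linear(256, 7),              # e.g. a 7-DoF manipulation action
    )

    obs = torch.randn(1, 3, 224, 224)   # a single RGB observation
    with torch.no_grad():
        feats = encoder(obs)            # pooled embedding, shape (1, 768)
    action = policy_head(feats)

"Adapting" VC-1 in the paper's sense corresponds to also fine-tuning the encoder per task or domain, rather than training the head alone.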
