Tue Dec 20 2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta Optimizers

Transformer
Language Models
NLP

Explains language models as meta-optimizers and interprets in-context learning (ICL) as a kind of implicit finetuning.

Offers insights into the working mechanism of ICL and the potential to apply this understanding to future model design.
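
The paper's core argument rests on a simple identity: under a linear-attention relaxation, attending over demonstration tokens is the same as applying an outer-product weight update of exactly the form a gradient-descent step produces on a linear layer. A minimal numpy check of that identity is sketched below; it illustrates the relaxed setting only, not the paper's full treatment of softmax attention, and all tensors are randomly generated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                       # feature dimension, number of demonstrations

K = rng.normal(size=(n, d))       # keys k_i computed from the demonstration tokens
V = rng.normal(size=(n, d))       # values v_i (these play the role of error signals)
q = rng.normal(size=(d,))         # query vector for the test input

# (1) Linear-attention readout over the demonstrations: sum_i (k_i . q) * v_i
attn_out = (V * (K @ q)[:, None]).sum(axis=0)

# (2) The same readout written as an implicit weight update applied to the query:
#     delta_W = sum_i v_i k_i^T, the outer-product form of a gradient-descent step.
delta_W = V.T @ K
gd_out = delta_W @ q

assert np.allclose(attn_out, gd_out)  # attention == applying the implicit update
```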

Self-Instruct: Aligning Language Models with Self-Generated Instructions

GPT-3
Pre-trained Language Models
NLP
Language Modelling

Provides an almost annotation-free method for aligning pre-trained language models with instructions.

Improves the instruction-following capabilities of language models without relying on extensive manual annotation.
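
At its core the method is a bootstrapping loop: sample a few instructions from a pool seeded with human-written tasks, prompt the pre-trained LM to propose a new instruction, filter near-duplicates, and grow the pool before finetuning on it. The sketch below assumes a generic `complete(prompt) -> str` callable standing in for the LM, with plain token overlap standing in for the paper's ROUGE-L filter; the real pipeline also generates input-output instances and handles classification tasks separately.

```python
import random

def self_instruct_round(task_pool, complete, num_demos=8, sim_threshold=0.7):
    """One bootstrapping round in the spirit of Self-Instruct.
    task_pool: list of instruction strings, initialised from a small seed set.
    complete:  hypothetical LLM completion function (prompt -> generated text)."""
    # 1. Sample existing instructions as in-context examples.
    demos = random.sample(task_pool, k=min(num_demos, len(task_pool)))
    prompt = ("Come up with a new task instruction.\n"
              + "\n".join(f"Task: {t}" for t in demos)
              + "\nTask:")

    # 2. Ask the pre-trained LM to propose a new instruction.
    new_instruction = complete(prompt).strip()

    # 3. Keep it only if it is not too similar to anything already in the pool.
    def overlap(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(1, min(len(ta), len(tb)))

    if all(overlap(new_instruction, t) < sim_threshold for t in task_pool):
        task_pool.append(new_instruction)
    return task_pool
```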

Pretraining Without Attention

Transformer
Pre-training
State-Space Models
NLP

Proposes the Bidirectional Gated SSM (BiGS), which replicates BERT pretraining results without attention.

Employs routing layers based on recent state-space models (SSMs) together with a multiplicative-gating model architecture.
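
A rough picture of the block: the input is projected into two branches, one branch is routed through a state-space recurrence run in both directions (this replaces attention), and the result is combined with the other branch by elementwise multiplication before the output projection. The numpy sketch below is only a schematic under those assumptions; the actual BiGS block differs in details such as GELU activations, per-direction SSM parameters, and normalisation/residual connections.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Toy diagonal state-space recurrence along the sequence:
       h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
       x: (T, d);  a, b, c: per-channel parameters of shape (d,)."""
    h = np.zeros(x.shape[1])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def bigs_block(x, p):
    """Schematic bidirectional gated SSM block (ReLU stands in for GELU)."""
    v = np.maximum(x @ p["W_v"], 0.0)                      # gating branch
    u = np.maximum(x @ p["W_u"], 0.0)                      # SSM branch input
    fwd = ssm_scan(u, p["a"], p["b"], p["c"])              # left-to-right context
    bwd = ssm_scan(u[::-1], p["a"], p["b"], p["c"])[::-1]  # right-to-left context
    return ((fwd + bwd) * v) @ p["W_o"]                    # multiplicative gating

# Illustrative usage with random parameters.
rng = np.random.default_rng(0)
T, d = 8, 16
p = {k: rng.normal(scale=0.1, size=(d, d)) for k in ("W_v", "W_u", "W_o")}
p.update(a=np.full(d, 0.9), b=np.ones(d), c=np.ones(d))
out = bigs_block(rng.normal(size=(T, d)), p)               # shape (T, d)
```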

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Social commonsense knowledge graphs
Dialogue systems
Natural language processing
Training a generalizable conversation agent
Contextualizing social commonsense knowledge for dialogue distillation
Improving dialogue consistency, specificity, and naturalness

Distills 1.5M socially-grounded dialogues from InstructGPT; human evaluation shows they are more consistent, specific, and natural than dialogues in prior human-authored datasets. Trains a generalizable conversation agent that outperforms previous best-performing agents on both in- and out-of-domain datasets.

SODA can be used to train a conversation agent that outperforms previous best-performing agents on both in- and out-of-domain datasets. Its dialogues are more consistent, specific, and natural than those in prior human-authored datasets. Social commonsense knowledge from a knowledge graph can be contextualized into narratives to distill dialogues, as sketched below. COSMO, trained on SODA, is significantly more natural and consistent on unseen datasets than the best-performing dialogue models.
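
The contextualization step is essentially template-based: a social-commonsense triple from the knowledge graph is verbalised into a short narrative, and that narrative is used to prompt the LLM for a grounded multi-turn dialogue. The sketch below is illustrative only; the relation templates, speaker handling, and prompt wording are assumptions, and the actual call to InstructGPT is omitted.

```python
def triple_to_dialogue_prompt(head, relation, tail):
    """Turn a social-commonsense triple into a narrative, then into a
    dialogue-generation prompt (templates here are made-up examples)."""
    templates = {
        "xWant":  "{head}. As a result, PersonX wants {tail}.",
        "xReact": "{head}. PersonX feels {tail}.",
        "oReact": "{head}. Others feel {tail}.",
    }
    narrative = templates.get(relation, "{head}. {tail}.").format(head=head, tail=tail)
    return ("The following is a conversation grounded in this situation:\n"
            f"{narrative}\n"
            "Write a natural multi-turn dialogue between the people involved:\n")

prompt = triple_to_dialogue_prompt(
    "PersonX moves to a new city", "xWant", "to make new friends")
# A text-completion call (e.g. to InstructGPT) would turn `prompt` into a
# multi-turn dialogue; that call is not shown here.
```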

Character-Aware Models Improve Visual Text Rendering

Character-level input features
Image generation
Text-to-image models
Improving text rendering
Visual spelling improvements

Trains a suite of image generation models and shows that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks. Investigates the extent to which popular text-to-image models lack character-level input features, which makes it much harder to predict a word's visual makeup as a series of glyphs.

Character-aware models provide large gains on a novel spelling task and outperform their character-blind counterparts on a range of text rendering tasks. Because popular text-to-image models lack character-level input features, it is much harder for them to predict a word's visual makeup as a series of glyphs. Character-aware models set a new state of the art on visual spelling and achieve 30+ point accuracy gains over competitors on rare words despite training on fewer examples.
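
The "character-blind" vs. "character-aware" distinction comes down to what the text encoder actually sees. A subword tokenizer hands the model opaque pieces, so a word's spelling is hidden inside one or two IDs, whereas a byte- or character-level encoder in the spirit of ByT5 sees every glyph explicitly. The toy subword split below is illustrative, not the output of any real tokenizer.

```python
prompt = 'a storefront sign that says "bouquet"'

# Character-blind view: a subword tokenizer (toy split shown) gives the model
# opaque pieces; nothing in them spells out b-o-u-q-u-e-t, so rendering the
# word correctly relies on memorised spellings.
toy_subword_split = ["a", " storefront", " sign", " that", " says", ' "', "bou", "quet", '"']

# Character-aware view: a byte-level encoder sees one token per byte, making
# the word's visual makeup as a series of glyphs directly available.
byte_level_view = list(prompt.encode("utf-8"))

print(toy_subword_split)
print(byte_level_view[:12], "...")   # 97, 32, 115, ... one integer per character
```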
