Tue Dec 20 2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta Optimizers

Transformer
Language Models
NLP

Explains language models as meta-optimizers and interprets in-context learning (ICL) as a kind of implicit finetuning.

Offers insights into the working mechanism of ICL and the potential to apply this understanding to future model design.
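
The paper's core argument rests on a simple identity: under a linear-attention relaxation, attending over demonstration tokens is the same as applying an outer-product weight update of exactly the form a gradient-descent step produces on a linear layer. A minimal numpy check of that identity is sketched below; it illustrates the relaxed setting only, not the paper's full treatment of softmax attention, and all tensors are randomly generated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                       # feature dimension, number of demonstrations

K = rng.normal(size=(n, d))       # keys k_i computed from the demonstration tokens
V = rng.normal(size=(n, d))       # values v_i (these play the role of error signals)
q = rng.normal(size=(d,))         # query vector for the test input

# (1) Linear-attention readout over the demonstrations: sum_i (k_i . q) * v_i
attn_out = (V * (K @ q)[:, None]).sum(axis=0)

# (2) The same readout written as an implicit weight update applied to the query:
#     delta_W = sum_i v_i k_i^T, the outer-product form of a gradient-descent step.
delta_W = V.T @ K
gd_out = delta_W @ q

assert np.allclose(attn_out, gd_out)  # attention == applying the implicit update
```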

Self-Instruct: Aligning Language Models with Self-Generated Instructions

GPT-3
Pre-trained Language Models
NLP
Language Modelling

Provides an almost annotation-free method for aligning pre-trained language models with instructions.

Improves the instruction-following capabilities of language models without relying on extensive manual annotation.
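
At its core the method is a bootstrapping loop: sample a few instructions from a pool seeded with human-written tasks, prompt the pre-trained LM to propose a new instruction, filter near-duplicates, and grow the pool before finetuning on it. The sketch below assumes a generic `complete(prompt) -> str` callable standing in for the LM, with plain token overlap standing in for the paper's ROUGE-L filter; the real pipeline also generates input-output instances and handles classification tasks separately.

```python
import random

def self_instruct_round(task_pool, complete, num_demos=8, sim_threshold=0.7):
    """One bootstrapping round in the spirit of Self-Instruct.
    task_pool: list of instruction strings, initialised from a small seed set.
    complete:  hypothetical LLM completion function (prompt -> generated text)."""
    # 1. Sample existing instructions as in-context examples.
    demos = random.sample(task_pool, k=min(num_demos, len(task_pool)))
    prompt = ("Come up with a new task instruction.\n"
              + "\n".join(f"Task: {t}" for t in demos)
              + "\nTask:")

    # 2. Ask the pre-trained LM to propose a new instruction.
    new_instruction = complete(prompt).strip()

    # 3. Keep it only if it is not too similar to anything already in the pool.
    def overlap(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(1, min(len(ta), len(tb)))

    if all(overlap(new_instruction, t) < sim_threshold for t in task_pool):
        task_pool.append(new_instruction)
    return task_pool
```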

Pretraining Without Attention

Transformer
Pre-training
State-Space Models
NLP

Proposes the Bidirectional Gated SSM (BiGS), which replicates BERT pretraining results without attention.

Employs routing layers based on recent state-space models (SSMs) together with a multiplicative-gating model architecture.
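
A rough picture of the block: the input is projected into two branches, one branch is routed through a state-space recurrence run in both directions (this replaces attention), and the result is combined with the other branch by elementwise multiplication before the output projection. The numpy sketch below is only a schematic under those assumptions; the actual BiGS block differs in details such as GELU activations, per-direction SSM parameters, and normalisation/residual connections.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Toy diagonal state-space recurrence along the sequence:
       h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
       x: (T, d);  a, b, c: per-channel parameters of shape (d,)."""
    h = np.zeros(x.shape[1])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def bigs_block(x, p):
    """Schematic bidirectional gated SSM block (ReLU stands in for GELU)."""
    v = np.maximum(x @ p["W_v"], 0.0)                      # gating branch
    u = np.maximum(x @ p["W_u"], 0.0)                      # SSM branch input
    fwd = ssm_scan(u, p["a"], p["b"], p["c"])              # left-to-right context
    bwd = ssm_scan(u[::-1], p["a"], p["b"], p["c"])[::-1]  # right-to-left context
    return ((fwd + bwd) * v) @ p["W_o"]                    # multiplicative gating

# Illustrative usage with random parameters.
rng = np.random.default_rng(0)
T, d = 8, 16
p = {k: rng.normal(scale=0.1, size=(d, d)) for k in ("W_v", "W_u", "W_o")}
p.update(a=np.full(d, 0.9), b=np.ones(d), c=np.ones(d))
out = bigs_block(rng.normal(size=(T, d)), p)               # shape (T, d)
```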

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Social commonsense knowledge graphs
Dialogue systems
Natural language processing
Training a generalizable conversation agent
Contextualizing social commonsense knowledge for dialogue distillation
Improving dialogue consistency, specificity, and naturalness

Distills 1.5M socially-grounded dialogues from InstructGPT; human evaluation shows they are more consistent, specific, and natural than dialogues in prior human-authored datasets. Trains a generalizable conversation agent that outperforms previous best-performing agents on both in- and out-of-domain datasets.

SODA can be used to train a conversation agent that outperforms previous best-performing agents on both in- and out-of-domain datasets. Its dialogues are more consistent, specific, and natural than those in prior human-authored datasets. Social commonsense knowledge from a knowledge graph can be contextualized into narratives to distill dialogues, as sketched below. COSMO, trained on SODA, is significantly more natural and consistent on unseen datasets than the best-performing dialogue models.
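
The contextualization step is essentially template-based: a social-commonsense triple from the knowledge graph is verbalised into a short narrative, and that narrative is used to prompt the LLM for a grounded multi-turn dialogue. The sketch below is illustrative only; the relation templates, speaker handling, and prompt wording are assumptions, and the actual call to InstructGPT is omitted.

```python
def triple_to_dialogue_prompt(head, relation, tail):
    """Turn a social-commonsense triple into a narrative, then into a
    dialogue-generation prompt (templates here are made-up examples)."""
    templates = {
        "xWant":  "{head}. As a result, PersonX wants {tail}.",
        "xReact": "{head}. PersonX feels {tail}.",
        "oReact": "{head}. Others feel {tail}.",
    }
    narrative = templates.get(relation, "{head}. {tail}.").format(head=head, tail=tail)
    return ("The following is a conversation grounded in this situation:\n"
            f"{narrative}\n"
            "Write a natural multi-turn dialogue between the people involved:\n")

prompt = triple_to_dialogue_prompt(
    "PersonX moves to a new city", "xWant", "to make new friends")
# A text-completion call (e.g. to InstructGPT) would turn `prompt` into a
# multi-turn dialogue; that call is not shown here.
```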

Character-Aware Models Improve Visual Text Rendering

Character-level input features
Image generation
Text-to-image models
Improving text rendering
Visual spelling improvements

Trains a suite of image generation models and shows that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks. Investigates the extent to which popular text-to-image models lack character-level input features, which makes it much harder to predict a word's visual makeup as a series of glyphs.

Character-aware models provide large gains on a novel spelling task and outperform their character-blind counterparts on a range of text rendering tasks. Because popular text-to-image models lack character-level input features, it is much harder for them to predict a word's visual makeup as a series of glyphs. Character-aware models set a new state of the art on visual spelling and achieve 30+ point accuracy gains over competitors on rare words despite training on fewer examples.
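
The "character-blind" vs. "character-aware" distinction comes down to what the text encoder actually sees. A subword tokenizer hands the model opaque pieces, so a word's spelling is hidden inside one or two IDs, whereas a byte- or character-level encoder in the spirit of ByT5 sees every glyph explicitly. The toy subword split below is illustrative, not the output of any real tokenizer.

```python
prompt = 'a storefront sign that says "bouquet"'

# Character-blind view: a subword tokenizer (toy split shown) gives the model
# opaque pieces; nothing in them spells out b-o-u-q-u-e-t, so rendering the
# word correctly relies on memorised spellings.
toy_subword_split = ["a", " storefront", " sign", " that", " says", ' "', "bou", "quet", '"']

# Character-aware view: a byte-level encoder sees one token per byte, making
# the word's visual makeup as a series of glyphs directly available.
byte_level_view = list(prompt.encode("utf-8"))

print(toy_subword_split)
print(byte_level_view[:12], "...")   # 97, 32, 115, ... one integer per character
```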
