Sun Feb 12 2023

Binarized Neural Machine Translation

Natural Language Processing
Neural networks
Machine translation

Proposes BMT, a novel binarization technique for machine translation Transformers that leverages additional LayerNorms and residual connections to improve binarization quality. Experiments show that a one-bit weight-only Transformer can match the quality of a full-precision (float) model while being 16x smaller in size.

Implementing BMT can improve machine translation models by reducing their size without compromising performance.
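A minimal PyTorch sketch of the core idea, under assumptions: weights are binarized to ±1 with a per-tensor scaling factor via a straight-through estimator, and each binarized matmul is preceded by an extra LayerNorm inside a residual block. The class names (BinaryLinear, BinarizedFFN) and the exact placement of norms and scales are illustrative, not the paper's precise recipe.

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    """1-bit weight-only linear layer: weights are binarized to {-1, +1}
    and rescaled by their mean absolute value (a common binarization
    scheme; BMT's exact scaling may differ)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x):
        alpha = self.weight.abs().mean()           # per-tensor scale
        w_bin = torch.sign(self.weight) * alpha
        # Straight-through estimator: forward uses the binarized weights,
        # backward passes gradients to the latent float weights.
        w = self.weight + (w_bin - self.weight).detach()
        return nn.functional.linear(x, w)

class BinarizedFFN(nn.Module):
    """Feed-forward block with an extra LayerNorm before each binarized
    matmul and a residual connection around the block."""

    def __init__(self, d_model, d_ff):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_ff)
        self.fc1 = BinaryLinear(d_model, d_ff)
        self.fc2 = BinaryLinear(d_ff, d_model)

    def forward(self, x):
        h = torch.relu(self.fc1(self.norm1(x)))
        return x + self.fc2(self.norm2(h))
```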

Scaling Vision Transformers to 22 Billion Parameters

Computer Vision
Vision Models
Image and video modelling

Presents a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and demonstrates that performance, fairness, robustness, and alignment improve with scale.

Scaling ViTs to this size can lead to improvements in image and video modelling across a range of use cases.
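The summary does not spell out the training recipe; one stability ingredient reported for ViT-22B is normalizing queries and keys with LayerNorm before the attention softmax, which keeps attention logits bounded at this scale. Below is a minimal single-device sketch of that idea (hypothetical class name, simplified shapes; the real model is sharded across devices and includes further tricks such as parallel attention/MLP blocks).

```python
import torch
import torch.nn as nn

class QKNormAttention(nn.Module):
    """Self-attention with LayerNorm applied to queries and keys
    before the softmax, a trick used to stabilize very large ViTs."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # Normalizing q and k prevents attention logits from blowing up.
        self.q_norm = nn.LayerNorm(self.d_head)
        self.k_norm = nn.LayerNorm(self.d_head)

    def forward(self, x):
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim).
        q = q.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)
        attn = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        out = attn.softmax(dim=-1) @ v
        return self.out(out.transpose(1, 2).reshape(b, n, d))
```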

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Reinforcement Learning
Natural Language Processing
Language instruction alignment

Proposes Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. HIR converts feedback into instructions by relabeling the original instruction in hindsight, then trains the model on the relabeled data in a supervised manner. HIR outperforms the baseline algorithms and is comparable to supervised finetuning on 12 challenging BigBench reasoning tasks.

Implementing HIR can improve alignment between language models and instructions without additional training pipelines.
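A rough sketch of the relabeling loop described above, assuming hypothetical helper callables (generate, relabel_instruction, supervised_update); the real HIR algorithm alternates online sampling with offline supervised training and includes details (e.g. regularization) not shown here.

```python
def hir_training_loop(model, tasks, generate, relabel_instruction,
                      supervised_update, num_iterations):
    """tasks: iterable of (instruction, query, checker) tuples, where
    `checker` scores whether an output satisfies an instruction."""
    for _ in range(num_iterations):
        # --- Online sampling phase ------------------------------------
        dataset = []
        for instruction, query, checker in tasks:
            output = generate(model, instruction, query)
            # --- Hindsight relabeling ---------------------------------
            # Instead of discarding failures, rewrite the instruction so
            # that the sampled output is a correct answer to the *new*
            # instruction.
            new_instruction = relabel_instruction(
                instruction, query, output, checker)
            dataset.append((new_instruction, query, output))
        # --- Offline supervised phase ---------------------------------
        for new_instruction, query, output in dataset:
            # Standard maximum-likelihood update on the relabeled triple;
            # no reward model or RL optimizer is required.
            supervised_update(model, new_instruction, query, output)
    return model
```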
