Few-shot Learning with Retrieval Augmented Language Models
Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming PaLM by 3% despite having 50x fewer parameters.
Atlas is able to learn knowledge-intensive tasks from very few training examples. It performs well on a wide range of tasks, including question answering and fact checking. It reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters.
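A minimal retrieve-then-read sketch of the idea: fetch the passages most relevant to a question and prepend them to the prompt before a reader model generates the answer. The bag-of-words retriever and stubbed reader below are toy stand-ins, not the learned dense retriever and seq2seq reader a system like Atlas uses.

```python
import numpy as np

# Toy corpus standing in for a retrieval index (illustrative only).
corpus = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain on Earth.",
    "The Great Wall of China stretches across northern China.",
]

def embed(text, dim=64):
    """Hash each token into a fixed-size vector (toy stand-in for a learned encoder)."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query, k=2):
    """Return the k passages most similar to the query by cosine similarity."""
    q = embed(query)
    ranked = sorted(((float(q @ embed(p)), p) for p in corpus), reverse=True)
    return [p for _, p in ranked[:k]]

def answer(question):
    """Retrieve-then-read: combine retrieved passages with the question,
    then hand the prompt to a reader LM (stubbed out here)."""
    passages = retrieve(question)
    prompt = "\n".join(f"context: {p}" for p in passages) + f"\nquestion: {question}"
    return prompt  # a real reader model would generate the answer from this prompt

print(answer("Where is the Eiffel Tower?"))
```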
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
A 22B-parameter model trained with BTM, a communication-efficient training procedure, performs as well as a Transformer LM trained with 2.5x more compute.
BTM is an algorithm for embarrassingly parallel training of large language models. It learns independent expert LMs specialized to different textual domains, which can be added or removed to update data coverage, ensembled to generalize to new domains, or averaged to collapse back to a single LM for efficient inference. Experiments show that BTM improves perplexity compared to Transformer LMs when controlling for training cost.
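A minimal sketch of the two ways the experts can be combined, assuming toy numpy arrays stand in for each expert's next-token distribution and parameters; the domain names and weights are illustrative, not from the paper.

```python
import numpy as np

# Toy next-token distributions from three independently trained domain experts.
expert_probs = {
    "news":   np.array([0.50, 0.20, 0.15, 0.10, 0.05]),
    "code":   np.array([0.05, 0.10, 0.60, 0.15, 0.10]),
    "papers": np.array([0.20, 0.30, 0.25, 0.15, 0.10]),
}

# Domain weights, e.g. a posterior over domains given the current context.
weights = {"news": 0.6, "code": 0.1, "papers": 0.3}

# Ensembling: mix the experts' next-token distributions with the domain weights.
ensembled = sum(weights[d] * p for d, p in expert_probs.items())
print("ensembled next-token distribution:", ensembled)

# Parameter averaging: collapse the experts back into a single model by taking
# a weighted average of their parameters (shown here on toy weight matrices).
expert_params = {d: np.random.default_rng(i).normal(size=(4, 4))
                 for i, d in enumerate(expert_probs)}
merged = sum(weights[d] * expert_params[d] for d in expert_params)
print("merged parameter matrix shape:", merged.shape)
```

Ensembling keeps every expert around at inference time, while averaging trades some of that flexibility for the cost profile of a single LM.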