reStructured Pre-training
Proposes a new learning paradigm for NLP called reStructured Pre-training (RST), which views model pre-training and fine-tuning as a process of data storing and accessing. Achieves superior performance on 52 of 55 popular datasets spanning a variety of NLP tasks and outperforms GPT-3, with 1/16 of its parameters, on the English exam of China's National College Entrance Examination (Gaokao). Releases the Gaokao Benchmark along with an online submission platform.
Apply reStructured Pre-training when pre-training or adapting language models for NLP tasks, and use the Gaokao Benchmark for evaluation.
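Below is a minimal sketch of the data-restructuring idea behind RST: naturally occurring signals are rewritten into unified text-to-text (input, output) pairs that a single model can be pre-trained on. The signal types and prompt wording here are illustrative assumptions, not the paper's actual prompts or data sources.

```python
# Restructure raw signals (a review rating and an entity hyperlink, both
# hypothetical examples) into unified (prompt, answer) pairs for text-to-text
# pre-training. Prompt wording is illustrative, not the paper's.

def restructure_rating(review: str, rating: int) -> tuple[str, str]:
    """Turn a review/rating signal into an input-output pair."""
    prompt = f"Review: {review}\nQuestion: What rating (1-5) does this review give?"
    return prompt, str(rating)

def restructure_hyperlink(sentence: str, mention: str, entity: str) -> tuple[str, str]:
    """Turn a hyperlink (mention -> entity) signal into an input-output pair."""
    prompt = f"Text: {sentence}\nQuestion: Which entity does \"{mention}\" refer to?"
    return prompt, entity

if __name__ == "__main__":
    pairs = [
        restructure_rating("Great pacing and a moving finale.", 5),
        restructure_hyperlink(
            "The Gaokao is administered every June.", "Gaokao",
            "National College Entrance Examination"),
    ]
    for prompt, answer in pairs:
        print(prompt, "->", answer)
```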
Questions Are All You Need to Train a Dense Passage Retriever
Introduces ART, a corpus-level autoencoding approach for training dense retrieval models that requires no labeled training data. ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model.
Use ART to train dense retrieval models for open-domain tasks such as open-domain QA without labeled data or task-specific losses.
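The following is a minimal PyTorch sketch of the corpus-level autoencoding idea: the retriever's distribution over retrieved passages is trained to match a teacher distribution given by how well each passage lets a frozen language model reconstruct the question. The encoders, the KL formulation, and all tensor shapes are stand-ins under stated assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def art_style_loss(question_emb: torch.Tensor,
                   passage_embs: torch.Tensor,
                   question_log_likelihoods: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || retriever) over the K retrieved passages.

    question_emb:             (d,)   question embedding from the trainable retriever
    passage_embs:             (K, d) embeddings of the K retrieved passages
    question_log_likelihoods: (K,)   log p(question | passage) from a frozen LM
    """
    retriever_scores = passage_embs @ question_emb                     # (K,) dot-product relevance
    retriever_logp = F.log_softmax(retriever_scores / temperature, dim=-1)
    teacher_p = F.softmax(question_log_likelihoods / temperature, dim=-1)
    return F.kl_div(retriever_logp, teacher_p, reduction="sum")        # no labels needed

if __name__ == "__main__":
    d, K = 768, 8
    q = torch.randn(d, requires_grad=True)   # stand-in for the question encoder output
    p = torch.randn(K, d)                    # stand-in for passage encoder outputs
    ll = torch.randn(K)                      # stand-in for frozen-LM reconstruction scores
    loss = art_style_loss(q, p, ll)
    loss.backward()                          # gradients flow only into the retriever side
    print(float(loss))
```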
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Introduces GEMv2, a modular infrastructure that lets dataset, model, and metric developers in natural language generation benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages and provides tools for online evaluation and for interactive data card creation and rendering.
Use GEMv2 to benchmark NLG models and to stay up to date with best practices in model evaluation.
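Below is a hedged sketch of the kind of one-line dataset loading the GEMv2 title advertises, using the Hugging Face `datasets` hub. The dataset identifier "GEM/web_nlg_en" and the "target" field name are assumptions, and ROUGE via the `evaluate` library is used only as a stand-in for GEMv2's own evaluation tooling.

```python
from datasets import load_dataset
import evaluate

# The advertised one-liner: load a GEM dataset from the Hugging Face Hub.
# The dataset ID "GEM/web_nlg_en" and the "target" field name are assumptions.
dataset = load_dataset("GEM/web_nlg_en", split="validation")

# Stand-in evaluation with ROUGE via the `evaluate` library; GEMv2's own
# online evaluation would normally be used instead.
rouge = evaluate.load("rouge")
examples = dataset.select(range(8))
references = [ex["target"] for ex in examples]
predictions = references  # dummy predictions, only to show the metric call
print(rouge.compute(predictions=predictions, references=references))
```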