Wed Jun 22 2022

reStructured Pre-training

Data Science
Machine Learning
Linguistics
Natural Language Processing
Language Modeling
Educational Assessment/Testing

Proposes reStructured Pre-training (RST), a new learning paradigm for NLP that views model pre-training and fine-tuning as a process of data storing and data accessing. RST achieves superior performance on 52 of 55 popular datasets spanning a variety of NLP tasks and outperforms GPT-3 with 1/16 of its parameters on the English section of China's National College Entrance Examination (Gaokao). Also releases the Gaokao Benchmark with an online submission platform.

Apply reStructured Pre-training to improve language models on downstream NLP tasks; use the Gaokao Benchmark for evaluation.
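A minimal sketch of the data-restructuring idea, assuming a seq2seq target format; the function name and prompt template are illustrative, not the paper's exact scheme:

```python
# Hypothetical sketch: turn one mined "signal" (e.g., a hyperlink from
# Wikipedia) into a unified (prompt, answer) pair that a seq2seq model
# can pre-train on. The prompt template is an assumption for illustration.

def restructure_signal(source: str, text: str, signal: dict) -> dict:
    prompt = f"TEXT: {text} QUERY: what is the {signal['type']} mentioned?"
    return {"source": source, "input": prompt, "output": signal["value"]}

# One hyperlink signal mined from a Wikipedia sentence.
record = restructure_signal(
    source="wikipedia",
    text="Turing was born in Maida Vale, London.",
    signal={"type": "location", "value": "Maida Vale, London"},
)
print(record["input"], "->", record["output"])
```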

Questions Are All You Need to Train a Dense Passage Retriever

Natural Language Processing
Machine Learning
Open Domain Question Answering

Introduces ART, a corpus-level autoencoding approach for training dense retrieval models that requires no labeled training data. ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model.

Use ART to train dense retrievers for open-domain tasks such as open-domain QA, without labeled data or task-specific losses.
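A hedged sketch of ART's training signal in PyTorch (tensor names and shapes are assumptions): the retriever's distribution over its top-k retrieved passages is pushed toward a frozen language model's likelihood of reconstructing the question from each passage.

```python
import torch
import torch.nn.functional as F

def art_loss(q_emb, passage_embs, lm_question_loglik):
    """q_emb: (d,) question embedding; passage_embs: (k, d) embeddings of
    the top-k retrieved passages; lm_question_loglik: (k,) frozen LM
    log-likelihoods of regenerating the question from each passage."""
    # Retriever's distribution over the retrieved passages.
    log_p_retriever = F.log_softmax(passage_embs @ q_emb, dim=-1)

    # Teacher distribution from the frozen pre-trained LM (no gradients).
    with torch.no_grad():
        teacher = F.softmax(lm_question_loglik, dim=-1)

    # KL(teacher || retriever): no labeled question-passage pairs needed.
    return F.kl_div(log_p_retriever, teacher, reduction="sum")
```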

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Data Science
Machine Learning
Linguistics
Natural Language Generation

Introduces GEMv2, a modular infrastructure for dataset, model, and metric developers to benefit from each other's work in natural language generation. GEMv2 supports 40 documented datasets in 51 languages and provides online evaluation and interactive data card creation and rendering tools.

Use GEMv2 to benchmark NLG models and to stay current with best practices in model evaluation.
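A usage sketch, assuming the GEM loaders published on the Hugging Face hub; the dataset and field names below are one example and vary per task:

```python
from datasets import load_dataset

# Load one GEM dataset in a single call; "GEM/xsum" is one example name.
data = load_dataset("GEM/xsum", split="validation")
print(data[0]["document"][:200])  # source document (field name per GEM/xsum)
print(data[0]["target"])          # reference summary
```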
