reStructured Pre-training
Proposes a new learning paradigm for NLP called reStructured Pre-training (RST), which views model pre-training and fine-tuning as a process of data storing and accessing. Achieves superior performance on 52 of 55 popular datasets spanning a variety of NLP tasks and outperforms GPT-3, with 1/16 of its parameters, on the English exam of China's National College Entrance Examination (Gaokao). Releases the Gaokao Benchmark along with an online submission platform.
Apply reStructured Pre-training when pre-training or adapting language models for NLP tasks, and use the Gaokao Benchmark for evaluation.
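Below is a minimal sketch of the data-restructuring idea behind RST: naturally occurring signals are rewritten into unified text-to-text (input, output) pairs that a single model can be pre-trained on. The signal types and prompt wording here are illustrative assumptions, not the paper's actual prompts or data sources.

```python
# Restructure raw signals (a review rating and an entity hyperlink, both
# hypothetical examples) into unified (prompt, answer) pairs for text-to-text
# pre-training. Prompt wording is illustrative, not the paper's.

def restructure_rating(review: str, rating: int) -> tuple[str, str]:
    """Turn a review/rating signal into an input-output pair."""
    prompt = f"Review: {review}\nQuestion: What rating (1-5) does this review give?"
    return prompt, str(rating)

def restructure_hyperlink(sentence: str, mention: str, entity: str) -> tuple[str, str]:
    """Turn a hyperlink (mention -> entity) signal into an input-output pair."""
    prompt = f"Text: {sentence}\nQuestion: Which entity does \"{mention}\" refer to?"
    return prompt, entity

if __name__ == "__main__":
    pairs = [
        restructure_rating("Great pacing and a moving finale.", 5),
        restructure_hyperlink(
            "The Gaokao is administered every June.", "Gaokao",
            "National College Entrance Examination"),
    ]
    for prompt, answer in pairs:
        print(prompt, "->", answer)
```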
Questions Are All You Need to Train a Dense Passage Retriever
Introduces ART, a corpus-level autoencoding approach for training dense retrieval models that requires no labeled training data. ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model.
Use ART to train dense retrieval models for open-domain tasks such as open-domain QA without labeled data or task-specific losses.
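The following is a minimal PyTorch sketch of the corpus-level autoencoding idea: the retriever's distribution over retrieved passages is trained to match a teacher distribution given by how well each passage lets a frozen language model reconstruct the question. The encoders, the KL formulation, and all tensor shapes are stand-ins under stated assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def art_style_loss(question_emb: torch.Tensor,
                   passage_embs: torch.Tensor,
                   question_log_likelihoods: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || retriever) over the K retrieved passages.

    question_emb:             (d,)   question embedding from the trainable retriever
    passage_embs:             (K, d) embeddings of the K retrieved passages
    question_log_likelihoods: (K,)   log p(question | passage) from a frozen LM
    """
    retriever_scores = passage_embs @ question_emb                     # (K,) dot-product relevance
    retriever_logp = F.log_softmax(retriever_scores / temperature, dim=-1)
    teacher_p = F.softmax(question_log_likelihoods / temperature, dim=-1)
    return F.kl_div(retriever_logp, teacher_p, reduction="sum")        # no labels needed

if __name__ == "__main__":
    d, K = 768, 8
    q = torch.randn(d, requires_grad=True)   # stand-in for the question encoder output
    p = torch.randn(K, d)                    # stand-in for passage encoder outputs
    ll = torch.randn(K)                      # stand-in for frozen-LM reconstruction scores
    loss = art_style_loss(q, p, ll)
    loss.backward()                          # gradients flow only into the retriever side
    print(float(loss))
```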
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Introduces GEMv2, a modular infrastructure that lets dataset, model, and metric developers in natural language generation benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages and provides tools for online evaluation and for interactive data card creation and rendering.
Use GEMv2 to benchmark NLG models and to stay up to date with best practices in model evaluation.
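Below is a hedged sketch of the kind of one-line dataset loading the GEMv2 title advertises, using the Hugging Face `datasets` hub. The dataset identifier "GEM/web_nlg_en" and the "target" field name are assumptions, and ROUGE via the `evaluate` library is used only as a stand-in for GEMv2's own evaluation tooling.

```python
from datasets import load_dataset
import evaluate

# The advertised one-liner: load a GEM dataset from the Hugging Face Hub.
# The dataset ID "GEM/web_nlg_en" and the "target" field name are assumptions.
dataset = load_dataset("GEM/web_nlg_en", split="validation")

# Stand-in evaluation with ROUGE via the `evaluate` library; GEMv2's own
# online evaluation would normally be used instead.
rouge = evaluate.load("rouge")
examples = dataset.select(range(8))
references = [ex["target"] for ex in examples]
predictions = references  # dummy predictions, only to show the metric call
print(rouge.compute(predictions=predictions, references=references))
```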