Thu Jul 28 2022 - Top Trending AI Papers

Efficient Training of Language Models to Fill in the Middle

Language modeling

Natural Language Processing

Machine Learning

Improving language model accuracy

Automated text generation

Text augmentation for natural language processing tasks

Shows that autoregressive language models can learn to infill text after a straightforward transformation of the dataset: a span of text is moved from the middle of a document to its end.
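The transformation can be sketched in a few lines. This is a minimal illustration, not the authors' code: the sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the helper names `fim_transform` and `infill_prompt` are assumptions chosen for readability. The document is cut at two random character positions into prefix, middle and suffix, and the middle is moved to the end, with sentinels marking each piece. Details such as tokenization and an end-of-span marker are omitted.

```python
import random

# Hypothetical sentinel strings; a real setup would register these as special
# tokens in the tokenizer vocabulary.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(document: str, rng: random.Random) -> str:
    """Move a randomly chosen middle span of `document` to the end.

    The document is cut at two uniformly random character positions into
    prefix, middle and suffix, then rearranged as prefix, suffix, middle.
    """
    # Two cut points in [0, len(document)]; they may coincide (empty middle).
    i, j = sorted(rng.randint(0, len(document)) for _ in range(2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def infill_prompt(prefix: str, suffix: str) -> str:
    """Build an infilling prompt: the model is asked to continue after <MID>."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


if __name__ == "__main__":
    rng = random.Random(0)
    print(fim_transform("def add(a, b):\n    return a + b\n", rng))
    print(infill_prompt("def add(a, b):\n", "\n"))
```

Because the training sequence is still read strictly left to right, an ordinary autoregressive model trained on it learns to produce the middle after seeing both the prefix and the suffix.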

Training autoregressive language models on data in which a large fraction has been transformed with fill-in-the-middle (FIM) augmentation does not harm their original left-to-right generative capability. FIM is therefore a simple and efficient way to give language models the ability to infill.
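Applying FIM to only part of the training set can be sketched as below. This is an assumed setup for illustration: the `fim_rate` parameter, the helper names `to_fim` and `mix_fim`, and the sentinel strings are hypothetical, and the exact rates studied are not given here. Each document is independently transformed with probability `fim_rate`; the rest stay in ordinary left-to-right form.

```python
import random

# Hypothetical sentinel strings marking prefix, suffix and middle.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_fim(doc: str, rng: random.Random) -> str:
    # Cut at two random positions and move the middle span to the end.
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    return f"{PRE}{doc[:i]}{SUF}{doc[j:]}{MID}{doc[i:j]}"


def mix_fim(docs: list[str], fim_rate: float, seed: int = 0) -> list[str]:
    """Transform each document with probability `fim_rate`; keep the rest as-is."""
    rng = random.Random(seed)
    return [to_fim(d, rng) if rng.random() < fim_rate else d for d in docs]


if __name__ == "__main__":
    corpus = [f"document number {k}" for k in range(10)]
    for sample in mix_fim(corpus, fim_rate=0.5, seed=1):
        print(sample)
```

A `fim_rate` strictly between 0 and 1 keeps some untransformed documents in the mixture; the summary's claim is that even when a large fraction is transformed, left-to-right generation is not harmed.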

https://arxiv.org/pdf/2207.14255.pdf

https://arxiv.org/abs/2207.14255

https://twitter.com/arankomatsuzaki/status/1552824408263118849/photo/1