Pretraining Language Models with Human Preferences

Human-Computer Interaction
Natural Language Processing
Machine Learning
Improving text generation in natural language processing
Enhancing customer service chatbots
Developing language models for automated content creation

Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback.

Conditional training, i.e., learning a distribution over tokens conditional on human preference scores assigned by a reward model, can reduce the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and when generating from an adversarially chosen prompt.
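
As a rough illustration of conditional training (a minimal sketch under assumed details: the control-token names, score threshold, and stub reward model below are hypothetical, not the paper's exact setup), each pretraining segment is scored by a reward model and prefixed with a control token, so the LM learns token distributions conditional on preference:

    # Minimal sketch of conditional-training data preparation (Python).
    # Token names, threshold, and the reward model are placeholders.
    GOOD, BAD = "<|good|>", "<|bad|>"
    THRESHOLD = 0.0  # assumed cutoff on reward-model scores

    def reward_model(segment: str) -> float:
        # Placeholder: a learned reward model trained on human
        # preference judgments would score the segment here.
        return 0.0

    def annotate(segment: str) -> str:
        # Prefix a control token reflecting the segment's preference
        # score; the LM then learns p(tokens | control token).
        tag = GOOD if reward_model(segment) >= THRESHOLD else BAD
        return tag + segment

    corpus = ["first pretraining segment ...", "second pretraining segment ..."]
    conditioned = [annotate(seg) for seg in corpus]
    # `conditioned` feeds standard LM pretraining; at inference time,
    # sampling is conditioned on GOOD to suppress undesirable content.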
