Pretraining Language Models with Human Preferences

Human-Computer Interaction
Natural Language Processing
Machine Learning
Improving text generation in natural language processing
Enhancing customer service chatbots
Developing language models for automated content creation

Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback.

Conditional training, i.e., learning a distribution over tokens conditional on human preference scores assigned by a reward model, can reduce the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and when generating from an adversarially chosen prompt.
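
As a rough illustration of conditional training (a minimal sketch under assumed details: the control-token names, score threshold, and stub reward model below are hypothetical, not the paper's exact setup), each pretraining segment is scored by a reward model and prefixed with a control token, so the LM learns token distributions conditional on preference:

    # Minimal sketch of conditional-training data preparation (Python).
    # Token names, threshold, and the reward model are placeholders.
    GOOD, BAD = "<|good|>", "<|bad|>"
    THRESHOLD = 0.0  # assumed cutoff on reward-model scores

    def reward_model(segment: str) -> float:
        # Placeholder: a learned reward model trained on human
        # preference judgments would score the segment here.
        return 0.0

    def annotate(segment: str) -> str:
        # Prefix a control token reflecting the segment's preference
        # score; the LM then learns p(tokens | control token).
        tag = GOOD if reward_model(segment) >= THRESHOLD else BAD
        return tag + segment

    corpus = ["first pretraining segment ...", "second pretraining segment ..."]
    conditioned = [annotate(seg) for seg in corpus]
    # `conditioned` feeds standard LM pretraining; at inference time,
    # sampling is conditioned on GOOD to suppress undesirable content.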
