Wed Oct 19 2022

Scaling Laws for Reward Model Overoptimization

AI alignment
Reinforcement learning

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because that reward model is only an imperfect proxy for the true objective, optimizing it too hard can stop helping, or even hurt. This research studies how the gold reward model score changes as the policy is optimized against the proxy reward model, using either reinforcement learning or best-of-n sampling.
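For concreteness, below is a minimal sketch of the best-of-n side of this setup: sample n completions per prompt, keep the one the proxy reward model scores highest, and evaluate that selection with the gold reward model. The function names (policy_sample, proxy_reward, gold_reward) are hypothetical placeholders standing in for the trained policy and the two reward models, not code from the paper.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    n: int,
    policy_sample: Callable[[str], str],
    proxy_reward: Callable[[str, str], float],
) -> str:
    """Draw n candidate completions and keep the one the proxy reward model prefers."""
    candidates: List[str] = [policy_sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: proxy_reward(prompt, c))


def mean_gold_score(
    prompts: List[str],
    n: int,
    policy_sample: Callable[[str], str],
    proxy_reward: Callable[[str, str], float],
    gold_reward: Callable[[str, str], float],
) -> float:
    """Average gold reward of the best-of-n selections; tracking this as n grows
    shows whether optimizing the proxy keeps improving the gold objective."""
    scores = [
        gold_reward(p, best_of_n(p, n, policy_sample, proxy_reward)) for p in prompts
    ]
    return sum(scores) / len(scores)
```

Sweeping n over increasing values and plotting the resulting mean gold score against the amount of optimization is the basic shape of the measurement studied here.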

This research can help businesses applying reinforcement learning from human feedback and working on AI alignment. It provides insight into what happens when a system is optimized against a reward model trained to predict human preferences, and how far that optimization can be pushed before the true objective suffers.
