Wed Oct 19 2022

Scaling Laws for Reward Model Overoptimization

AI alignment
Reinforcement learning

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because that reward model is only an imperfect proxy for the true objective, optimizing it too hard can stop helping, or even hurt. This research studies how the gold reward model score changes as the policy is optimized against the proxy reward model, using either reinforcement learning or best-of-n sampling.
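For concreteness, below is a minimal sketch of the best-of-n side of this setup: sample n completions per prompt, keep the one the proxy reward model scores highest, and evaluate that selection with the gold reward model. The function names (policy_sample, proxy_reward, gold_reward) are hypothetical placeholders standing in for the trained policy and the two reward models, not code from the paper.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    n: int,
    policy_sample: Callable[[str], str],
    proxy_reward: Callable[[str, str], float],
) -> str:
    """Draw n candidate completions and keep the one the proxy reward model prefers."""
    candidates: List[str] = [policy_sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: proxy_reward(prompt, c))


def mean_gold_score(
    prompts: List[str],
    n: int,
    policy_sample: Callable[[str], str],
    proxy_reward: Callable[[str, str], float],
    gold_reward: Callable[[str, str], float],
) -> float:
    """Average gold reward of the best-of-n selections; tracking this as n grows
    shows whether optimizing the proxy keeps improving the gold objective."""
    scores = [
        gold_reward(p, best_of_n(p, n, policy_sample, proxy_reward)) for p in prompts
    ]
    return sum(scores) / len(scores)
```

Sweeping n over increasing values and plotting the resulting mean gold score against the amount of optimization is the basic shape of the measurement studied here.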

This research can help businesses applying reinforcement learning from human feedback and working on AI alignment. It provides insight into what happens when a system is optimized against a reward model trained to predict human preferences, and how far that optimization can be pushed before the true objective suffers.
