Rewarding Chatbots for Real-World Engagement with Millions of Users
This paper presents an approach that prioritizes user engagement to enhance user retention for social chatbots, specifically examining how human feedback can be used to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model, which is then used at inference time to reject low-scoring candidate responses generated by the chatbot model. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the mean conversation length by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model.
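The inference-time mechanism described above is essentially best-of-N rejection sampling: sample several candidate replies from the chatbot, score each with the reward model, and return the highest-scoring one. The sketch below illustrates this idea under stated assumptions; the generator name follows the GPT-J 6B model mentioned above, while the reward model checkpoint, sampling settings, and scoring head are illustrative placeholders rather than the paper's actual setup.

```python
# Minimal sketch of best-of-N response selection with a reward model.
# Assumptions: GPT-J 6B as the generator (per the paper); the reward model
# name, sampling parameters, and single-logit scoring head are placeholders.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

CHATBOT_NAME = "EleutherAI/gpt-j-6B"      # generator used in the paper
REWARD_NAME = "distilbert-base-uncased"   # placeholder reward model backbone

chat_tok = AutoTokenizer.from_pretrained(CHATBOT_NAME)
chatbot = AutoModelForCausalLM.from_pretrained(CHATBOT_NAME, torch_dtype=torch.float16)

reward_tok = AutoTokenizer.from_pretrained(REWARD_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(REWARD_NAME, num_labels=1)


def best_of_n(context: str, n: int = 8, max_new_tokens: int = 64) -> str:
    """Sample n candidate replies and return the one the reward model scores highest."""
    inputs = chat_tok(context, return_tensors="pt")
    outputs = chatbot.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=chat_tok.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        chat_tok.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs
    ]

    # Score each (context, reply) pair; a higher score means the reward model
    # predicts the reply is more engaging. Low-scoring candidates are rejected.
    scores = []
    for reply in candidates:
        enc = reward_tok(context, reply, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(reward_model(**enc).logits.squeeze().item())

    return candidates[scores.index(max(scores))]
```

In practice the reward model would first be trained on the automatic pseudo-labels derived from user interactions; the snippet only shows how such a model, once trained, can filter sampled responses at serving time.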
Implementing this approach could improve user engagement and retention for social chatbots, potentially increasing customer satisfaction and revenue for the businesses that deploy them. Training chatbots with automatic pseudo-labels and a reward model could also make the development of highly engaging chatbots more efficient.