Human trainers provide conversations and rank the responses. These reward models help determine the best answers. To keep training the chatbot, users can upvote or downvote a response by clicking the thumbs-up or thumbs-down icon beside the answer. Users can also provide additional written feedback to further improve and fine-tune