When training a language model with reinforcement learning from human feedback (RLHF), the reward model is typically tuned to ...
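
Although the exact setup is not spelled out here, reward-model tuning in RLHF commonly fits a scalar scorer to pairwise human preference data with a Bradley-Terry style loss. The sketch below is a minimal illustration under that assumption; the toy backbone, shapes, and hyperparameters are all hypothetical stand-ins, not the specific model or data described in this document.

```python
# Minimal sketch of reward-model tuning on pairwise human preferences.
# All module names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Tiny stand-in for a language-model backbone plus a scalar reward head."""

    def __init__(self, vocab_size: int = 1000, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.reward_head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        embedded = self.embed(token_ids)
        _, final_state = self.encoder(embedded)
        return self.reward_head(final_state[-1]).squeeze(-1)


def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: the human-preferred ("chosen") completion
    # should receive a higher scalar reward than the "rejected" one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


if __name__ == "__main__":
    model = RewardModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Fake preference batch: pairs of tokenized completions for the same prompt.
    chosen = torch.randint(0, 1000, (8, 32))
    rejected = torch.randint(0, 1000, (8, 32))

    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()
    optimizer.step()
    print(f"pairwise preference loss: {loss.item():.4f}")
```

The tuned reward model is then typically frozen and used to score candidate completions during the subsequent RL (e.g. PPO) stage, though the details of that stage depend on the particular RLHF pipeline.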