While the research involved models trained specifically to conceal motives from automated software evaluators called reward models (RMs), the broader purpose of studying hidden objectives is to ...