Reinforcement Learning from Diverse Human Preferences

Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu

IJCAI 2024 Conference

August 2024

Keywords: Reinforcement Learning, Human Preferences, Human Feedback, Rewards

Abstract:

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based reward model ensembling method is designed to generate more stable and reliable predictions. The proposed method is tested on a variety of tasks in DMcontrol and Meta-world and shows consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
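
The two ingredients named in the abstract, a latent-space constraint toward a prior and confidence-based ensembling, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Bradley-Terry preference model over summed segment rewards, a standard normal prior with a KL penalty weighted by a hypothetical coefficient `beta`, and inverse-variance weighting as the confidence rule. All function names are illustrative.

```python
import numpy as np

def preference_prob(r0, r1):
    """Bradley-Terry probability that segment 1 is preferred over segment 0,
    computed from the per-step predicted rewards of each segment."""
    s0, s1 = np.sum(r0), np.sum(r1)
    return 1.0 / (1.0 + np.exp(-(s1 - s0)))

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.
    Assumption: the prior is a standard normal."""
    mu, log_var = np.asarray(mu), np.asarray(log_var)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def regularized_loss(r0, r1, label, mu, log_var, beta=1.0):
    """Cross-entropy on one preference label plus a KL penalty that pulls
    the latent distribution toward the prior. `beta` is a hypothetical weight."""
    p1 = preference_prob(r0, r1)
    eps = 1e-12  # guards against log(0)
    ce = -(label * np.log(p1 + eps) + (1 - label) * np.log(1 - p1 + eps))
    return ce + beta * kl_to_standard_normal(mu, log_var)

def confidence_weighted_reward(means, variances):
    """Combine ensemble predictions, trusting low-variance members more.
    Assumption: confidence is the inverse of each member's predicted variance."""
    w = 1.0 / (np.asarray(variances, dtype=float) + 1e-8)
    w = w / w.sum()
    return float(np.sum(w * np.asarray(means, dtype=float)))
```

In this sketch, a larger `beta` ties the latent space more tightly to the prior, which limits how far noisy or conflicting labels can move the reward model. An ensemble member that reports high variance contributes little to the combined reward.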
