HIL-SERL is a system for training state-of-the-art manipulation policies using reinforcement learning (RL). The workflow has three stages:
- We first teleoperate the robot to collect positive and negative samples, then use them to train a binary reward classifier.
- We then collect a small set of demonstrations, which is added to the demo buffer at the start of RL training.
- During online training, we use the binary classifier as a sparse reward signal and provide human interventions. Initially, we intervene more frequently to demonstrate ways of solving the task from various states and to prevent the robot from performing undesirable behavior. We gradually reduce the number of interventions as the policy reaches a higher success rate and faster cycle times.
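
The three stages above can be pictured with a small, self-contained sketch. This is not the HIL-SERL code: every name here (`sparse_reward`, `intervention_rate`, `run_sketch`) is hypothetical, the policy, classifier, and human operator are replaced by random placeholders, and the linear decay schedule is only a stand-in for the operator's own judgment about when to step in.

```python
import random
from collections import deque


def sparse_reward(success_prob, threshold=0.5):
    """Turn a classifier's success probability into a binary (sparse) reward.

    The 0.5 threshold is an assumption made for this sketch.
    """
    return 1.0 if success_prob > threshold else 0.0


def intervention_rate(step, total_steps, start=0.5, end=0.0):
    """Hypothetical linear schedule: intervene often early, rarely later.

    In practice a human decides when to intervene; this only mimics the trend.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def run_sketch(total_steps=200, num_demos=20, seed=0):
    rng = random.Random(seed)
    demo_buffer = deque(maxlen=1000)    # filled once, before RL training starts
    online_buffer = deque(maxlen=1000)  # filled during online training

    # Stage 2: a small set of demonstrations seeds the demo buffer.
    for i in range(num_demos):
        demo_buffer.append({"obs": i, "action": 0.0, "reward": 1.0})

    # Stage 3: online training with sparse rewards and human interventions.
    interventions = 0
    for step in range(total_steps):
        policy_action = rng.uniform(-1.0, 1.0)  # placeholder for the policy
        intervened = rng.random() < intervention_rate(step, total_steps)
        action = 0.0 if intervened else policy_action  # placeholder correction
        success_prob = rng.random()  # placeholder for the reward classifier
        online_buffer.append({
            "obs": step,
            "action": action,
            "reward": sparse_reward(success_prob),
            "intervened": intervened,
        })
        interventions += int(intervened)

    return len(demo_buffer), len(online_buffer), interventions


if __name__ == "__main__":
    print(run_sketch())
```

The sketch keeps the two buffers separate and only tags intervened transitions with a flag. How intervention data is routed between buffers, and how batches are drawn from them for policy updates, is left out because the overview above does not specify it.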