The Technology
Consider the task of designing a robot capable of performing a complex human task such as dishwashing, driving or ironing clothes. Although such tasks come naturally to adult humans, designing a hard-coded algorithm that lets a robot perform them can be a daunting challenge. Difficulties in accurately modeling the robot and its interaction with the environment, in creating hand-crafted features from high-dimensional sensor data, and in enabling the robot to adapt to new situations are just a few of the obstacles.
This technology is based on a general scheme that combines several reinforcement learning techniques that may be used to tackle such challenges. As a proof of concept, the scheme was implemented and applied to the challenging problem of autonomous highway steering.
Advantages
- Leveraging the weak supervision abilities of a (human) instructor, who can provide coherent and learnable instantaneous reward signals to the computerized trainee.
- Effective acquisition of instantaneous reward from an instructor and accurate modeling of the reward function are required for a successful application of the proposed framework.
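The framework depends on turning an instructor's instantaneous rewards into a reward model that the learner can query. The brief does not say how that model is built, so the following is only a minimal sketch, assuming a linear model fit by ridge-regularized least squares. The class name `InstructorRewardModel` and the feature choice (bias, lateral lane offset, steering magnitude) are hypothetical illustrations, not the technology's actual design. Features are hand-picked here purely for brevity; a real system would more likely learn them from sensor data.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


@dataclass
class InstructorRewardModel:
    """Hypothetical linear reward model r_hat(x) = w . x, fit to instructor feedback."""

    n_features: int
    ridge: float = 1e-6  # small L2 penalty keeps the normal equations well posed
    weights: List[float] = field(default_factory=list)
    _xs: List[List[float]] = field(default_factory=list)
    _rs: List[float] = field(default_factory=list)

    def record(self, features: Sequence[float], reward: float) -> None:
        """Store one (features, instantaneous instructor reward) pair."""
        if len(features) != self.n_features:
            raise ValueError("feature vector has the wrong length")
        self._xs.append(list(features))
        self._rs.append(float(reward))

    def fit(self) -> List[float]:
        """Solve (X^T X + ridge * I) w = X^T r for the weights w."""
        if not self._xs:
            raise ValueError("no instructor feedback recorded")
        d = self.n_features
        a = [[self.ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        b = [0.0] * d
        for x, r in zip(self._xs, self._rs):
            for i in range(d):
                b[i] += x[i] * r
                for j in range(d):
                    a[i][j] += x[i] * x[j]
        self.weights = _solve(a, b)
        return self.weights

    def predict(self, features: Sequence[float]) -> float:
        """Estimate the reward the instructor would have given."""
        if not self.weights:
            raise RuntimeError("call fit() first")
        return sum(w * f for w, f in zip(self.weights, features))
```

Once fitted, `predict` could stand in for the instructor, supplying rewards to a standard reinforcement learning update when no human is available to give feedback.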
Applications and Opportunities
- Harnessing the supervision abilities of a (human) instructor to learn an effective reward model will become a critical building block in creating robots capable of adjusting themselves to human needs.