Carlos Florensa - Reverse Curriculum Generation for Reinforcement Learning (2017)
Created: November 4, 2017 / Updated: November 2, 2024 / Status: finished / 3 min read (~402 words)
- Given a start and an end state, how is inverting the two actually any better?
- It looks like this is basically creating some sort of field that indicates the direction to take from the current position to the end position
- It would mean this method needs the ability to generate states starting from the end, which is not something that is possible in video games, for instance
- How is the basic RL approach different from A*?
- RL applied to the robotics domain, where one can start from the end state and learn its way backwards until it reaches the start states (rough sketches of the required interface and of the training loop are given after the list below)
- We propose a method to learn tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in "reverse", gradually learning to reach the goal from a set of start states increasingly far from the goal
- In our work, we avoid all reward engineering or use of demonstrations by exploiting two key insights
- It's easier to reach the goal from states nearby the goal, or from states nearby where the agent already knows how to reach the goal
- Applying random actions from one such state leads the agent to new feasible nearby states, from where it is not too much harder to reach the goal
- Our approach can be understood as sequentially composing locally stabilizing controllers by growing a tree of stabilized trajectories backwards from the goal state
- This can be viewed as a "funnel" which takes start states to the goal state via a series of locally valid policies
- Assumption 1: We can arbitrarily reset the agent into any start state $s_0 \in \mathcal{S}$ at the beginning of all trajectories
- Assumption 2: At least one state $s^g$ is provided such that $s^g \in S^g$
- Assumption 3: The Markov Chain induced by taking uniformly sampled random actions has a communicating class including all start states $S^0$ and the given goal state $s^g$
- A limitation of the current approach is that it generates start states that grow from a single goal uniformly outwards, until they cover the original start state distribution $\text{Unif}(S^0)$
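To make the assumptions above concrete, here is a minimal sketch of the environment interface the method appears to require. The names (`ResettableEnv`, `reset_to`, `sample_action`) and signatures are my own illustration, not an API from the paper or its code.

```python
from typing import Protocol, Sequence, Tuple

State = Sequence[float]  # generic state vector, purely illustrative


class ResettableEnv(Protocol):
    """Hypothetical interface for the simulator assumed by the method."""

    def reset_to(self, state: State) -> State:
        """Assumption 1: reset the agent into an arbitrary state, not only
        a fixed initial distribution; returns the resulting observation."""
        ...

    def sample_action(self) -> Sequence[float]:
        """Uniformly random action, used for the Brownian-motion expansion
        of start states (related to Assumption 3)."""
        ...

    def step(self, action: Sequence[float]) -> Tuple[State, float, bool]:
        """Advance one step; returns (next_state, reward, done)."""
        ...
```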
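And a rough sketch of the "reverse" training loop described above: grow new start states by taking random actions from states the agent can already solve from, then keep only the starts of intermediate difficulty. The hyperparameter values, the `estimate_success` helper, and the elided policy-update step are my own simplifications rather than the paper's exact algorithm; `policy` is assumed to be a callable mapping states to actions.

```python
import random

# Illustrative hyperparameters, not the paper's exact values.
R_MIN, R_MAX = 0.1, 0.9    # keep starts that are neither trivial nor hopeless
BROWNIAN_STEPS = 50        # random-action steps used to grow new starts
N_NEW, N_OLD = 200, 100    # fresh starts per iteration / replayed old starts
ROLLOUTS_PER_START = 4     # rollouts used to estimate a start's success rate
HORIZON = 500              # episode length cap


def sample_nearby(env, starts, n_new, horizon=BROWNIAN_STEPS):
    """Grow new start states by taking uniformly random actions ("Brownian
    motion") from states the agent can already reach the goal from."""
    new_starts = []
    while len(new_starts) < n_new:
        state = env.reset_to(random.choice(starts))
        for _ in range(horizon):
            state, _, done = env.step(env.sample_action())
            new_starts.append(state)
            if done:
                break
    return new_starts[:n_new]


def estimate_success(env, policy, start):
    """Empirical probability of reaching the goal from `start` under `policy`.
    The policy-improvement step that would use these rollouts is elided."""
    successes = 0
    for _ in range(ROLLOUTS_PER_START):
        state = env.reset_to(start)
        for _ in range(HORIZON):
            state, reward, done = env.step(policy(state))
            if done:
                successes += int(reward > 0)   # sparse goal-reached reward
                break
    return successes / ROLLOUTS_PER_START


def reverse_curriculum(env, goal_state, policy, n_iterations=100):
    """Train "in reverse": seed the start distribution with the goal state
    and push it outward, keeping only starts of intermediate difficulty."""
    good_starts = [goal_state]
    old_starts = [goal_state]
    for _ in range(n_iterations):
        # 1. Propose new starts near the current frontier, plus a replay of
        #    older starts to avoid forgetting how to solve them.
        frontier = good_starts or old_starts
        candidates = sample_nearby(env, frontier, N_NEW)
        candidates += random.sample(old_starts, min(N_OLD, len(old_starts)))
        # 2. Run episodes reset to each candidate start (this is where the RL
        #    algorithm would update the policy) and score each start.
        rates = [estimate_success(env, policy, s) for s in candidates]
        # 3. Keep the "good" starts: solved sometimes, but not always.
        good_starts = [s for s, p in zip(candidates, rates)
                       if R_MIN <= p <= R_MAX]
        old_starts += good_starts
    return policy
```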
- Florensa, Carlos, et al. "Reverse Curriculum Generation for Reinforcement Learning." arXiv preprint arXiv:1707.05300 (2017).