David Silver - The Predictron: End-To-End Learning and Planning (2016)
Created: June 23, 2017 / Updated: November 2, 2024 / Status: finished / 1 min read (~126 words)
- The predictron is composed of four main components:
    - A state representation $\textbf{s} = f(s)$ that encodes raw input $s$ into an internal (abstract, hidden) state $\textbf{s}$
    - A model $\textbf{s}', \textbf{r}, \boldsymbol{\gamma} = m(\textbf{s}, \beta)$ that maps from internal state $\textbf{s}$ to the subsequent internal state $\textbf{s}'$, internal reward $\textbf{r}$, and internal discount $\boldsymbol{\gamma}$
    - A value function $v$ that outputs internal values $\textbf{v} = v(\textbf{s})$ representing the remaining internal return from internal state $\textbf{s}$ onwards
    - An accumulator that combines the internal rewards, discounts, and values produced by unrolling the model over several internal steps into an overall estimate of value $\textbf{g}$
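A minimal sketch of how the four components fit together, assuming the k-step accumulator from the paper, which nests the internal rewards and discounts around the value of the last internal state: $\textbf{g}^k = \textbf{r}^1 + \boldsymbol{\gamma}^1(\textbf{r}^2 + \boldsymbol{\gamma}^2(\dots + \boldsymbol{\gamma}^{k-1}(\textbf{r}^k + \boldsymbol{\gamma}^k \textbf{v}^k)\dots))$. Simplifying assumptions: states, rewards, discounts, and values are scalars (in the paper they are vectors); the function names `accumulate_k_step` and `predictron_k_step` are hypothetical; $\beta$ is passed through as an opaque argument; and the toy `f`, `m`, `v` are made up purely to illustrate the arithmetic, not learned networks.

```python
from typing import Callable, List, Tuple

# Scalar stand-ins for the (vector-valued) quantities in the notes.
State = float
Model = Callable[[State, object], Tuple[State, float, float]]


def accumulate_k_step(rewards: List[float], discounts: List[float], final_value: float) -> float:
    """k-step accumulator: fold rewards r^1..r^k and discounts gamma^1..gamma^k
    around the value v^k of the last internal state, innermost term first."""
    if len(rewards) != len(discounts):
        raise ValueError("need exactly one discount per reward")
    g = final_value
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g  # g <- r^i + gamma^i * (inner accumulation)
    return g


def predictron_k_step(raw_input, f: Callable[[object], State], m: Model,
                      v: Callable[[State], float], beta, k: int) -> float:
    """Encode the input, unroll the model k internal steps, then accumulate."""
    s = f(raw_input)  # internal state s = f(s)
    rewards: List[float] = []
    discounts: List[float] = []
    for _ in range(k):
        s, r, gamma = m(s, beta)  # one internal step: s', r, gamma = m(s, beta)
        rewards.append(r)
        discounts.append(gamma)
    return accumulate_k_step(rewards, discounts, v(s))  # v^k = v(s^k)


# Hypothetical toy components, chosen only to keep the arithmetic easy to follow.
def toy_f(x) -> State:
    return float(x)


def toy_m(s: State, beta) -> Tuple[State, float, float]:
    return s, 1.0, 0.5  # same state, reward 1, discount 0.5 (beta is ignored)


def toy_v(s: State) -> float:
    return 4.0


if __name__ == "__main__":
    for k in range(3):
        print(k, predictron_k_step(0, toy_f, toy_m, toy_v, beta=None, k=k))
    # prints: 0 4.0 / 1 3.0 / 2 2.5
```

With $k = 0$ the estimate is just the value function; each additional internal step swaps part of that value estimate for a modelled reward plus a discounted continuation.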
- Silver, David, et al. "The predictron: End-to-end learning and planning." arXiv preprint arXiv:1612.08810 (2016).