David Silver - The Predictron: End-To-End Learning and Planning (2016)

Created: June 23, 2017 / Updated: December 21, 2025 / Status: finished / Readability: technical / 1 min read (~126 words)
machine-learning

The predictron is composed of four main components:
- A state representation $\textbf{s} = f(s)$ that encodes raw input $s$
- A model $\textbf{s}', \textbf{r}, \boldsymbol{\gamma} = m(\textbf{s}, \beta)$ that maps from internal state $\textbf{s}$ to subsequent internal state $\textbf{s}'$, internal reward $\textbf{r}$, and internal discount $\boldsymbol{\gamma}$
- A value function $v$ that outputs internal values $\textbf{v} = v(\textbf{s})$ representing the future, internal return from internal state $\textbf{s}$ onwards
- An accumulator that combines internal rewards, discounts, and values into an overall estimate of value $\textbf{g}$
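
To make the interplay of the four components concrete, here is a minimal sketch of one accumulator described in the paper, the $k$-step preturn $\textbf{g}^k = \textbf{r}^1 + \boldsymbol{\gamma}^1(\textbf{r}^2 + \boldsymbol{\gamma}^2(\dots + \boldsymbol{\gamma}^k \textbf{v}^k))$. The bodies of `f`, `m`, and `v` below are hypothetical toy functions chosen only for illustration (in the paper they are learned networks), rewards and values are assumed to be scalars, and the model's second argument $\beta$ is omitted because these notes do not define it.

```python
import numpy as np

def f(s):
    """Hypothetical state representation: raw input -> internal state."""
    return np.tanh(np.asarray(s, dtype=float))

def m(state):
    """Hypothetical model: internal state -> (next state, reward, discount)."""
    next_state = 0.5 * state
    reward = float(state.sum())
    discount = 0.9
    return next_state, reward, discount

def v(state):
    """Hypothetical value function: internal state -> internal value."""
    return float(state.mean())

def accumulate(rewards, discounts, final_value):
    """Fold rewards and discounts backwards onto the final value:
    g = r1 + gamma1 * (r2 + gamma2 * (... + gamma_k * v_k))."""
    g = final_value
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g
    return g

def k_step_preturn(s, k):
    """Roll the internal model forward k steps, then accumulate."""
    state = f(s)
    rewards, discounts = [], []
    for _ in range(k):
        state, r, gamma = m(state)
        rewards.append(r)
        discounts.append(gamma)
    return accumulate(rewards, discounts, v(state))

if __name__ == "__main__":
    print(k_step_preturn([0.2, -0.1, 0.4], k=3))
```

With `k=0` no model step is taken, so the estimate reduces to the value of the encoded input, `v(f(s))`.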

Silver, David, et al. "The predictron: End-to-end learning and planning." arXiv preprint arXiv:1612.08810 (2016).