Jacob Devlin - RobustFill: Neural Program Learning under Noisy I/O (2017)
Created: April 22, 2017 / Updated: November 2, 2024 / Status: finished / 4 min read (~617 words)
- What is so special about GetSpan() that it is used to solve about 20% of the test instances?
- 2 competing approaches for automatic program learning have received significant attention
- Neural program synthesis, where a neural network is conditioned on input/output examples and learns to generate a program
- Neural program induction, where a neural network generates new outputs directly using a latent program representation
- The primary task evaluated for this work is a Programming By Example (PBE) system for string transformations similar to FlashFill
- We develop novel variants of the attentional RNN architecture to encode a variable-length, unordered set of input-output examples
- For program representation, we have developed a domain-specific language (DSL) that defines an expressive class of regular expression-based string transformations
- The neural network is then used to generate a program in the DSL (for synthesis) or an output string (for induction)
- A set of input-output string examples $(I_1, O_1), \dots, (I_n, O_n)$
- A set of input strings $I_1^y, \dots, I_m^y$
- The goal is to generate the corresponding output strings $O_1^y, \dots, O_m^y$
- For each example set, we assume there exists at least one program $P$ that will correctly transform all these examples
- $P(I_1) \rightarrow O_1, \dots, P(I_n) \rightarrow O_n, \quad P(I_1^y) \rightarrow O_1^y, \dots, P(I_m^y) \rightarrow O_m^y$
- The success metric is whether a generated program generalizes to the corresponding assessment examples, $P(I_j^y) \rightarrow O_j^y$ (see the worked example below)
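As a concrete illustration of this setup (the transformation, examples, and program below are invented for illustration and do not come from the paper), consider extracting the last name from a full name:

```python
import re

# Observed examples (I_j, O_j) given to the system.
observed = [("John Smith", "Smith"), ("Ada Lovelace", "Lovelace")]
# Held-out assessment pairs (I^y_j, O^y_j) used to measure generalization.
assessment = [("Alan Turing", "Turing")]

# One program P consistent with the observed examples:
# "return the last whitespace-delimited token".
def P(s: str) -> str:
    return re.split(r"\s+", s.strip())[-1]

assert all(P(i) == o for i, o in observed)    # P(I_j) -> O_j
assert all(P(i) == o for i, o in assessment)  # generalizes: P(I^y_j) -> O^y_j
```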
- $GetSpan(r_1, i_1, y_1, r_2, i_2, y_2)$: returns the substring between the $i_1^{th}$ occurrence of regex $r_1$ and the $i_2^{th}$ occurrence of regex $r_2$, where $y_1$ and $y_2$ denote either the start or end of the corresponding regex matches
- GetSpan() seems like a generalization (an encapsulation) of the composition of two regex searches, each with its own occurrence index and match boundary; in other words, a much more advanced/complex form of SubStr() (a rough sketch follows below)
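A rough Python rendering of the GetSpan() semantics described above; the DSL's actual regex token set and edge-case behavior (e.g., missing matches) are not reproduced, so treat this as an approximation:

```python
import re

def get_span(s: str, r1: str, i1: int, y1: str, r2: str, i2: int, y2: str) -> str:
    """Approximate GetSpan(r1, i1, y1, r2, i2, y2) applied to s.

    Returns the substring between the i1-th match of regex r1 and the
    i2-th match of regex r2; y1/y2 in {"start", "end"} select which
    boundary of each match to use. Occurrence indices are 1-based.
    """
    m1 = list(re.finditer(r1, s))[i1 - 1]  # raises if fewer than i1 matches
    m2 = list(re.finditer(r2, s))[i2 - 1]
    left = m1.start() if y1 == "start" else m1.end()
    right = m2.start() if y2 == "start" else m2.end()
    return s[left:right]

# E.g. the text between the 1st and 2nd runs of digits:
print(get_span("ab12cd34ef", r"\d+", 1, "end", r"\d+", 2, "start"))  # "cd"
```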
- Using the DSL, sample programs are generated and verified to ensure they are executable, do not throw exceptions, etc.
- Given these programs, a set of input strings is generated, and their associated output strings are computed using the sampled program (sketched below)
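A minimal sketch of that generate-and-verify loop, using a toy stand-in DSL of simple string operations (the real DSL and its sampler are of course much richer; everything named here is illustrative):

```python
import random
import string

# Toy stand-in DSL: each "program" is a composition of simple string
# transformations (illustrative only; not the paper's DSL).
OPS = {
    "upper": str.upper,
    "lower": str.lower,
    "first3": lambda s: s[:3],
    "reverse": lambda s: s[::-1],
}

def sample_program(depth: int = 2):
    names = random.choices(list(OPS), k=depth)
    def program(s: str) -> str:
        for name in names:
            s = OPS[name](s)
        return s
    return names, program

def sample_input(max_len: int = 10) -> str:
    k = random.randint(1, max_len)
    return "".join(random.choices(string.ascii_letters + string.digits, k=k))

def make_training_example(n_io: int = 4):
    while True:
        names, program = sample_program()
        inputs = [sample_input() for _ in range(n_io)]
        try:
            outputs = [program(i) for i in inputs]  # verify: must run cleanly
        except Exception:
            continue  # reject programs that throw (none in this toy DSL)
        return names, list(zip(inputs, outputs))
```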
- We model program synthesis as a sequence-to-sequence generation task
- The observed input-output examples are encoded using a series of recurrent neural networks, and P is generated using another RNN, one token at a time
- Late pooling allows us to effectively incorporate powerful attention mechanisms into the model (see the sketch below)
- The previous SotA (Parisotto et al. 2017) performed pooling at the I/O encoding level and as such, it could not exploit the attention mechanisms developed here
- The DSL used here is more expressive, especially the GetSpan() function, which was required to solve approximately 20% of the test instances
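A numpy sketch of the late-pooling idea at a single decoding step, with toy dimensions: the real model uses LSTM encoders/decoders and learned attention, while here attention is a plain dot-product softmax and pooling is an element-wise max across examples, so this follows the spirit rather than the letter of the architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_examples, seq_len, hidden, vocab = 4, 7, 16, 50

# One encoding per observed I/O pair; the decoder also runs once per pair.
enc = rng.normal(size=(n_examples, seq_len, hidden))
dec = rng.normal(size=(n_examples, hidden))
W_out = rng.normal(size=(hidden, vocab))

# Attention is computed separately against each example's encoding...
scores = np.einsum("eh,esh->es", dec, enc)               # (examples, seq_len)
context = np.einsum("es,esh->eh", softmax(scores), enc)  # (examples, hidden)

# ...and per-example states are pooled only *late*, right before predicting
# the next program token. Early pooling would collapse the examples before
# attention, losing the per-example alignments.
pooled = (dec + context).max(axis=0)        # element-wise max over examples
next_token_probs = softmax(pooled @ W_out)  # distribution over DSL tokens
```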
- It is possible to model both approaches (synthesis and induction) using nearly-identical network architectures
- The induction model evaluated is identical to synthesis Attention-A with late pooling, except for the following two modifications:
- Instead of generating P, the system generates the new output string $O^y$ character-by-character
- There is an additional LSTM to encode $I^y$. The decoder generating $O^y$ uses double attention over $O_j$ and $I^y$
- Induction is comparable to synthesis with a beam size of 1
- The induction model uses a beam of 3 and does not improve with a larger beam because there is no way to evaluate candidate outputs after decoding (illustrated below)
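A toy illustration of that asymmetry: synthesized candidates can be re-ranked by executing them against the observed examples, whereas induced output strings offer nothing to execute. Candidate programs here are plain Python callables standing in for decoded DSL programs, with made-up model scores:

```python
observed = [("John Smith", "Smith"), ("Ada Lovelace", "Lovelace")]

beam = [  # (model_log_prob, candidate_program) from a beam of size 3
    (-0.2, lambda s: s.split()[0]),   # highest score, but wrong
    (-0.9, lambda s: s.split()[-1]),  # lower score, but consistent
    (-1.5, lambda s: s.upper()),
]

def consistent(prog) -> bool:
    try:
        return all(prog(i) == o for i, o in observed)
    except Exception:
        return False

# Synthesis: filter by execution, then take the best surviving candidate.
best = max((c for c in beam if consistent(c[1])), key=lambda c: c[0])
print(best[1]("Alan Turing"))  # "Turing"

# Induction has no analogous check: a beam of output strings for an unseen
# input cannot be validated against anything, so extra candidates go unused.
```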
- Devlin, Jacob, et al. "RobustFill: Neural Program Learning under Noisy I/O." arXiv preprint arXiv:1703.07469 (2017).