Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

Chen Liang∗, Jonathan Berant†, Quoc Le, Kenneth D. Forbus, Ni Lao

  • Northwestern University, Evanston, IL {chenliang2013,forbus}
  • Tel-Aviv University, Tel Aviv-Yafo, Israel
  • Google Inc., Mountain View, CA {qvl,nlao}


  • Goal: apply neural networks to language understanding together with symbolic reasoning, which requires
    • complex operations
    • external memory
  • Existing neural program induction methods are
    • fully differentiable (e.g., Memory Networks)
    • low-level (e.g., Neural Turing Machine)
    • limited to small synthetic datasets
  • The proposed Manager-Programmer-Computer (MPC) framework
    • is non-differentiable
    • supports abstract, scalable, and precise operations

The Neural Symbolic Machine

The metaphor:

  • Manager: provides the input data (and the reward as weak supervision)
  • Programmer: a seq2seq program induction model
  • Computer: executes programs written in a high-level language



Interpreter for Lisp-style programs:


The program language is equivalent to a subset of lambda calculus.
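A minimal sketch of how such an interpreter might execute a Lisp-style program against a knowledge base; the toy triples, the `Hop` semantics (one-hop traversal along a relation), and all names are illustrative assumptions, not the paper's exact implementation:

```python
# Toy KB as (subject, relation, object) triples -- illustrative data, not Freebase.
KB = {
    ("m.12345", "/people/person/parents", "m.67890"),
    ("m.12345", "/people/person/parents", "m.67891"),
}

def hop(entities, relation):
    """(Hop v r): entities reachable from any entity in v via relation r."""
    return {o for (s, r, o) in KB if s in entities and r == relation}

def execute(expr, variables):
    """Evaluate a parsed expression such as ('Hop', 'v0', relation).

    `variables` maps variable tokens (v0, v1, ...) to sets of entity ids.
    """
    op, *args = expr
    if op == "Hop":
        var, relation = args
        return hop(variables[var], relation)
    raise ValueError(f"unknown function: {op}")

# Usage: a linked entity is v0; the executed result becomes v1.
variables = {"v0": {"m.12345"}}
variables["v1"] = execute(("Hop", "v0", "/people/person/parents"), variables)
print(variables["v1"])  # {'m.67890', 'm.67891'}
```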


Tokens are restricted based on the current state: e.g., after decoding the prefix `( Hop v`, the next token must be a relation reachable from the entities in $v$.

This is like a typing constraint on function application; one could alternatively decode derivations (grammar rules) rather than program tokens.
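A sketch of how such constraints could be enforced as a vocabulary mask at each decoding step; the prefix representation, the function set {Hop, Equal}, and the `kb_relations_from` helper are illustrative assumptions:

```python
import numpy as np

def valid_next_tokens(prefix, variables, kb_relations_from):
    """Tokens allowed after the current program prefix (illustrative rules).

    `variables` maps variable tokens to entity sets; kb_relations_from(entities)
    lists the relations reachable from a set of entities -- the computer can
    answer this since it holds the KB.
    """
    if not prefix or prefix[-1] == ")":
        return {"("}                               # a new expression must open
    last = prefix[-1]
    if last == "(":
        return {"Hop", "Equal"}                    # function tokens (assumed set)
    if last in {"Hop", "Equal"}:
        return set(variables)                      # a variable holding entities
    if last in variables:
        return kb_relations_from(variables[last])  # only reachable relations
    return {")"}                                   # relation emitted: close it

def mask_logits(logits, vocab, allowed):
    """Set the logits of disallowed tokens to -inf before the softmax."""
    masked = logits.copy()
    for i, tok in enumerate(vocab):
        if tok not in allowed:
            masked[i] = -np.inf
    return masked
```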



  • GRU for both encoder and decoder
  • attention similar to Dong and Lapata (2016)
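For concreteness, a minimal dot-product attention step over the encoder states (the exact scoring function of Dong and Lapata (2016) may differ; this is a simplified sketch):

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Dot-product attention: weight encoder states by softmaxed scores.

    decoder_state: (d,); encoder_states: (T, d). Returns a (d,) context vector.
    """
    scores = encoder_states @ decoder_state      # (T,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over encoder steps
    return weights @ encoder_states              # convex combination of states
```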


key-variable memory

  • key: embedding
  • value: a symbolic reference held by the computer, thus non-differentiable

This differs from TransE, where every entity has its own learned embedding, and from Memory Networks and the DNC ("Hybrid computing...", published in Nature), whose memory values are themselves embeddings.
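A minimal sketch of the key-variable memory as a data structure; the names and types are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class KeyVariableMemory:
    """Keys are embeddings the decoder can attend over / copy from; values are
    symbolic references (entity ids or entity lists) only the computer reads."""
    keys: list = field(default_factory=list)      # embedding per entry
    values: list = field(default_factory=list)    # symbolic reference per entry

    def write(self, key, value):
        """Append an entry and return its variable token, e.g. 'v0'."""
        self.keys.append(key)
        self.values.append(value)
        return f"v{len(self.values) - 1}"
```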


at encoding

Entities resolved by an entity linker (similar to STAGG; Yih et al., 2015) get:

  • key: avg. of GRU embeddings within the entity span
  • value: the entity token (“m.12345”)

input words: 300d GloVe embeddings, projected down to 50d
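A sketch of the encoding-time write, reusing the `KeyVariableMemory` sketch above; the span indices and dimensions are illustrative:

```python
import numpy as np

def write_entity(memory, gru_outputs, span, entity_id):
    """key = average of encoder GRU outputs over the entity's token span;
    value = the entity token, e.g. "m.12345"."""
    start, end = span                             # token indices, end exclusive
    key = gru_outputs[start:end].mean(axis=0)     # avg. of GRU embeddings
    return memory.write(key, entity_id)

# Usage: an 8-token question with 50d encoder states.
memory = KeyVariableMemory()
gru_outputs = np.random.randn(8, 50)
v0 = write_entity(memory, gru_outputs, (2, 4), "m.12345")
```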

at decoding

An expression is executed immediately when it ends (on generating the closing parenthesis ")"); the result is saved to memory:

  • key: the decoder GRU output at the last step of the expression
  • value: the executed result (an entity or a list of entities)
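A sketch of the decoding-time write, gluing together the interpreter and memory sketches above (all names assumed):

```python
def on_decoded_token(token, expr_buffer, decoder_state, memory, variables):
    """Buffer tokens; on ')', execute the expression and save the result.

    key = decoder GRU output at the expression's last step;
    value = the executed result (a set of entities).
    """
    expr_buffer.append(token)
    if token != ")":
        return None
    expr = tuple(expr_buffer[1:-1])              # strip '(' and ')'
    result = execute(expr, variables)            # run on the computer
    var = memory.write(decoder_state, result)    # e.g. returns "v1"
    variables[var] = result                      # make it referable later
    expr_buffer.clear()
    return var
```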

Freebase ids have the form /domain/type/property, e.g. /people/person/parents. The relation embedding is a concatenation of:

  • the average of the GloVe embeddings of "people" and "person"
  • the GloVe embedding of "parents"

300 + 300 = 600d, projected down to 50d
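A sketch of this relation embedding; `glove` and the projection matrix `W` are assumed inputs, and multi-word parts are ignored for simplicity:

```python
import numpy as np

def relation_embedding(rel_id, glove, W):
    """Embed "/people/person/parents": avg(GloVe(domain), GloVe(type)) ++ GloVe(property).

    glove: word -> 300d vector; W: (600, 50) projection matrix.
    """
    _, domain, type_, prop = rel_id.split("/")   # '', 'people', 'person', 'parents'
    domain_type = (glove[domain] + glove[type_]) / 2
    concat = np.concatenate([domain_type, glove[prop]])  # 300 + 300 = 600d
    return concat @ W                            # project down to 50d
```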


  • abstract: Lisp expressions rather than low-level operations
  • scalable: executable on the whole of Freebase
  • precise: uses exact tokens rather than embeddings


Reinforcement learning formulation

Given a query $q$, define states, actions, and rewards $(s_t, a_t, r_t)$.

$a_t \in A(s_t)$, where $A(s_t)$ is the set of valid actions at state $s_t$, provided by the computer

REINFORCE algorithm (Williams, 1992)

Use beam search rather than sampling to approximate the expectation when computing the gradient.
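Written out as equations (a reconstruction from the notes; any baseline term is omitted): the objective is the expected reward, and the expectation in the gradient is approximated over the beam,

$$
J^{RL}(\theta) = \sum_{q} \mathbb{E}_{P(a_{0:T} \mid q, \theta)}\left[ R(q, a_{0:T}) \right]
$$

$$
\nabla_{\theta} J^{RL}(\theta) \approx \sum_{q} \sum_{a_{0:T} \in \mathrm{Beam}(q)} \tilde{P}(a_{0:T} \mid q, \theta)\, R(q, a_{0:T})\, \nabla_{\theta} \log P(a_{0:T} \mid q, \theta)
$$

where $\tilde{P}$ renormalizes the model probabilities over the programs in the beam.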


Sparse reward signal in a large search space: programs with non-zero reward receive very low probability under the model, so training

  • converges slowly
  • gets stuck in local optima

Find Approximate Gold Program

new training objective:
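Presumably the maximum likelihood objective on the pseudo-gold program (a reconstruction, since the formula is not in the notes):

$$
J^{ML}(\theta) = \sum_{q} \log P\left(a_{0:T}^{best}(q) \mid q, \theta\right)
$$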

where $a_{0:T}^{best}(q)$ is the shortest program with the highest reward found so far for $q$ (questions for which no positive-reward program has been found are excluded)

The search space is still large for some complex programs, so use curriculum learning: gradually increase program complexity (the functions allowed and the length of programs); a schedule sketch follows this list:

  • first 10 iterations of ML training use only the Hop function and programs of length 2
  • then train for the remaining iterations using both Hop and Equal with length 3
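The schedule as it might look in training code (stage parameters from the notes above; the loop and names are illustrative):

```python
# Stage parameters from the notes; everything else is illustrative.
CURRICULUM = [
    {"iterations": 10,   "functions": {"Hop"},          "max_length": 2},
    {"iterations": None, "functions": {"Hop", "Equal"}, "max_length": 3},
]

def run_curriculum(train_iteration, total_iterations):
    done = 0
    for stage in CURRICULUM:
        n = stage["iterations"] or (total_iterations - done)
        for _ in range(n):
            train_iteration(allowed_functions=stage["functions"],
                            max_program_length=stage["max_length"])
        done += n
```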

Augmented REINFORCE Training

like the linearly combined ML and RL objectives in AlphaGo (Silver et al., 2016) and Google's NMT (Wu et al., 2016)

the pseudo-gold program $a_{0:T}^{best}(q)$ is assigned fixed probability $\alpha = 0.1$, and the probabilities of the programs in the beam are renormalized to sum to $1 - \alpha$.
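A sketch of the resulting gradient weighting; the interface is an assumption:

```python
ALPHA = 0.1  # fixed probability mass assigned to the pseudo-gold program

def beam_weights(beam_probs):
    """Renormalize the beam's model probabilities to sum to 1 - ALPHA."""
    total = sum(beam_probs)
    return [(1 - ALPHA) * p / total for p in beam_probs]

# Gradient sketch, with R(.) the reward of a program:
#   grad J ≈ ALPHA * R(a_best) * grad log P(a_best | q)
#          + sum_i w_i * R(a_i) * grad log P(a_i | q),  w = beam_weights(...)
```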


different training methods: iterative ML vs. REINFORCE vs. augmented REINFORCE (results figure not captured in these notes)

curriculum methods: with vs. without the curriculum above (results figure not captured)

to avoid overfitting: (details not captured)