Bordes et al. 2014b

Open Question Answering with Weakly Supervised Embedding Models


ECML PKDD 2014, ReVerb, paraphrase, multitask learning, WikiAnswers


question -> embedding vector; triple -> embedding vector

then (simple factual) questions can be answered by comparing vector similarities
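A minimal sketch of this idea (toy, hand-set embeddings; all names and vectors are illustrative, not learned ones):

```python
import numpy as np

# Toy embeddings: in the paper these are learned; here they are hand-set
# so that the question vector lies near the correct triple's vector.
question_emb = np.array([1.0, 0.0, 0.5])          # f(q) for "who directed Alien?"
triple_embs = {
    ("Alien", "directed_by", "Ridley Scott"): np.array([0.9, 0.1, 0.4]),
    ("Alien", "released_in", "1979"):         np.array([0.0, 1.0, 0.2]),
}

# Answer by ranking triples with a dot-product similarity f(q) . g(t)
best_triple = max(triple_embs, key=lambda t: question_emb @ triple_embs[t])
```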


  • no grammars predefined
  • usable on any KB schema
  • no need to define an intermediate logical form (LF)

question generation

  1. pick a triple randomly
  2. instantiate one of the seed question patterns at random
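The two generation steps above can be sketched as follows (the templates and KB are toy examples, not the paper's actual seed patterns):

```python
import random

kb = [("Alien", "directed_by", "Ridley Scott"),
      ("Alien", "released_in", "1979")]

# Hypothetical seed question patterns keyed on the relation type;
# the paper instantiates similar templates per relation.
templates = {
    "directed_by": ["who directed {subj} ?", "who is the director of {subj} ?"],
    "released_in": ["when was {subj} released ?"],
}

def generate_example(rng):
    subj, rel, obj = rng.choice(kb)          # 1. pick a triple randomly
    pattern = rng.choice(templates[rel])     # 2. pick a seed pattern randomly
    return pattern.format(subj=subj), (subj, rel, obj)

rng = random.Random(0)
question, answer_triple = generate_example(rng)
```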


this yields a weak training signal and noisy data: generated questions are

  • syntactically simple
  • may be semantically wrong


scoring function:

S(q, t) = f(q)^T g(t),  with f(q) = W phi(q), g(t) = V psi(t)

where f and g produce the embedding vectors (W and V are the learned embedding matrices), and phi(q) and psi(t) are sparse binary representations of the question (bag of words) and of the triple
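A sketch of this scoring function with random toy matrices (dimensions and index choices are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_triple_syms, dim = 50, 30, 8

# Learned parameters in the paper; random here for illustration.
W = rng.normal(size=(dim, n_words))        # question embedding matrix
V = rng.normal(size=(dim, n_triple_syms))  # triple embedding matrix

def score(phi_q, psi_t):
    # S(q, t) = f(q)^T g(t) with f(q) = W phi(q), g(t) = V psi(t)
    return (W @ phi_q) @ (V @ psi_t)

phi_q = np.zeros(n_words)
phi_q[[3, 7, 12]] = 1.0        # bag of words of the question
psi_t = np.zeros(n_triple_syms)
psi_t[[1, 9, 20]] = 1.0        # subject, relation, object symbols
s = score(phi_q, psi_t)
```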


training on the generated (weakly) supervised set D:

  1. sample a positive (question, triple) pair from D
  2. create a corrupted negative pair (e.g. swap in a random triple)
  3. perform an SGD update when the negative scores within a margin of the positive (ranking loss)
  4. normalize the embedding vectors
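The four steps above can be sketched as a toy SGD loop; the margin, learning rate, and corruption strategy here are illustrative choices, not necessarily the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_words, n_syms = 8, 20, 15
W = rng.normal(scale=0.1, size=(dim, n_words))   # question embeddings
V = rng.normal(scale=0.1, size=(dim, n_syms))    # triple embeddings

def onehot(n, idx):
    v = np.zeros(n)
    v[list(idx)] = 1.0
    return v

# Toy weakly supervised pairs D: (question bag of words, triple symbols)
D = [(onehot(n_words, [1, 4]), onehot(n_syms, [2, 5, 7])),
     (onehot(n_words, [3, 9]), onehot(n_syms, [0, 6, 11]))]

margin, lr = 0.1, 0.05
for _ in range(200):
    phi_q, psi_t = D[rng.integers(len(D))]                # 1. sample a pair
    psi_neg = onehot(n_syms, rng.choice(n_syms, 3, replace=False))  # 2. negative
    f_q, g_pos, g_neg = W @ phi_q, V @ psi_t, V @ psi_neg
    if f_q @ g_pos < margin + f_q @ g_neg:                # 3. ranking-loss update
        W += lr * np.outer(g_pos - g_neg, phi_q)
        V += lr * (np.outer(f_q, psi_t) - np.outer(f_q, psi_neg))
    # 4. project each embedding column back onto the unit ball
    W /= np.maximum(np.linalg.norm(W, axis=0), 1.0)
    V /= np.maximum(np.linalg.norm(V, axis=0), 1.0)

phi_q, psi_t = D[0]
final_score = (W @ phi_q) @ (V @ psi_t)
```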

multitask learning: for the paraphrase task, optimize a similar ranking loss over paraphrase pairs, scored as S_prp(q1, q2) = f(q1)^T f(q2)

optimize both objectives with alternating training steps
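Schematically, the alternation looks like this (the step functions are stubs standing in for the actual SGD updates on the shared embeddings):

```python
import random

def qa_step(example):          # one SGD step on the QA ranking loss (stub)
    return ("qa", example)

def paraphrase_step(example):  # one SGD step on the paraphrase loss (stub)
    return ("prp", example)

qa_data = ["q1", "q2"]
prp_data = ["p1", "p2"]
rng = random.Random(0)

log = []
for _ in range(6):
    # Alternate between the two tasks; both update the same embeddings.
    if rng.random() < 0.5:
        log.append(qa_step(rng.choice(qa_data)))
    else:
        log.append(paraphrase_step(rng.choice(prp_data)))
```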

fine tuning

when SGD stops, many correct answers are ranked near the top but not in first place.

since the embeddings are only ever combined through a dot product, fine-tuning inserts an extra matrix M into it:

S_ft(q, t) = f(q)^T M g(t)

and the ranking loss is re-optimized with this new scoring function
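A minimal illustration of the fine-tuned score, assuming M is initialized to the identity (so it starts out equivalent to the plain dot product; the embeddings here are random toys):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6
f_q = rng.normal(size=dim)      # pretrained question embedding (toy)
g_t = rng.normal(size=dim)      # pretrained triple embedding (toy)

M = np.eye(dim)                 # fine-tuning matrix, starts at identity

def score_ft(f, g, M):
    # Fine-tuned score: S_ft(q, t) = f(q)^T M g(t)
    return f @ M @ g

# With M = I, the fine-tuned score reduces to the original dot product.
base = f_q @ g_t
tuned = score_ft(f_q, g_t, M)
```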

Bordes et al. 2014a

Question Answering with Subgraph Embeddings


EMNLP-2014, WebQuestions

  • encoding not only a triple but also a subgraph
  • inference for longer path

the training procedure and loss function are similar to the above, as is the paraphrase multitask training

one-hot vector representation

to train an embedding matrix, three kinds of one-hot representations are tried:

  • single entity: only the answer entity is set to 1 in the one-hot vector
  • path representation: the answer is encoded as the path from the question entity to the answer entity (a triple or quadruplet)
  • subgraph representation: the path plus every entity and relation connected to the candidate answer
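The three encodings can be illustrated over a toy vocabulary (entities, relations, and indices are invented):

```python
import numpy as np

# Toy symbol vocabulary: entities and relation types share one index space.
vocab = {"alien": 0, "ridley_scott": 1, "directed_by": 2,
         "blade_runner": 3, "born_in": 4, "south_shields": 5}
n = len(vocab)

def encode(symbols):
    v = np.zeros(n)
    v[[vocab[s] for s in symbols]] = 1.0
    return v

# Question entity: alien; answer entity: ridley_scott.
single = encode(["ridley_scott"])                        # single entity
path = encode(["alien", "directed_by", "ridley_scott"])  # path (a triple)
# subgraph: the path plus symbols connected to the answer entity
subgraph = encode(["alien", "directed_by", "ridley_scott",
                   "blade_runner", "born_in", "south_shields"])
```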


to avoid enumerating all triples, a candidate answer set is chosen first, using either:

  • C1: all direct connected entities
  • C2: all 1-hop triples (scores weighted by 1.5x), plus a beam search over the top-10 relation types (and thus the corresponding entities)

for questions with multiple correct answers, the binary representations of the answer entities are averaged.
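The averaging is straightforward; a toy example with two answer entities over a 5-symbol vocabulary:

```python
import numpy as np

# Two answer entities encoded as one-hot vectors over a 5-symbol vocabulary.
a1 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
a2 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# For a question with several correct answers, average the binary encodings.
answer_repr = (a1 + a2) / 2
```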

Bordes et al. 2015

Large-scale Simple Question Answering with Memory Networks



problems with prior systems:

  • how do existing systems perform outside their question templates?
  • does a model trained on one dataset transfer well to other datasets?
  • can such systems learn from different training sources (to capture all question types)?
  • reasoning depends on the KB structure, yet a deliberately curated KB like Freebase can answer many more questions


Input module:

  • multiple objects for the same subject and relation are merged into a set
  • mediator nodes are removed
  • facts are encoded as binary vectors; ReVerb facts are handled similarly but are first processed by the Generalization module
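The merging of objects that share a (subject, relation) key can be sketched as follows (toy facts):

```python
from collections import defaultdict

facts = [("usa", "capital", "washington"),
         ("france", "border", "spain"),
         ("france", "border", "italy")]

# Merge multiple objects with the same subject and relation into one set,
# so the stored fact answers list-valued questions directly.
merged = defaultdict(set)
for subj, rel, obj in facts:
    merged[(subj, rel)].add(obj)
```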

Generalization module:

link ReVerb to freebase:

  • use (Lin et al. 2012)
  • at least one alias of the Freebase entity matches the ReVerb entity string

otherwise, use a bag-of-words representation for the ReVerb facts

Output module:

  1. generate a candidate set: question -> matched Freebase entities -> filter some -> keep the two with the most links
  2. score each candidate by the cosine similarity between the question embedding and the fact embedding
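A sketch of the scoring step with toy embeddings (candidate names and vectors are invented; `fact_a` is constructed to be nearly aligned with the question vector):

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
dim = 8
q_emb = rng.normal(size=dim)                    # embedded question (toy)

# Hypothetical candidate facts with toy embeddings; the real module
# embeds stored fact vectors with the learned embedding matrix.
candidates = {
    "fact_a": q_emb + 0.01 * rng.normal(size=dim),  # nearly parallel to q
    "fact_b": rng.normal(size=dim),                 # unrelated fact
}

# Select the supporting fact with the highest cosine similarity.
best = max(candidates, key=lambda k: cosine(q_emb, candidates[k]))
```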

Response module:

just returns the set of objects of the selected supporting fact



training sources:

  • SimpleQuestions: question, triple fact
  • WebQuestions(distant supervision): question, answer entity alias
  • automatic question generated from FB: question, triple
  • paraphrase (multitask training)

the loss function is similar to the one above