Bordes et al. 2014b

Open Question Answering with Weakly Supervised Embedding Models

tags

ECML PKDD 2014, ReVerb, paraphrase, multitask learning, WikiAnswers

Intro

question -> vector embedding
triple -> vector embedding

then (simple factual) questions can be answered by comparing vector similarities

merits:

  • no predefined grammars
  • usable on any KB schema
  • no need to define an intermediate logical form (LF)

question generation

  1. pick a triple at random
  2. instantiate one of the seed question patterns at random (see the sketch below)

QQ20161014-0@2x.png
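A minimal sketch of this generation step, assuming a handful of illustrative seed patterns (the pattern strings and the example triple below are made up, not the paper's actual seed set):

```python
import random

# Illustrative seed patterns (not the paper's actual list): each maps a
# ReVerb-style triple (subject, relation, object) to a question whose
# answer is either the subject or the object of the triple.
SEED_PATTERNS = [
    ("what {relation} {object} ?", "subject"),
    ("who {relation} {object} ?", "subject"),
    ("what does {subject} {relation} ?", "object"),
]

def generate_example(triples):
    """Pick a random triple, then instantiate a random seed pattern."""
    subject, relation, obj = random.choice(triples)
    pattern, answer_slot = random.choice(SEED_PATTERNS)
    question = pattern.format(subject=subject, relation=relation, object=obj)
    answer = subject if answer_slot == "subject" else obj
    return question, (subject, relation, obj), answer

triples = [("the-beatles", "released", "abbey-road")]
print(generate_example(triples))
```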

this yields a weak training signal and noisy data:

  • syntactically simple
  • may be semantically wrong

embedding

scoring function

S(q, t) = f(q) . g(t), where f(q) = W phi(q) and g(t) = W psi(t) are the question and triple embedding vectors, W is the embedding matrix, and phi and psi are sparse binary (bag-of-words / bag-of-symbols) representations of the question and the triple
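A small numpy sketch of this scoring function; the sizes and the example indices are made up:

```python
import numpy as np

k, n_words, n_symbols = 64, 10000, 5000      # made-up sizes
N = n_words + n_symbols                      # one coordinate per word or KB symbol
W = 0.1 * np.random.randn(k, N)              # shared embedding matrix

def score(phi_q, psi_t):
    """S(q, t) = f(q) . g(t), with f(q) = W phi(q) and g(t) = W psi(t)."""
    f_q = W @ phi_q                          # question embedding (sum of word embeddings)
    g_t = W @ psi_t                          # triple embedding (sum of symbol embeddings)
    return float(f_q @ g_t)

# phi_q / psi_t are sparse binary bag-of-words / bag-of-symbols vectors.
phi_q = np.zeros(N); phi_q[[3, 17, 42]] = 1.0
psi_t = np.zeros(N); psi_t[[n_words + 7, n_words + 99]] = 1.0
print(score(phi_q, psi_t))
```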

training

training on the generated (weakly) supervised set D of (question, triple) pairs:

  1. sample a positive pair (q, t) from D
  2. create a negative pair by corrupting it (e.g. replacing the triple with a randomly sampled one)
  3. make an SGD step to minimize the margin ranking loss max(0, margin - S(q, t) + S(q', t')) (sketched below)
  4. project the updated embedding vectors back to unit norm
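A sketch of one such training step under these assumptions: only the triple is corrupted in the negative pair, and the margin and learning-rate values are illustrative:

```python
import numpy as np

def train_step(W, phi_q, psi_t, psi_t_neg, lr=0.01, margin=0.1):
    """One SGD step on max(0, margin - S(q, t) + S(q, t')) followed by
    renormalization of the embedding columns to unit L2 norm."""
    f_q, g_t, g_neg = W @ phi_q, W @ psi_t, W @ psi_t_neg
    loss = margin - f_q @ g_t + f_q @ g_neg
    if loss > 0:
        # Gradient of (-S(q, t) + S(q, t')) w.r.t. W, using
        # dS/dW = outer(g, phi) + outer(f, psi).
        grad = np.outer(g_neg - g_t, phi_q) + np.outer(f_q, psi_t_neg - psi_t)
        W -= lr * grad
    W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-8)
    return float(max(loss, 0.0))
```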

multitask learning: for the paraphrase task, optimize a second ranking objective that scores paraphrase pairs from WikiAnswers with the same embeddings (see the sketch below)

both objectives are optimized jointly, by alternating training steps between the two tasks
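A small sketch of the paraphrase score, assuming (as described above) that paraphrase pairs reuse the same embedding matrix:

```python
import numpy as np

def paraphrase_score(W, phi_q1, phi_q2):
    """S_prp(q1, q2) = f(q1) . f(q2): paraphrase pairs from WikiAnswers reuse
    the word embeddings in W; training alternates one ranking-loss step on a
    (question, triple) pair with one on a (question, paraphrase) pair."""
    return float((W @ phi_q1) @ (W @ phi_q2))
```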

fine tuning

when SGD stops, many correct answers are ranked near the top but not in first place.

since the embeddings are only used through a dot product, insert an extra matrix M (k x k, initialized to the identity) into that product: S_ft(q, t) = f(q) . M g(t)

the ranking loss keeps the same form with S_ft in place of S; only M is trained in this step, with the embeddings kept fixed
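A sketch of the fine-tuned score; the embedding dimension is made up:

```python
import numpy as np

k = 64
M = np.eye(k)            # start from the identity, i.e. the plain dot product

def finetuned_score(W, M, phi_q, psi_t):
    """S_ft(q, t) = f(q) . (M g(t)); only M is updated during fine-tuning,
    using the same margin ranking loss, while the embeddings W stay fixed."""
    return float((W @ phi_q) @ (M @ (W @ psi_t)))
```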

Bordes et al. 2014a

Question Answering with Subgraph Embeddings

tags

EMNLP-2014, WebQuestions

  • encoding not only a triple but also a subgraph
  • inference over longer paths (more than one hop)

the training procedure and loss function are similar to the above, and so is the paraphrase multitask training

one-hot vector representation

to train the embedding matrix, three kinds of one-hot (binary) answer representations are tried:

  • single entity: only the answer entity is set to 1 in the one-hot vector
  • path representation: the answer is encoded as the path from the question entity to the answer entity (a triple for 1-hop paths, a quadruplet for 2-hop paths)
  • subgraph representation: the path representation plus all entities and relations directly connected to the candidate answer (see the sketch below)
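A sketch of the three answer encodings, assuming a hypothetical `index` mapping from entities and relation types to coordinates (the paper distinguishes path coordinates from subgraph coordinates; they are merged here for brevity):

```python
import numpy as np

def answer_representation(mode, answer, path, subgraph, index, dim):
    """Binary vector psi(a) for a candidate answer.
    mode: 'single' | 'path' | 'subgraph'.
    `index` is a hypothetical map from entities / relation types to coordinates."""
    psi = np.zeros(dim)
    if mode == "single":                   # only the answer entity
        psi[index[answer]] = 1.0
    else:                                  # 'path' and 'subgraph' both include the path
        for symbol in path:                # question entity, relation(s), answer entity
            psi[index[symbol]] = 1.0
        if mode == "subgraph":             # plus everything connected to the answer
            for symbol in subgraph:
                psi[index[symbol]] = 1.0
    return psi
```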

inference

to avoid enumerating all triples, first build a candidate answer set, using either:

  • C1: all entities directly connected to the question entity
  • C2: all 1-hop candidates (their scores multiplied by 1.5) plus 2-hop candidates reached via a beam search that keeps the top 10 relation types (and the corresponding entities)

for questions with multiple correct answers, the binary representations of the answer entities are averaged (see the sketch below).
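A sketch of C1-style inference and of the multi-answer averaging, assuming a `score_fn(phi_q, psi_a)` like the scoring function of the previous paper:

```python
import numpy as np

def rank_candidates(score_fn, phi_q, candidates):
    """Score every candidate answer (here: the C1 set of entities directly
    connected to the question entity) and return them best-first."""
    scored = [(score_fn(phi_q, psi_a), entity) for entity, psi_a in candidates.items()]
    return sorted(scored, reverse=True)

def multi_answer_representation(psi_list):
    """For questions with several gold answers, average the binary
    representations of all answer entities."""
    return np.mean(np.stack(psi_list), axis=0)
```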

Bordes et al. 2015

Large-scale Simple Question Answering with Memory Networks

tags

SimpleQuestions

open questions about existing systems:

  • how do existing systems perform outside their original question templates?
  • whether a model trained on one dataset transfers well to other datasets
  • whether such systems can learn from different training sources (to cover all kinds of questions)
  • reasoning depends on the KB structure, but a deliberately curated KB like Freebase can answer many more questions

MemNN

Input module:

  • multiple objects sharing the same subject and relation are merged into a single fact with a set of objects (sketched below)
  • mediator nodes are removed
  • facts are stored as binary (bag-of-symbols) vectors; ReVerb facts are handled similarly but are first processed by the Generalization module
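A sketch of that grouping step; the facts below use made-up identifiers:

```python
from collections import defaultdict

def merge_facts(triples):
    """Merge all objects that share the same (subject, relation) into a
    single grouped fact, as done by the input module."""
    grouped = defaultdict(set)
    for subject, relation, obj in triples:
        grouped[(subject, relation)].add(obj)
    return [(s, r, objs) for (s, r), objs in grouped.items()]

facts = [("usa", "location.contains", "california"),
         ("usa", "location.contains", "texas")]
print(merge_facts(facts))   # one fact whose object is the set {'california', 'texas'}
```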

Generalization module:

linking ReVerb entities to Freebase:

  • use the entity links of (Lin et al. 2012)
  • a match requires at least one alias of a Freebase entity to equal the ReVerb entity string (see the sketch below)

otherwise, the ReVerb fact falls back to a bag-of-words representation
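A sketch of this matching rule; the alias table is made up:

```python
def link_reverb_entity(reverb_string, freebase_aliases):
    """Return the Freebase entities having at least one alias equal to the
    ReVerb entity string; an empty result means the fact falls back to a
    bag-of-words representation."""
    return [entity for entity, aliases in freebase_aliases.items()
            if reverb_string in aliases]

aliases = {"/m/02mjmr": {"barack obama", "obama"}}
print(link_reverb_entity("barack obama", aliases))   # ['/m/02mjmr']
```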

Output module:

  1. generate a candidate set: question -> matched Freebase entities -> filter some out -> keep the two entities with the most links
  2. score candidates by the cosine similarity between the question embedding and the supporting-fact embedding (see the sketch below)
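A small sketch of the scoring step:

```python
import numpy as np

def cosine_score(q_embedding, fact_embedding):
    """Cosine similarity between the embedded question and an embedded
    candidate supporting fact."""
    denom = np.linalg.norm(q_embedding) * np.linalg.norm(fact_embedding) + 1e-8
    return float(q_embedding @ fact_embedding) / denom
```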

Response module:

simply returns the set of objects of the selected supporting fact

train

datasets:

  • SimpleQuestions: (question, triple fact) pairs
  • WebQuestions (distant supervision): (question, answer entity alias) pairs
  • questions automatically generated from Freebase: (question, triple) pairs
  • paraphrases (multitask training)

the loss function is similar to the above