## Bordes et al. 2014b

Open Question Answering with Weakly Supervised Embedding Models

### Intro

question -> vector embedding; triple -> vector embedding

then (simple factual) questions can be answered by comparing vector similarities

merit:

• no predefined grammars
• usable on any KB schema
• no need to define an intermediate logical form (LF)

### question generation

1. pick a triple at random
2. instantiate one of the seed question patterns at random

this gives only a weak training signal, and the data are noisy:

• syntactically simple
• may be semantically wrong

### embedding

scoring function:

$$S(q, t) = f(q)^\top g(t), \quad f(q) = W\phi(q), \quad g(t) = W\psi(t)$$

where $f(q)$ and $g(t)$ are embedding vectors, and $\phi(q)$ and $\psi(t)$ are binary (bag-of-words / bag-of-symbols) representations of the question and the triple

### training

generated (weak) supervised training set:

1. sample a pair $(q_i, t_i)$ from D
2. create a negative pair $t'_i, \text{ such that }t'_i\ne t_i$
3. SGD update to $\min\, [0.1 - f(q_i)^Tg(t_i) + f(q_i)^Tg(t'_i)]_+$
4. normalize embedding vectors

both objectives (question answering and paraphrase prediction) are optimized, alternating training steps between the two tasks
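The four training steps above can be sketched as one SGD update on the margin ranking loss; this is a minimal numpy sketch (matrix shapes, learning rate, and the normalization-to-the-unit-ball detail are assumptions, not from the paper):

```python
import numpy as np

def sgd_step(W, phi_q, psi_t, psi_t_neg, lr=0.01, margin=0.1):
    """One SGD step of the ranking loss
    [margin - f(q)^T g(t) + f(q)^T g(t')]_+ ,
    with f(q) = W @ phi_q and g(t) = W @ psi_t.
    W: (k, n_symbols); phi/psi: binary vectors of length n_symbols.
    """
    f_q = W @ phi_q          # question embedding
    g_t = W @ psi_t          # positive triple embedding
    g_tn = W @ psi_t_neg     # corrupted (negative) triple embedding

    loss = margin - f_q @ g_t + f_q @ g_tn
    if loss > 0:             # update only when the margin is violated
        # gradient of the hinge w.r.t. W, in outer-product form
        grad = (np.outer(g_tn - g_t, phi_q)
                + np.outer(f_q, psi_t_neg - psi_t))
        W -= lr * grad
        # step 4: keep each symbol embedding (column) inside the unit ball
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        W /= np.maximum(norms, 1.0)
    return max(float(loss), 0.0)
```

A single step raises the score of the true triple relative to the corrupted one whenever the margin is violated.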

### fine tuning

when SGD stops, many correct answers rank near the top but not first.

since the embeddings are only ever used through a dot product, insert an extra matrix $M$ into that product:

$$S_{ft}(q, t) = f(q)^\top M\, g(t)$$

and the loss function changes accordingly:

$$\min\, [0.1 - f(q_i)^\top M g(t_i) + f(q_i)^\top M g(t'_i)]_+$$
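A tiny sketch of the fine-tuned scorer; initializing $M$ at the identity (an assumption of this sketch) makes fine-tuning start from the original ranking:

```python
import numpy as np

def score(f_q, g_t):
    """Base scoring: plain dot product of the two embeddings."""
    return f_q @ g_t

def score_finetuned(f_q, g_t, M):
    """Fine-tuned scoring: a learned square matrix M is inserted
    into the dot product, S_ft(q, t) = f(q)^T M g(t)."""
    return f_q @ M @ g_t

k = 8
M = np.eye(k)  # identity start: reproduces the base score exactly
```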

## Bordes et al. 2014a

### tags

EMNLP-2014, WebQuestions

• encoding not only a triple but also a subgraph
• inference for longer path

the training procedure and loss function are similar to the above, and so is the paraphrase multitask training

### one-hot vector representation

to train an embedding matrix, three kinds of binary answer representation are tried:

• single entity: only the answer entity is set to 1 in the one-hot vector
• path representation: the answer is encoded as the path from the question entity to the answer entity (a triple for 1-hop, a quadruplet for 2-hop)
• subgraph representation: the whole path, plus all entities and relations connected to the answer entity
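The three encodings above can be sketched as multi-hot binary vectors over a joint vocabulary of entities and relations (the vocabulary, symbol names, and toy KB here are illustrative, not from the paper):

```python
import numpy as np

# hypothetical joint vocabulary of entity and relation symbols
vocab = {"barack_obama": 0, "people.person.place_of_birth": 1,
         "honolulu": 2, "location.location.containedby": 3,
         "hawaii": 4}

def encode(symbols, vocab):
    """Multi-hot binary vector with 1s at the given symbols."""
    v = np.zeros(len(vocab))
    for s in symbols:
        v[vocab[s]] = 1.0
    return v

# 1) single entity: only the answer entity
single = encode(["honolulu"], vocab)

# 2) path: question entity -> relation -> answer (a 1-hop triple here)
path = encode(["barack_obama", "people.person.place_of_birth",
               "honolulu"], vocab)

# 3) subgraph: the path plus everything connected to the answer
subgraph = encode(["barack_obama", "people.person.place_of_birth",
                   "honolulu", "location.location.containedby",
                   "hawaii"], vocab)
```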

### inference

to avoid enumerating all triples, first build a candidate answer set, in one of two ways:

• C1: all directly connected entities
• C2: all 1-hop paths (scores multiplied by 1.5), plus a beam search over the top-10 relation types for 2-hop paths (and thus the corresponding entities)

for questions with multiple answers, the binary representations of these entities are averaged.
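The averaging for multiple-answer questions is just an element-wise mean of the binary vectors (the toy vectors below are illustrative):

```python
import numpy as np

# two correct answer entities, each as a binary representation
# (the last bit stands for a path component shared by both)
reps = np.array([[1., 0., 0., 1.],   # answer entity 1
                 [0., 1., 0., 1.]])  # answer entity 2
target = reps.mean(axis=0)           # single averaged target vector
```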

## Bordes et al. 2015

Large-scale Simple Question Answering with Memory Networks

### tags

SimpleQuestions

problems with existing systems:

• how do existing systems perform outside the existing question templates
• does a model trained on one dataset transfer well to other datasets
• can such systems learn from different training sources (to capture all kinds of questions)
• reasoning depends on KB structure, but a deliberately curated KB like Freebase can answer many more questions

### MemNN

#### Input module:

• multiple objects for the same subject and relation are merged into a set
• mediator nodes are removed
• facts encoded as binary vectors; ReVerb facts similarly, but first processed with the Generalization module

#### Generalization module:

• use the entity linking of (Lin et al. 2012)
• a ReVerb entity string is matched to a Freebase entity when at least one alias of the Freebase entity matches the string

otherwise, fall back to a bag-of-words representation for ReVerb facts

#### Output module:

1. generate a candidate set: question -> Freebase entities -> filter some out -> keep the two with the most links
2. score candidates by the cosine similarity of the two embedding vectors (question vs. fact)
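The scoring step can be sketched as picking the candidate fact with the highest cosine similarity to the question embedding (candidate generation is assumed done; function names are mine):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between question and fact embeddings."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def best_fact(q_emb, fact_embs):
    """Index of the candidate fact whose embedding is closest
    (by cosine similarity) to the question embedding."""
    scores = [cosine(q_emb, f) for f in fact_embs]
    return int(np.argmax(scores))
```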

#### Response module:

simply returns the set of objects of the selected supporting fact

### train

dataset:

• SimpleQuestions: (question, triple fact) pairs
• WebQuestions (distant supervision): (question, answer entity alias) pairs
• questions automatically generated from Freebase: (question, triple) pairs