What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

Hongyuan Mei, Mohit Bansal, Matthew R. Walter Toyota Technological Institute at Chicago Chicago, IL 60637 {hongyuan,mbansal,mwalter}@ttic.edu

## Introduction

the task: given a structured database (a set of records), generate a description sentence; two main phases: content selection (what to talk about) and surface realization (how to say it)

Previous work is either:

• template-based
• domain-dependent (parsers, NER tools, hand-crafted features)
• learning content selection and surface realization separately

This paper: a neural encoder-aligner-decoder

• the encoder is essential
• a coarse-to-fine aligner, since selective generation requires identifying the few salient records in an over-determined database
• beam search vs. greedy decoding

## Model

LSTM-based encoder-decoder with attention:

### LSTM Encoder

each output $h_j$ is the concatenation of the forward and backward hidden states at position $j$ (bidirectional LSTM)
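A minimal numpy sketch of this concatenation, using a plain tanh-RNN cell as a stand-in for the LSTM (all weights and dimensions here are illustrative, not the paper's):

```python
import numpy as np

def rnn_states(x_seq, W, U, b):
    """Run a simple tanh RNN (stand-in for the LSTM cell); return all hidden states."""
    h = np.zeros(U.shape[0])
    states = []
    for x in x_seq:
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return states

def bidirectional_encode(x_seq, fwd, bwd):
    """h_j = [forward state at j ; backward state at j]."""
    hs_f = rnn_states(x_seq, *fwd)
    hs_b = rnn_states(x_seq[::-1], *bwd)[::-1]  # reverse outputs back to input order
    return [np.concatenate([hf, hb]) for hf, hb in zip(hs_f, hs_b)]

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 5
fwd = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
bwd = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
x_seq = [rng.normal(size=d_in) for _ in range(T)]
H = bidirectional_encode(x_seq, fwd, bwd)
```

Each `H[j]` has dimension `2 * d_h`, which is why the aligner below works over doubled-size encoder outputs.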

### Coarse-to-fine Aligner

Only a small subset of the records is salient, i.e., relevant to the output sentence.

1. concatenate each hidden representation $h_j$ with its original record

2. use a pre-selector to assign a probability $p_j$ to each record

the sum of all $p_j$ can be regarded as an approximation of the total number of selected records (gamma)

3. use a standard aligner to compute alignment probabilities at each timestep $t$

4. the refiner produces the final selection decision
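The steps above can be sketched as follows. This is one plausible reading, not the paper's exact equations: `score_fn` is a placeholder for the decoder-state-dependent alignment score, and the refiner here simply re-weights the alignment by the pre-selection probabilities and renormalizes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def coarse_to_fine_align(records, hiddens, w_pre, score_fn):
    """Pre-select, align, refine (a sketch of one reading of steps 1-4)."""
    # steps 1-2: pre-selector probability per record from [hidden; record]
    feats = [np.concatenate([h, r]) for h, r in zip(hiddens, records)]
    p = np.array([1.0 / (1.0 + np.exp(-(w_pre @ f))) for f in feats])
    # sum(p) approximates gamma, the expected number of selected records
    # step 3: standard alignment probabilities at one decoder timestep
    alpha = softmax(np.array([score_fn(h) for h in hiddens]))
    # step 4: refiner combines pre-selection with alignment, then renormalizes
    beta = p * alpha
    return p, beta / beta.sum()

rng = np.random.default_rng(1)
records = [rng.normal(size=4) for _ in range(6)]
hiddens = [rng.normal(size=3) for _ in range(6)]
w_pre = rng.normal(size=7)
p, beta = coarse_to_fine_align(records, hiddens, w_pre, score_fn=lambda h: h.sum())
```

Records with a low pre-selection probability `p[j]` are suppressed in the refined weights `beta` even when the standard aligner attends to them.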

### Training

regularization term:

• the pre-selector probabilities should stay close to a target value (their sum ≈ gamma)
• at least one record should be selected
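A hedged sketch of this regularizer on the pre-selector probabilities; the exact functional form in the paper may differ, and the `at_least_one` term is just one way to penalize an empty selection:

```python
import numpy as np

def preselect_regularizer(p, gamma):
    # keep the expected number of selected records, sum(p), near gamma
    close_to_gamma = (p.sum() - gamma) ** 2
    # penalize the case where no record is confidently selected
    at_least_one = -np.log(p.max() + 1e-8)
    return close_to_gamma + at_least_one
```

A selection whose probabilities sum to gamma and include at least one confident pick scores much lower than an all-near-zero selection.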

## Experiments

Setup:

• hidden units = 500 from {250, 500, 750} (WeatherGov)
• gamma = 8.5 from {6.5, 7.5, 8.5, 10.5, 12.5} (WeatherGov)
• gamma = 5.0 from {1.0, 2.0, …, 6.0} (RoboCup)
• mini-batch = 100
• Adam, convergence within up to 30 epochs

Metric:

• F1 for content selection
• sBLEU (standard BLEU, up to 4-grams) for surface realization
• cBLEU (customized BLEU) that does not penalize numerical deviations within 5 (e.g., "low 58" counts as matching "low 60")
• sBLEUg for the case where the ground-truth content selection is given
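The numeric tolerance in cBLEU can be illustrated at the token level (a sketch only; the actual metric applies this tolerance inside n-gram matching):

```python
def numbers_match(ref_tok, cand_tok, tol=5):
    """cBLEU idea: treat two numeric tokens as equal if they differ by <= tol."""
    try:
        return abs(float(ref_tok) - float(cand_tok)) <= tol
    except ValueError:
        # non-numeric tokens fall back to exact string match
        return ref_tok == cand_tok
```

Under this rule, "low 58" and "low 60" match, while "low 58" and "low 70" do not.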

### Beam Filter with k-NN

plain beam search performs worse than greedy decoding, so beam candidates are filtered with k-NN:

k-NN steps:

1. run a standard beam search with beam size M
2. find the K nearest neighbors (database–description pairs) in the training data for the given input database
3. compute the BLEU score of each of the M candidates against these K neighbor descriptions
4. choose the best candidate (not clear whether the score is summed over the K neighbors or only the best neighbor is used)
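These steps can be sketched as below, with a crude unigram-overlap stand-in for BLEU and averaging over the K neighbors (one possible reading of step 4, which the notes above flag as ambiguous; all names are illustrative):

```python
def unigram_overlap(cand, ref):
    """Crude stand-in for BLEU: clipped unigram precision."""
    c, r = cand.split(), ref.split()
    return sum(min(c.count(w), r.count(w)) for w in set(c)) / max(len(c), 1)

def knn_beam_filter(candidates, neighbor_descs, score_fn=unigram_overlap):
    """Rescore the M beam candidates against the K retrieved descriptions."""
    def avg_score(cand):
        return sum(score_fn(cand, ref) for ref in neighbor_descs) / len(neighbor_descs)
    return max(candidates, key=avg_score)
```

A candidate that resembles descriptions of similar training databases is preferred over a fluent but off-topic one.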

### Ablation on WeatherGov

ablated components:

• aligner (compared with the standard attention used in NMT; Bahdanau et al., 2014)
• encoder

### Qualitative Analysis on WeatherGov

• good alignments: windDir with "southeast", temperature with "71"
• bad alignments: cloudy with temperature/precipitation records

### Embedding matrix

• initialize with the pretrained embeddings and then refine during training (better)
• concatenate the learned matrix with the pretrained embeddings
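The first option can be sketched as follows (numpy; `pretrained` as a word-to-vector dict is an assumed interface, not the paper's):

```python
import numpy as np

def init_embeddings(vocab, pretrained, dim, rng):
    """Start from pretrained vectors where available, random init otherwise;
    the whole matrix is then refined by backprop during training."""
    E = rng.normal(scale=0.1, size=(len(vocab), dim))
    for i, w in enumerate(vocab):
        if w in pretrained:
            E[i] = pretrained[w]
    return E

rng = np.random.default_rng(0)
pre = {"low": np.ones(3), "cloudy": np.full(3, 2.0)}
E = init_embeddings(["low", "wind", "cloudy"], pre, dim=3, rng=rng)
```

Words missing from the pretrained vocabulary (here "wind") get small random vectors and rely entirely on training.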