What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

Hongyuan Mei, Mohit Bansal, Matthew R. Walter Toyota Technological Institute at Chicago Chicago, IL 60637 {hongyuan,mbansal,mwalter}@ttic.edu


the task: given a set of database records (structured data), generate a natural-language description sentence (two main phases: content selection and surface realization)


Previous work is either:

  • template-based
  • domain-dependent (parsers, NER tools, features)
  • learning content selection and surface realization separately

A neural encoder-aligner-decoder

  • encoder is essential
  • coarse-to-fine aligner because selective generation requires identifying the small number of salient records among an over-determined database
  • beam search vs. greedy decoding


LSTM-based encoder-decoder with attention:


Common prob. model

LSTM Encoder

each output $h_j$ is the concatenation of the LSTM outputs from both directions
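The bidirectional concatenation can be sketched with plain recurrences; `fwd_step`/`bwd_step` are stand-ins for the LSTM cell updates, which the real model replaces with gated units:

```python
import numpy as np

def bi_encode(records, fwd_step, bwd_step, d):
    """Run a forward and a backward pass over the record sequence
    and concatenate the two hidden states at each position j."""
    n = len(records)
    h_fwd = np.zeros((n, d))
    h_bwd = np.zeros((n, d))
    h = np.zeros(d)
    for j in range(n):                 # left-to-right pass
        h = fwd_step(h, records[j])
        h_fwd[j] = h
    h = np.zeros(d)
    for j in reversed(range(n)):       # right-to-left pass
        h = bwd_step(h, records[j])
        h_bwd[j] = h
    return np.concatenate([h_fwd, h_bwd], axis=1)  # each h_j is 2d-dimensional
```

Each position's output thus summarizes both the records before it and the records after it, which matters when salience depends on the whole database.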

Coarse-to-fine Aligner

Only a small subset of salient records is relevant to the output sentence.

1. concatenate each record's hidden representation with the original record features

2. use a pre-selector to assign a selection probability to each record

the summation of all p_j can be regarded as an approximation of the total number of selected records (gamma)

3. use a standard aligner to compute alignment probabilities at each timestep t

4. the refiner produces the final selection decisions
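The four steps above can be sketched as follows. This is a minimal numpy illustration with made-up weight shapes (`Wp`, `Wa`, `h_dec` are stand-ins); the paper's actual gating and scoring functions differ in detail:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def coarse_to_fine_weights(H, records, Wp, Wa, h_dec):
    """H: (n, d) encoder outputs; records: (n, r) raw record features.
    Returns refined attention weights for one decoder timestep."""
    X = np.concatenate([H, records], axis=1)     # step 1: concat hidden repr + record
    p = 1.0 / (1.0 + np.exp(-(X @ Wp)))          # step 2: pre-selector prob per record
    scores = np.tanh(X @ Wa) @ h_dec             # step 3: alignment scores at timestep t
    alpha = softmax(scores)
    w = p * alpha                                # step 4: refiner combines both signals
    return w / w.sum()                           # renormalize to final weights
```

The key design point is that the pre-selector probabilities `p` are computed once per input (coarse selection), while the alignment `alpha` is recomputed at every decoder timestep (fine alignment).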



regularization term:

  • the sum of the pre-selector probabilities should stay close to a target value (gamma)
  • at least one record should be selected
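One plausible form matching these two bullets can be sketched as below; this is my own illustration of the two ideas, not necessarily the paper's exact expression:

```python
import numpy as np

def preselector_regularizer(p, gamma, lam1=1.0, lam2=1.0):
    """Sketch of the two regularization ideas in the notes:
      - keep the expected number of selected records, sum(p), near gamma
      - ensure at least one record has high selection probability
    (the paper's exact regularizer may differ in form)"""
    count_term = (p.sum() - gamma) ** 2       # sum of probs should approximate gamma
    nonempty_term = -np.log(p.max() + 1e-8)   # near zero when some p_i is near 1
    return lam1 * count_term + lam2 * nonempty_term
```

With `p = [0.99, 0.01]` and `gamma = 1.0` the penalty is small; with all probabilities near zero the `nonempty_term` grows large, discouraging an empty selection.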



  • hidden units = 500 from {250, 500, 750} (WeatherGov)
  • gamma = 8.5 from {6.5, 7.5, 8.5, 10.5, 12.5} (WeatherGov)
  • gamma = 5.0 from {1.0, 2.0, …, 6.0} (RoboCup)
  • mini-batch = 100
  • Adam optimizer; convergence within up to 30 epochs


  • F1 for content selection
  • s(tandard)BLEU (up to 4-grams) for surface realization
  • c(ustomized)BLEU, which does not penalize numerical deviations within 5 (e.g., "low 58" counts as "low 60")
  • sBLEUg for the case that ground-truth content selection is given
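The cBLEU idea, treating numbers within 5 of each other as a match so that "low 58" counts as "low 60", can be sketched as a token-match predicate (my own illustration, not the authors' scoring script):

```python
def tokens_match(cand, ref, tol=5):
    """True if the tokens are identical, or both numeric and within `tol`."""
    if cand == ref:
        return True
    try:
        return abs(float(cand) - float(ref)) <= tol
    except ValueError:
        return False  # non-numeric tokens must match exactly
```

Plugging a predicate like this into n-gram matching yields a BLEU variant that tolerates small numeric discrepancies while still penalizing wrong words.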



Beam Filter with k-NN

plain beam search performs worse than greedy decoding, which motivates filtering the beam with k-NN retrieval


k-NN steps:

  1. run a standard beam search with beam size M
  2. retrieve the K nearest neighbors ((database, description) pairs) from the training data for the given input records
  3. compute the BLEU score of each of the M candidates against these K neighbor descriptions
  4. choose the best-scoring candidate (unclear whether the score is summed over the K neighbors or taken from the single best one)
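The filtering steps can be sketched as follows, with a crude unigram-overlap score standing in for real BLEU and an average over the K neighbors (the notes themselves flag that sum-vs-best is unclear):

```python
def overlap_score(cand, ref):
    """Crude unigram-precision stand-in for BLEU."""
    c, r = cand.split(), ref.split()
    return sum(t in r for t in c) / max(len(c), 1)

def knn_beam_filter(candidates, neighbor_descs):
    """Steps 3-4: score each beam candidate against the retrieved
    neighbor descriptions and return the best candidate.
    (Averages over neighbors; sum vs. best-single is an open detail.)"""
    def score(cand):
        return sum(overlap_score(cand, d) for d in neighbor_descs) / len(neighbor_descs)
    return max(candidates, key=score)
```

The intuition: descriptions of similar databases in the training data tend to use similar phrasing, so candidates that resemble the retrieved neighbors are likelier to be fluent and on-content.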


ablation on WeatherGov

aligner (compared with the attention mechanism in NMT, Bahdanau et al. 2014)




qualitative analysis on WeatherGov

  • good alignment: windDir with "southeast", temperature with "71"
  • bad alignment: cloudy with temperature/precipitation records


Embedding matrix

  • initialize with the pretrained embeddings and then fine-tune (works better)
  • concatenate the learned matrix with the pretrained embeddings
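The first strategy, initializing from pretrained vectors where available and fine-tuning from there, can be sketched as below (the vocabulary, dimensions, and random init are illustrative stand-ins):

```python
import numpy as np

def init_embeddings(vocab, pretrained, dim, rng=np.random.default_rng(0)):
    """Build an embedding matrix: copy pretrained vectors where
    available, random-initialize the rest; the whole matrix is
    then fine-tuned during training."""
    E = rng.normal(0.0, 0.1, (len(vocab), dim))
    for i, w in enumerate(vocab):
        if w in pretrained:
            E[i] = pretrained[w]
    return E
```

The concatenation variant would instead keep the pretrained block frozen and learn a second matrix alongside it, joining the two per-word vectors.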