Kwiatkowski, Zettlemoyer, Goldwater, Steedman 2011
Lexical Generalization in CCG Grammar Induction for Semantic Parsing
EMNLP 2011, GeoQuery, ATIS
lexicon item -> (lexeme, template) pair
- lexeme: (word span, [constant1, constant2])
all the constants of the logical form h are included in the lexeme.
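A minimal sketch of the (lexeme, template) factoring, assuming a template can be modelled as a syntactic category plus a logical-form pattern with numbered slots for the lexeme's constants. The class names, the string-based logical forms, and the example entries are illustrative assumptions, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Lexeme:
    """A word span paired with the constants it contributes."""
    words: Tuple[str, ...]
    constants: Tuple[str, ...]


@dataclass(frozen=True)
class Template:
    """A category plus a logical-form pattern; {0}, {1}, ... are constant slots."""
    category: str
    lf_pattern: str

    def apply(self, lexeme: Lexeme) -> Tuple[str, str, str]:
        # Fill the slots with the lexeme's constants to recover a full lexical item.
        logical_form = self.lf_pattern.format(*lexeme.constants)
        return (" ".join(lexeme.words), self.category, logical_form)


# Hypothetical entries: one template is shared by several lexemes.
noun_template = Template(category="N", lf_pattern="lambda x.{0}(x)")
flight = Lexeme(words=("flight",), constants=("flight",))
fare = Lexeme(words=("fare",), constants=("cost",))

flight_item = noun_template.apply(flight)
fare_item = noun_template.apply(fare)
```

Sharing `noun_template` across lexemes is the point of the factoring: a template learned from one word can generalize to other words of the same kind.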
Kwiatkowski, Choi, Artzi, Zettlemoyer, 2013
Scaling Semantic Parsers with On-the-fly Ontology Matching
EMNLP 2013, Freebase QA, GeoQuery
ontological mismatch problem
First, the GeoQuery and ATIS datasets are small: they have few predicates and few distinct utterances, so learning a parsing model is relatively easy.
If a database has more predicates, it can in theory answer more questions, and the space of possible utterances grows even larger.
Worse, new utterances may linguistically call for more predicates, but the database schema is fixed and supports only a limited set of them.
convert to underspecified LF
- predefined set of 56 lexical categories (WordNet)
- 49 domain-independent lexical items (English only)
- underspecified constants are type placeholders
list of operators:
- collapsing operator
- expansion operator
- constant matching
CKY-style chart parser, threshold pruning, …
find correct and wrong samples, then update the parameters
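The notes only say "update parameter", so the exact rule is not recorded here. As an assumption, the sketch below uses a plain structured-perceptron-style update: move the weights toward the average features of the correct samples and away from the average features of the wrong ones. The function name, the sparse-dict feature representation, and the learning rate `lr` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

Features = Dict[str, float]


def average_features(samples: List[Features]) -> Counter:
    """Average sparse feature vectors over a non-empty list of samples."""
    total = Counter()
    for feats in samples:
        for name, value in feats.items():
            total[name] += value / len(samples)
    return total


def perceptron_update(weights: Dict[str, float],
                      correct: List[Features],
                      wrong: List[Features],
                      lr: float = 1.0) -> Dict[str, float]:
    """Additive update; does nothing if either sample set is empty."""
    if not correct or not wrong:
        return weights
    good = average_features(correct)
    bad = average_features(wrong)
    for name in set(good) | set(bad):
        weights[name] = weights.get(name, 0.0) + lr * (good[name] - bad[name])
    return weights


# Toy example with made-up feature names.
w = perceptron_update({}, correct=[{"a": 1.0}], wrong=[{"a": 1.0, "b": 2.0}])
```

Features that fire equally in both sets cancel out, so only features that distinguish correct from wrong samples change weight.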
Goldwasser et al. 2011
Confidence Driven Unsupervised Semantic Parsing
if a non-random model produces a prediction pattern multiple times, it is likely an indication of an underlying phenomenon in the data.
output structures which fall close to the center of mass of these statistics will receive a high confidence score.
confidence-driven EM-like learning significantly improves the model compared with using only the prediction score
- Prop(x, z): ratio of #pred_in_z to #words_in_x
- AvProp(S): average of Prop over the set S
- PropScore(S, (x,z)) = AvProp(S) - Prop(x, z)
use the latter approach to filter out unlikely candidates, and rank the remaining ones using the former approach
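The three quantities above and the filter-then-rank step can be sketched as follows. The formulas for Prop, AvProp and PropScore follow the notes; everything else is an assumption for illustration: the tolerance `tol`, the use of the absolute PropScore as the filter, and the separate ranking score passed in by the caller. Sentences are assumed to be non-empty word lists.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[str], Sequence[str]]  # (words of x, predicates of z)


def prop(x_words: Sequence[str], z_preds: Sequence[str]) -> float:
    """Prop(x, z): number of predicates in z per word in x."""
    return len(z_preds) / len(x_words)


def av_prop(pairs: List[Pair]) -> float:
    """AvProp(S): mean of Prop over all (x, z) pairs in S."""
    return sum(prop(x, z) for x, z in pairs) / len(pairs)


def prop_score(pairs: List[Pair], x_words: Sequence[str],
               z_preds: Sequence[str]) -> float:
    """PropScore(S, (x, z)) = AvProp(S) - Prop(x, z)."""
    return av_prop(pairs) - prop(x_words, z_preds)


def filter_and_rank(pairs: List[Pair], rank_scores: List[float],
                    tol: float) -> List[int]:
    """Keep pairs whose |PropScore| <= tol, then sort them by rank score (descending)."""
    kept = [i for i, (x, z) in enumerate(pairs)
            if abs(prop_score(pairs, x, z)) <= tol]
    return sorted(kept, key=lambda i: rank_scores[i], reverse=True)


# Toy data: three (sentence, predicted structure) pairs.
S = [
    (["a", "b", "c", "d"], ["p", "q"]),            # Prop = 0.5
    (["a", "b"], ["p"]),                           # Prop = 0.5
    (["a", "b", "c", "d"], ["p", "q", "r", "s"]),  # Prop = 1.0
]
order = filter_and_rank(S, rank_scores=[0.3, 0.9, 0.99], tol=0.2)
```

In the toy data the third pair has the highest ranking score but is dropped by the filter, because its predicate-to-word ratio is far from the average.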
Grounded Unsupervised Semantic Parsing
ACL 2013, unsupervised, database schema, ATIS
- leverage database schema
- start from the dep-parse, and add states for mismatches between dep- and sem-parse
- semantics that need no training: datetime, logical connectives, numerics
- superlatives are applied to the most restricted case
assign nodes and edges in a dep-parse to various states.
- these states directly come from database schema
- for NL/parse-MR mismatch, add more states (Raising / Sinking)
- devise lexical triggers from DB values; DASH (Pantel et al. 2009) is used to get additional word pairs
- inference using tree-Viterbi and the inside-outside algorithm
- weights learned from feature-rich EM
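A generic sketch of tree-Viterbi (max-sum over a tree) for assigning one state to each dep-parse node. This is not the paper's model: the state names, the additive node and edge scores (think log-space), and the toy tree are assumptions made only to show the recursion.

```python
from typing import Callable, Dict, List, Tuple


def tree_viterbi(root: str,
                 children: Dict[str, List[str]],
                 states: List[str],
                 node_score: Callable[[str, str], float],
                 edge_score: Callable[[str, str], float]
                 ) -> Tuple[float, Dict[str, str]]:
    """Return the best total score and the best state for every node."""
    best: Dict[Tuple[str, str], float] = {}           # (node, state) -> subtree score
    back: Dict[Tuple[str, str], Dict[str, str]] = {}  # (node, state) -> child states

    def visit(node: str) -> None:
        # Bottom-up pass: children are scored before their parent.
        for child in children.get(node, []):
            visit(child)
        for s in states:
            total = node_score(node, s)
            choice = {}
            for child in children.get(node, []):
                cs = max(states, key=lambda t: edge_score(s, t) + best[(child, t)])
                total += edge_score(s, cs) + best[(child, cs)]
                choice[child] = cs
            best[(node, s)] = total
            back[(node, s)] = choice

    def decode(node: str, s: str, out: Dict[str, str]) -> None:
        # Top-down pass: follow the stored back-pointers.
        out[node] = s
        for child, cs in back[(node, s)].items():
            decode(child, cs, out)

    visit(root)
    root_state = max(states, key=lambda s: best[(root, s)])
    assignment: Dict[str, str] = {}
    decode(root, root_state, assignment)
    return best[(root, root_state)], assignment


# Toy dep-parse "flight -> boston" with hypothetical states and scores.
NODE = {("flight", "ENTITY"): 2.0, ("boston", "ENTITY"): 0.5, ("boston", "VALUE"): 1.0}
EDGE = {("ENTITY", "VALUE"): 1.0}
score, states_by_node = tree_viterbi(
    "flight", {"flight": ["boston"]}, ["ENTITY", "VALUE"],
    node_score=lambda n, s: NODE.get((n, s), 0.0),
    edge_score=lambda p, c: EDGE.get((p, c), 0.0),
)
```

Replacing `max` with a log-sum-exp over states in the bottom-up pass would give the inside scores used by inside-outside, which is why the two algorithms are usually implemented together.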