Hongyu Gong, Suma Bhat, and Pramod Viswanath
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
How are the meanings of words composed into the meaning of the whole? How do we get a sentence representation from word embeddings?
Non-compositionality phenomena:
- Idiomatic expressions:
- by and large, spill the beans, part of speech
- Figurative speech
- Extended meaning:
- Ensure that one bad egg doesn’t spoil good businesses for those who care for their clientele
- I don’t know which hen is laying the bad egg but it explodes when I crack it.
- Sarcastic sentence:
- I love going to the dentist. Been waiting for it all week!
- The girl is an angel because she is so kind to the children.
- First study on context-dependent phrase compositionality with embeddings
- First resource-independent study on sarcasm and metaphor identification
- The context word vectors lie in a low-dimensional linear subspace.
- Compositionality turns out to correspond to projecting the word embedding onto this context subspace.
Compositionality and Geometry
Given a sentence with words $w_1, \dots, w_n$ and their embeddings $v_1, \dots, v_n$
PCA subspace spanned by a set of vectors:
settings: $d = 200$, $n \in [10, 20]$, and hyperparameter $m \approx 3$
Given the embedding $v$ of a single word (metaphor and sarcasm) or a bigram phrase (MWE), compute its projection $v'_p$ onto the context subspace
compositionality score: cosine similarity of $v$ and $v’_p$
to consider multiple word senses, use MSSG representation (Neelakantan et al., 2014)
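The geometry above can be sketched in a few lines of numpy; the function name, the centering step before PCA, and the default $m = 3$ (matching the settings above) are our assumptions:

```python
import numpy as np

def compositionality_score(v, context_vectors, m=3):
    """Sketch of the score described above: project the embedding v
    onto the rank-m PCA subspace of the context word vectors, then
    return the cosine similarity between v and its projection."""
    X = np.asarray(context_vectors, dtype=float)  # shape (n, d)
    X = X - X.mean(axis=0)                        # center before PCA (our assumption)
    # Top-m principal directions via SVD of the centered context matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:m]                                    # (m, d), orthonormal rows
    v = np.asarray(v, dtype=float)
    v_proj = U.T @ (U @ v)                        # projection v'_p onto the subspace
    denom = np.linalg.norm(v) * np.linalg.norm(v_proj)
    return float(v @ v_proj / denom) if denom > 0 else 0.0
```

A vector lying inside the context subspace scores 1 (fully compositional); a vector orthogonal to it scores 0 (non-compositional).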
Multi-Word Expression: Compositionality
- polysemous words and phrases from: The FreeDictionary and ChineseDictionary
- contexts from Polyglot and Google Books
Multi-Word Expression: Idiomaticity
- English Noun Compounds dataset (Reddy et al., 2011)
- English Verb Particle Constructions (Bannard, 2006)
- German Noun Compounds (Schulte im Walde et al., 2013)
- PMI baseline (higher PMI means more non-compositional)
- average sentence embeddings
- state-of-the-art (Salehi et al., 2014a)
- methods using word definitions (ALLDEFS), synonyms (SYN), and idiom tags (ITAG) from Wiktionary
It’s so nice that a cute video of saving an animal can quickly turn the comments into political debates and racist attacks.
"nice" contradicts "debates" and "attacks"
- Use selected Twitter data (Ghosh et al., 2015).
- Choose only six words that occur frequently enough.
- A threshold-based classifier on compositionality scores gives results that could serve as a baseline for future work.
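The threshold-based classifier mentioned above can be sketched as follows; the decision rule (flag a sentence when any target word's score drops below a cutoff) and the threshold value are illustrative assumptions, not taken from the poster:

```python
def is_sarcastic(word_scores, threshold=0.5):
    """Illustrative threshold rule: a sentence is flagged as sarcastic
    when the minimum compositionality score among its target words
    falls below the threshold, i.e. some word disagrees with its
    context. The 0.5 default is an assumption."""
    return min(word_scores) < threshold
```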
- Reddit irony dataset (Wallace et al., 2014): 3,020 comments, 10,401 sentences annotated.
- (Wallace et al., 2014) baseline:
- features: bag-of-words and punctuation
- linear kernel SVM
- grid search with 5-fold cross-validation
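The baseline setup above maps directly onto scikit-learn; the parameter grid over the SVM regularization strength C is illustrative, not taken from the original paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Sketch of the Wallace et al. (2014) baseline as described above:
# bag-of-words features, a linear-kernel SVM, and grid search with
# 5-fold cross-validation. The C grid below is our assumption.
pipeline = Pipeline([
    ("bow", CountVectorizer()),   # bag-of-words features
    ("svm", LinearSVC()),         # linear-kernel SVM
])
search = GridSearchCV(pipeline, {"svm__C": [0.01, 0.1, 1, 10]}, cv=5)
```

Calling `search.fit(texts, labels)` then selects the best C by cross-validated accuracy and refits on the full training set.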
- Use the smallest k scores of words with different POS tags as features
- same supervised system as the baseline
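The smallest-k feature extraction can be sketched as below; the function name, the default k, and the convention of padding missing scores with 1.0 (the maximum, i.e. fully compositional) are our assumptions:

```python
def smallest_k_scores(pos_to_scores, k=3):
    """Sketch of the feature extractor described above: for each POS
    tag, keep the k smallest compositionality scores among words with
    that tag, padding with 1.0 when fewer than k such words occur.
    Sorting the POS tags keeps the feature order stable."""
    features = []
    for pos in sorted(pos_to_scores):
        scores = sorted(pos_to_scores[pos])[:k]
        scores += [1.0] * (k - len(scores))  # pad (our convention)
        features.extend(scores)
    return features
```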
- Dataset: English sentences with S+V+O and Adj+Noun structures (Tsvetkov et al., 2014)
- baseline (Tsvetkov et al., 2014): feature engineering using WordNet and the MRC psycholinguistic database
Use unsupervised compositionality scores of selected POS words (possible since the dataset has specific syntactic structures):
- lowest score in SVO (at least one word is inconsistent with the context)
- verb score (verbs are often used metaphorically)
- ratio between the lowest and the highest score (relative rather than absolute score value)
- the minimum of (v/subj, subj/v, v/obj, obj/v) (relative score between the verb and its subject or object)
- lowest score in AN
- highest score
- ratio between the highest and the lowest score
All these features are fed into a random forest, following the supervised setup of Tsvetkov et al. (2014).
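The SVO feature construction and classifier above can be sketched as follows; the helper function, the division-by-zero guard, and the forest hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def svo_features(v_score, s_score, o_score):
    """Hypothetical helper turning the per-word compositionality scores
    of a subject-verb-object triple into the four features listed
    above: lowest score, verb score, lowest/highest ratio, and the
    minimum pairwise ratio between the verb and its arguments."""
    eps = 1e-8  # guard against division by zero (our addition)
    scores = [v_score, s_score, o_score]
    lowest, highest = min(scores), max(scores)
    pairwise = [v_score / (s_score + eps), s_score / (v_score + eps),
                v_score / (o_score + eps), o_score / (v_score + eps)]
    return [lowest, v_score, lowest / (highest + eps), min(pairwise)]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
```

A low verb score relative to its arguments pushes all four features down, which is the signal the forest learns to associate with metaphorical usage.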
Future work: study methods with neural networks such as LSTMs.