UWB at SemEval-2016 Task 1: Semantic Textual Similarity using Lexical,
Syntactic, and Semantic Information
Tomáš Brychcín
NTIS – New Technologies
for the Information Society,
Faculty of Applied Sciences,
University of West Bohemia,
Technická 8, 306 14 Plzeň
Czech Republic
brychcin@kiv.zcu.cz
Lukáš Svoboda
Department of Computer
Science and Engineering,
Faculty of Applied Sciences,
University of West Bohemia,
Technická 8, 306 14 Plzeň
Czech Republic
svobikl@kiv.zcu.cz
Abstract
We present our UWB system for the Semantic Textual Similarity (STS) task at SemEval 2016. Given two sentences, the system estimates the degree of their semantic similarity. We use state-of-the-art algorithms for meaning representation and combine them with the best performing approaches to STS from previous years. These methods benefit from various sources of information, such as lexical, syntactic, and semantic.
In the monolingual task, our system achieves a mean Pearson correlation of 75.7% with human annotators. In the cross-lingual task, our system achieves a correlation of 86.3% and is ranked first among 26 systems.
1 Introduction
Semantic Textual Similarity (STS) is one of the core disciplines in Natural Language Processing (NLP). Given two textual fragments (word phrases, sentences, paragraphs, or full documents), the goal is to estimate the degree of their semantic similarity.
STS systems are usually evaluated against manually annotated data. In the case of SemEval, the data consist of pairs of sentences with a score between 0 and 5 (a higher number means higher semantic similarity). For example, the English pair
Two dogs play in the grass.
Two dogs playing in the snow.
has a score of 2.8, i.e., the sentences are not equivalent but share some information.
This year, SemEval's STS is extended with a Spanish-English cross-lingual subtask, where e.g. the pair
Tuve el mismo problema que tú.
I had the same problem.
has a score of 4.8, which means the sentences are nearly equivalent.
STS has been one of the most popular tasks at the SemEval competition each year. The best STS system at SemEval 2012 (Bär et al., 2012) used lexical similarity and Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2007). In SemEval 2013, the best model (Han et al., 2013) used semantic models such as Latent Semantic Analysis (LSA) (Deerwester et al., 1990), external information sources (WordNet), and n-gram matching techniques. The best system at SemEval 2014 and 2015 comes from (Sultan et al., 2014a; Sultan et al., 2014b; Sultan et al., 2015). They introduced a new algorithm that aligns the words between two sentences and showed that this approach can also be used efficiently for STS. An overview of systems participating in previous SemEval competitions can be found in (Agirre et al., 2012; Agirre et al., 2013; Agirre et al., 2014; Agirre et al., 2015).
The best performing systems from previous years are based on various architectures benefiting from lexical, syntactic, and semantic information. In this work we use the best techniques presented in recent years, enhance them, and combine them into a single model.
2 Semantic Textual Similarity
This section describes various techniques for estimating text similarity.
2.1 Lexical and Syntactic Similarity
This section presents the techniques exploiting lexical and syntactic information in the text. Some of them have been successfully used by (Bär et al., 2012). Many of the following techniques benefit from weighting the words in a sentence by their Term Frequency–Inverse Document Frequency (TF-IDF) (Manning and Schütze, 1999).
• Lemma n-gram overlaps: We compare word n-grams in both sentences using the Jaccard Similarity Coefficient (JSC) (Manning and Schütze, 1999). We do so separately for the orders n ∈ {1, 2, 3, 4}. The Containment Coefficient (Broder, 1997) is used for the orders n ∈ {1, 2}. We extend the original metrics by weighting the n-grams: the weight of an n-gram is the sum of the IDF values of its words, and an n-gram match is counted not as 1 but as this weight. According to our experiments, this weighting significantly improves performance (a short sketch of the weighted overlap follows this list). We also use the ratio of the length of the Longest Common Subsequence to the lengths of the sentences.
• POS n-gram overlaps: In a similar way as for lemmas, we calculate the Jaccard Similarity Coefficient and the Containment Coefficient for n-grams of part-of-speech (POS) tags. Again, we use n-gram weighting and n ∈ {1, 2, 3, 4}. These features exploit the syntactic similarity of the sentences.
• Character n-gram overlaps: Similarly to lemma or POS n-grams, we use the Jaccard Similarity Coefficient and the Containment Coefficient for comparing common substrings in both sentences. Here the IDF weights are computed on the character n-gram level. We use n-gram weighting and n ∈ {2, 3, 4, 5}. We also enrich these features by Greedy String Tiling (Wise, 1996), which allows dealing with reordered text parts, and by the Longest Common Substring (LCS), measuring the ratio between the LCS and the lengths of the sentences.
• TF-IDF: For each word in a sentence we calculate its TF-IDF value. Given the word vocabulary V, the sentence is represented as a vector of dimension |V| holding the TF-IDF values of the words present in the sentence. The similarity between two sentences is expressed as the cosine similarity between the corresponding TF-IDF vectors.
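To make the IDF-weighted n-gram overlap concrete, here is a minimal Python sketch; it is an illustration under our reading of the description above, not the authors' implementation, and the IDF table, tokenization, and helper names are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_weight(ngram, idf):
    """Weight of an n-gram = sum of IDF values of its words (unknown words get 1.0)."""
    return sum(idf.get(w, 1.0) for w in ngram)

def weighted_jaccard(tokens1, tokens2, idf, n):
    """IDF-weighted Jaccard overlap of n-grams: a matched n-gram contributes
    its weight instead of 1, both in the intersection and in the union."""
    a, b = ngrams(tokens1, n), ngrams(tokens2, n)
    inter = sum(ngram_weight(g, idf) for g in a & b)
    union = sum(ngram_weight(g, idf) for g in a | b)
    return inter / union if union > 0 else 0.0

# toy usage with a hypothetical IDF table and lemmatized sentences
idf = {"dog": 2.1, "play": 1.7, "grass": 2.8, "snow": 2.9, "two": 0.9, "in": 0.3, "the": 0.1}
s1 = "two dog play in the grass".split()
s2 = "two dog play in the snow".split()
print(weighted_jaccard(s1, s2, idf, n=1))
```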
2.2 Semantic Similarity
In this section we describe in more detail the techniques that are more semantically oriented and are based on the Distributional Hypothesis. This principle states that we can induce (to some degree) the meaning of words from their distribution in text. The claim has theoretical roots in psychology, structural linguistics, and lexicography (Firth, 1957; Rubenstein and Goodenough, 1965; Miller and Charles, 1991).
• Semantic composition: This approach is based on Frege's principle of compositionality, which states that the meaning of a complex expression is determined by the composition of its parts, i.e., the words. To represent the meaning of a sentence we use a simple linear combination of word vectors, where the weights are the TF-IDF values of the corresponding words. We use state-of-the-art word embedding methods, namely Continuous Bag of Words (CBOW) (Mikolov et al., 2013) and Global Vectors (GloVe) (Pennington et al., 2014), and compare the resulting vectors by cosine similarity (a short sketch follows this list).
• Paragraph2Vec: Paragraph vectors were proposed in (Le and Mikolov, 2014) as an unsupervised method of learning text representations. The resulting feature vector has a fixed dimension while the input text can be of any length. The paragraph vectors and word vectors are concatenated to predict the next word in a context; the paragraph token acts as a memory that remembers what information is missing from the current context. We use cosine similarity to compare two paragraph vectors.
• Tree LSTM: Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) with a complex computational unit. We use the tree-structured LSTM presented in (Tai et al., 2015), where the tree models the sentence structure. The RNN processes input sentences of variable length via recursive application of a transition function on a hidden state vector $h_t$. For each sentence pair, the Tree-LSTM model creates sentence representations $h_L$ and $h_R$. Given these representations, the model predicts the similarity score using a neural network that considers the distance and angle between the vectors.
• Word alignment: The method presented in (Sultan et al., 2014a; Sultan et al., 2014b; Sultan et al., 2015) has been very successful in recent years. Given two sentences we want to compare, this method finds and aligns the words that have similar meanings and similar roles in the sentences.
Unlike the original method, we assume that not all word alignments have the same importance for the meaning of the sentences. The weight of a set of words $A$ is the sum of its words' IDF values, $\omega(A) = \sum_{w \in A} \mathrm{IDF}(w)$. The sentence similarity is then given by
$$\mathrm{sim}(S_1, S_2) = \frac{\omega(A_1) + \omega(A_2)}{\omega(S_1) + \omega(S_2)}, \quad (1)$$
where $S_1$ and $S_2$ are the input sentences (represented as sets of words) and $A_1$ and $A_2$ denote the sets of aligned words in $S_1$ and $S_2$, respectively. This weighting of alignments improves our results significantly (a sketch of Equation 1 also follows this list).
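To illustrate the TF-IDF-weighted composition, the sketch below combines placeholder word vectors and compares the resulting sentence vectors by cosine similarity; the embedding table and IDF values are stand-ins for the CBOW/GloVe models and corpus statistics actually used.

```python
import numpy as np

def compose(tokens, embeddings, idf, dim=300):
    """TF-IDF-weighted linear combination of word vectors for one sentence.
    Words missing from the embedding table are skipped."""
    vec = np.zeros(dim)
    for w in set(tokens):
        if w in embeddings:
            vec += tokens.count(w) * idf.get(w, 1.0) * embeddings[w]
    return vec

def cosine(u, v):
    """Cosine similarity of two sentence vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# toy usage with random placeholder embeddings and a hypothetical IDF table
rng = np.random.default_rng(0)
vocab = ["two", "dog", "play", "in", "the", "grass", "snow"]
embeddings = {w: rng.normal(size=300) for w in vocab}       # stand-in for CBOW/GloVe vectors
idf = {"dog": 2.1, "play": 1.7, "grass": 2.8, "snow": 2.9}

s1 = "two dog play in the grass".split()
s2 = "two dog play in the snow".split()
print(cosine(compose(s1, embeddings, idf), compose(s2, embeddings, idf)))
```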
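Equation 1 itself is straightforward once the alignments are available. The sketch below assumes the aligned word sets come from an external monolingual aligner (as in Sultan et al.) and only implements the weighted ratio.

```python
def omega(words, idf):
    """Weight of a set of words: the sum of their IDF values (Section 2.2)."""
    return sum(idf.get(w, 1.0) for w in words)

def weighted_alignment_sim(s1, s2, a1, a2, idf):
    """Equation 1: ratio of the weight of aligned words to the weight of all words."""
    denom = omega(s1, idf) + omega(s2, idf)
    return (omega(a1, idf) + omega(a2, idf)) / denom if denom > 0 else 0.0

# toy usage; the aligned sets a1/a2 would come from the external word aligner
idf = {"dog": 2.1, "play": 1.7, "grass": 2.8, "snow": 2.9, "two": 0.9, "in": 0.3, "the": 0.1}
s1 = {"two", "dog", "play", "in", "the", "grass"}
s2 = {"two", "dog", "play", "in", "the", "snow"}
a1 = {"two", "dog", "play", "in", "the"}
a2 = {"two", "dog", "play", "in", "the"}
print(weighted_alignment_sim(s1, s2, a1, a2, idf))
```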
2.3 Similarity Combination
The combination of STS techniques is in fact a regression problem where the goal is to find a mapping from an input space $x_i \in \mathbb{R}^d$ of $d$-dimensional real-valued vectors (each value $x_{i,a}$, where $1 \le a \le d$, represents a single STS technique) to an output space $y_i \in \mathbb{R}$ of real-valued targets (the desired semantic similarity). This mapping is learned from training data $\{x_i, y_i\}_{i=1}^{N}$ of size $N$. Many regression methods exist; we experiment with several of them:
• Linear Regression: Linear Regression (LR) is probably the simplest regression method. It is defined as $y_i = \lambda \cdot x_i$, where $\lambda$ is a vector of weights that can be estimated, for example, by the least squares method (a sketch follows this list).
• Gaussian Processes Regression: Gaussian process regression (GPR) is a nonparametric, kernel-based probabilistic model for non-linear regression (Rasmussen and Williams, 2005).
• SVM Regression: We use Support Vector Machines (SVM) for regression with radial basis functions (RBF) as the kernel. We use the improved Sequential Minimal Optimization (SMO) algorithm for parameter estimation introduced in (Shevade et al., 2000).
• Decision Trees Regression: The output of Decision Trees Regression (DTR) (Breiman et al., 1984) is predicted by a sequence of decisions organized in a tree.
• Perceptron Regression: A Multilayer Perceptron (MLP) is a feed-forward artificial neural network trained with back-propagation.
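As a minimal illustration of combining the individual STS scores, the sketch below fits the linear model $y_i = \lambda \cdot x_i$ by least squares with NumPy; the feature matrix is random placeholder data, not the actual SemEval features.

```python
import numpy as np

# placeholder training data: N sentence pairs, d STS technique scores each
rng = np.random.default_rng(1)
N, d = 500, 8
X = rng.random((N, d))                              # x_i: individual STS technique scores
true_lambda = rng.random(d)
y = X @ true_lambda + 0.05 * rng.normal(size=N)     # y_i: gold similarity (synthetic here)

# least-squares estimate of the weight vector lambda
lam, *_ = np.linalg.lstsq(X, y, rcond=None)

# predict the similarity of a new pair from its technique scores
x_new = rng.random(d)
print(float(x_new @ lam))
```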
3 System Description
This section describes the settings of our final STS system. For the monolingual STS task we submitted two runs: the first is based on supervised learning and the second is an unsupervised system.
• UWB sup: A supervised system based on SVM regression with an RBF kernel. We use all techniques described in Section 2 as features for the regression. We also use a simple trick: we create additional features as the product of each pair of features, $x_{i,a} \times x_{i,b}$ for $a \neq b$, to better model the dependencies between individual features (a sketch of this expansion follows this list). Together, we have 301 STS features. The system is trained on all SemEval datasets from prior years (see Table 1).
• UWB unsup: An unsupervised system based only on the weighted word alignment (Section 2.2).
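The following sketch shows the pairwise feature-product expansion together with an RBF-kernel support vector regressor, here using scikit-learn's SVR as a stand-in for the WEKA SMO-based regression the authors actually used; the input scores are placeholder data.

```python
import numpy as np
from sklearn.svm import SVR

def expand_with_products(X):
    """Append the product of every pair of distinct features to the base features."""
    n, d = X.shape
    products = [(X[:, a] * X[:, b]).reshape(n, 1)
                for a in range(d) for b in range(a + 1, d)]
    return np.hstack([X] + products)

# placeholder data: base STS scores for N pairs and gold similarities in [0, 5]
rng = np.random.default_rng(2)
X_train = rng.random((300, 10))
y_train = rng.uniform(0, 5, size=300)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(expand_with_products(X_train), y_train)

X_test = rng.random((5, 10))
print(model.predict(expand_with_products(X_test)))
```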
We handled the cross-lingual STS task, with Spanish-English bilingual sentence pairs, in two steps. First, we translated the Spanish sentences into English via Google Translate; the English sentences were left untouched. Second, we used the same STS systems as for the monolingual task.
Corpora              Pairs
SemEval 2012 Train   2,234
SemEval 2012 Test    3,108
SemEval 2013 Test    1,500
SemEval 2014 Test    3,750
SemEval 2015 Test    3,000
Table 1: STS gold data from prior years.
            News     Multi-source   Mean     RR   TR
UWB sup     0.9062   0.8190         0.8631   1    1
UWB unsup   0.9124   0.8082         0.8609   2    1
Table 4: Pearson correlations on the cross-lingual STS task of SemEval 2016. RR denotes the run (system) ranking and TR denotes our team ranking.
For the preprocessing pipeline we used the Stanford CoreNLP library (Manning et al., 2014), i.e. for tokenization, lemmatization, and POS tagging. Most of our STS techniques (apart from word alignment and POS n-gram overlaps) work with lemmas instead of word forms, which leads to slightly better performance. Some of our STS techniques are based on unsupervised learning and thus need large unannotated corpora for training. We trained the Paragraph2Vec, GloVe, and CBOW models on the One Billion Word Benchmark (Chelba et al., 2014); the vector dimension for all these models was set to 300. TF-IDF values were also estimated on this corpus.
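As an illustration of this unsupervised training step, the sketch below trains a 300-dimensional CBOW model with gensim; the choice of gensim is an assumption about tooling (the original word2vec and GloVe implementations could equally be used), the corpus path is a placeholder, and parameter names follow gensim 4.x.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# placeholder path to a tokenized, one-sentence-per-line corpus
corpus = LineSentence("one-billion-word-benchmark.txt")

# CBOW (sg=0) with 300-dimensional vectors, as in the paper's setup
model = Word2Vec(sentences=corpus, vector_size=300, window=5,
                 min_count=5, sg=0, workers=4)

model.save("cbow-300.model")
print(model.wv.most_similar("dog", topn=5))  # sanity check after training
```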
All regression methods mentioned in Section 2.3
are implemented in WEKA (Hall et al., 2009).
4 Results and Discussion
This section presents the results of our systems for both the English monolingual and the Spanish-English cross-lingual STS tasks of SemEval 2016. In addition, we present detailed results on the test data from SemEval 2015. As the evaluation measure we use the Pearson correlation between the system output and human annotations.
In the tables we present the correlation for each individual test set. The Mean column is the weighted average of the correlations, where the weight of each dataset is the ratio of its size to the total size of all datasets. This mean Pearson correlation is also the main evaluation measure used for ranking the system submissions (a short computation sketch follows).
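A minimal sketch of this evaluation, using SciPy's Pearson correlation and a size-weighted average; the per-dataset scores below are synthetic placeholders.

```python
import numpy as np
from scipy.stats import pearsonr

def weighted_mean_correlation(datasets):
    """datasets: list of (system_scores, gold_scores) pairs, one per test set.
    Returns the per-set Pearson correlations and their size-weighted mean."""
    corrs, sizes = [], []
    for sys_scores, gold in datasets:
        r, _ = pearsonr(sys_scores, gold)
        corrs.append(r)
        sizes.append(len(gold))
    return corrs, float(np.average(corrs, weights=sizes))

# toy usage with two small placeholder test sets
rng = np.random.default_rng(3)
gold_a = rng.uniform(0, 5, 100); sys_a = gold_a + rng.normal(0, 0.8, 100)
gold_b = rng.uniform(0, 5, 50);  sys_b = gold_b + rng.normal(0, 1.2, 50)
corrs, mean = weighted_mean_correlation([(sys_a, gold_a), (sys_b, gold_b)])
print(corrs, mean)
```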
Table 2 shows the results for the test data from 2015, with our systems trained on the SemEval STS data from 2012–2014. We provide a comparison of the individual STS techniques as well as of the different types of regression. Clearly, SVM regression and Gaussian processes regression perform best; with our feature set they are about 1% better than the winning system of SemEval 2015. The best performing single technique is indisputably the weighted word alignment, which correlates with the gold data at 79.6%. Note that without the weighting we achieved only 74.2% on these data, while the original result reported by the authors of this approach was 79.2%; this gap is probably caused by some inaccuracies in our implementation. Nevertheless, the weighting improves the correlation even when compared with the original result. Note that for estimating the regression parameters we use the data from all years apart from 2015 (see Table 1).
The results for the monolingual STS task of SemEval 2016 are shown in Table 3. At the time of writing, the ranks of the submitted systems were not known, so we present only our correlations. Our supervised system (SVM regression) performs approximately 3% better than the unsupervised one (weighted word alignment); on the SemEval 2015 data this difference was less pronounced (approximately 1.5%).
Finally, the results for the cross-lingual STS task of SemEval 2016 are shown in Table 4. We achieved very high correlations. Frankly, we expected much lower correlations, since we use machine translation via Google Translate, which certainly introduces some inaccuracies (at least in sentence syntax). On the other hand, this shows that our model generalizes the learned patterns well. Here there is almost no difference in performance between the supervised and unsupervised versions of the submitted systems. Our submitted runs finished first and second among 26 competing systems.
Model                               Answers-forums  Answers-students  Belief   Headlines  Images   Mean
Winner of SemEval 2015              0.7390          0.7725            0.7491   0.8250     0.8644   0.8015
Linear regression – all lexical     0.7053          0.7656            0.7190   0.7887     0.8246   0.7728
Linear regression – all syntactic   0.3089          0.3165            0.4570   0.2900     0.1862   0.2939
TF-IDF                              0.5629          0.6043            0.6762   0.6603     0.7530   0.6593
Tree LSTM                           0.4181          0.5490            0.5863   0.7324     0.8168   0.6501
Paragraph2Vec                       0.5228          0.7017            0.6643   0.6562     0.7385   0.6725
CBOW composition                    0.6216          0.6846            0.7258   0.6927     0.7831   0.7085
GloVe composition                   0.5820          0.6311            0.7164   0.6969     0.7972   0.6936
Weighted word alignment             0.7171          0.7752            0.7632   0.8179     0.8525   0.7964
Linear regression                   0.7411          0.7589            0.7739   0.8193     0.8568   0.7982
Gaussian processes regression       0.7363          0.7701            0.7846   0.8393     0.8749   0.8112
Decision trees regression           0.6700          0.6991            0.7281   0.7792     0.8206   0.7495
Perceptron regression               0.7060          0.7481            0.7467   0.8093     0.8594   0.7858
SVM regression                      0.7375          0.7678            0.7846   0.8398     0.8776   0.8116
Table 2: Pearson correlations on the SemEval 2015 evaluation data and comparison with the best performing system from that year.
Model       Answer-answer  Headlines  Plagiarism  Postediting  Question-question  Mean
UWB sup     0.6215         0.8189     0.8236      0.8209       0.7020             0.7573
UWB unsup   0.6444         0.7935     0.8274      0.8121       0.5338             0.7262
Table 3: Pearson correlations on the monolingual STS task of SemEval 2016.
5 Conclusion
In this paper we described our UWB system participating in the SemEval 2016 Semantic Textual Similarity task. We participated in both the monolingual and cross-lingual parts of the competition.
Our best results were achieved by SVM regression over various STS techniques based on lexical, syntactic, and semantic information. This approach has been shown to work well for both subtasks.
Acknowledgments
This publication was supported by the project
LO1506 of the Czech Ministry of Education, Youth
and Sports and by Grant No. SGS-2016-018 Data
and Software Engineering for Advanced Applica-
tions. Computational resources were provided by
the CESNET LM2015042 and the CERIT Scientific
Cloud LM2015085, provided under the programme
”Projects of Large Research, Development, and In-
novations Infrastructures”.
References
Eneko Agirre, Mona Diab, Daniel Cer, and Aitor
Gonzalez-Agirre. 2012. Semeval-2012 task 6: A pilot
on semantic textual similarity. In Proceedings of the
First Joint Conference on Lexical and Computational
Semantics - Volume 1: Proceedings of the Main Con-
ference and the Shared Task, and Volume 2: Proceed-
ings of the Sixth International Workshop on Seman-
tic Evaluation, SemEval ’12, pages 385–393, Strouds-
burg, PA, USA. Association for Computational Lin-
guistics.
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-
Agirre, and Weiwei Guo. 2013. *sem 2013 shared
task: Semantic textual similarity. In Second Joint
Conference on Lexical and Computational Semantics
(*SEM), Volume 1: Proceedings of the Main Confer-
ence and the Shared Task: Semantic Textual Similarity,
pages 32–43, Atlanta, Georgia, USA, June. Associa-
tion for Computational Linguistics.
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel
Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo,
Rada Mihalcea, German Rigau, and Janyce Wiebe.
2014. Semeval-2014 task 10: Multilingual seman-
tic textual similarity. In Proceedings of the 8th Inter-
national Workshop on Semantic Evaluation (SemEval
2014), pages 81–91, Dublin, Ireland, August. Associ-
ation for Computational Linguistics and Dublin City
University.
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel
Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo,
Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihal-
cea, German Rigau, Larraitz Uria, and Janyce Wiebe.
2015. Semeval-2015 task 2: Semantic textual similar-
ity, english, spanish and pilot on interpretability. In
Proceedings of the 9th International Workshop on Se-
mantic Evaluation (SemEval 2015), pages 252–263,
Denver, Colorado, June. Association for Computa-
tional Linguistics.
Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten
Zesch. 2012. Ukp: Computing semantic textual sim-
ilarity by combining multiple content similarity mea-
sures. In Proceedings of the 6th International Work-
shop on Semantic Evaluation, held in conjunction with
the 1st Joint Conference on Lexical and Computa-
tional Semantics, pages 435–440, Montreal, Canada,
June.
Leo Breiman, Jerome Friedman, Charles J. Stone, and
R.A. Olshen. 1984. Classification and Regression
Trees. Wadsworth and Brooks, Monterey, CA.
Andrei Z. Broder. 1997. On the resemblance and con-
tainment of documents. In SEQUENCES ’97 Proceed-
ings of the Compression and Complexity of Sequences,
pages 21–29, Jun.
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge,
Thorsten Brants, Phillipp Koehn, and Tony Robin-
son. 2014. One billion word benchmark for mea-
suring progress in statistical language modeling. In
Proceedings of the 15th Annual Conference of the In-
ternational Speech Communication Association, pages
2635–2639, Singapore, September.
Scott Deerwester, Susan T. Dumais, George W. Furnas,
Thomas K. Landauer, and Richard Harshman. 1990.
Indexing by latent semantic analysis. Journal of the
American Society for Information Science 41, pages
391–407.
John R. Firth. 1957. A Synopsis of Linguistic Theory,
1930-1955. Studies in Linguistic Analysis, pages 1–
32.
Evgeniy Gabrilovich and Shaul Markovitch. 2007. Com-
puting semantic relatedness using wikipedia-based ex-
plicit semantic analysis. In Proceedings of the 20th
International Joint Conference on Artifical Intelli-
gence, IJCAI’07, pages 1606–1611, San Francisco,
CA, USA. Morgan Kaufmann Publishers Inc.
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard
Pfahringer, Peter Reutemann, and Ian H. Witten.
2009. The weka data mining software: An update.
ACM SIGKDD Explorations Newsletter, 11(1):10–18,
November.
Lushan Han, Abhay L. Kashyap, Tim Finin, James May-
field, and Jonathan Weese. 2013. Umbc ebiquity-
core: Semantic textual similarity systems. In Second
Joint Conference on Lexical and Computational Se-
mantics (*SEM), Volume 1: Proceedings of the Main
Conference and the Shared Task: Semantic Textual
Similarity, pages 44–52, Atlanta, Georgia, USA, June.
Association for Computational Linguistics.
Quoc V. Le and Tomas Mikolov. 2014. Distributed
representations of sentences and documents. In Pro-
ceedings of the 31th International Conference on Ma-
chine Learning, ICML 2014, Beijing, China, 21-26
June 2014, pages 1188–1196.
Christopher D. Manning and Hinrich Schütze. 1999.
Foundations of Statistical Natural Language Process-
ing. MIT Press, Cambridge, MA, USA.
Christopher D. Manning, Mihai Surdeanu, John Bauer,
Jenny Finkel, Steven J. Bethard, and David McClosky.
2014. The Stanford CoreNLP natural language pro-
cessing toolkit. In Association for Computational Lin-
guistics (ACL) System Demonstrations, pages 55–60.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
George A. Miller and Walter G. Charles. 1991. Contex-
tual correlates of semantic similarity. Language and
Cognitive Processes, 6(1):1–28.
Jeffrey Pennington, Richard Socher, and Christopher D
Manning. 2014. Glove: Global vectors for word rep-
resentation. In Proceedings of the 2014 Conference
on Empirical Methods in Natural Language Process-
ing (EMNLP), pages 1532–1543.
Carl Edward Rasmussen and Christopher K. I. Williams.
2005. Gaussian Processes for Machine Learning
(Adaptive Computation and Machine Learning). The
MIT Press.
Herbert Rubenstein and John B. Goodenough. 1965.
Contextual correlates of synonymy. Communications
of the ACM, 8(10):627–633, October.
Shirish K. Shevade, Sathiya S. Keerthi, Chiru Bhat-
tacharyya, and K.R.K. Murthy. 2000. Improvements
to the smo algorithm for svm regression. IEEE Trans-
actions on Neural Networks, 11(5):1188–1193, Sep.
Md Sultan, Steven Bethard, and Tamara Sumner. 2014a.
Back to basics for monolingual alignment: Exploit-
ing word similarity and contextual evidence. Transac-
tions of the Association for Computational Linguistics,
2:219–230.
Md Arafat Sultan, Steven Bethard, and Tamara Sum-
ner. 2014b. Dls@cu: Sentence similarity from word
alignment. In Proceedings of the 8th International
Workshop on Semantic Evaluation (SemEval 2014),
pages 241–246, Dublin, Ireland, August. Association
for Computational Linguistics and Dublin City Uni-
versity.
Md Arafat Sultan, Steven Bethard, and Tamara Sumner.
2015. Dls@cu: Sentence similarity from word align-
ment and semantic vector composition. In Proceed-
ings of the 9th International Workshop on Semantic
Evaluation (SemEval 2015), pages 148–153, Denver,
Colorado, June. Association for Computational Lin-
guistics.
Kai Sheng Tai, Richard Socher, and Christopher D. Man-
ning. 2015. Improved semantic representations from
tree-structured long short-term memory networks. In
Proceedings of the 53rd Annual Meeting of the Associ-
ation for Computational Linguistics and the 7th Inter-
national Joint Conference on Natural Language Pro-
cessing (Volume 1: Long Papers), pages 1556–1566,
Beijing, China, July. Association for Computational
Linguistics.
Michael J. Wise. 1996. Yap3: Improved detection of
similarities in computer program and other texts. In
Proceedings of the Twenty-seventh SIGCSE Technical
Symposium on Computer Science Education, SIGCSE
’96, pages 130–134, New York, NY, USA. ACM.
