An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model. Banerjee and Pedersen [1] began this line of research by adapting the Lesk algorithm [2] for word sense disambiguation to WordNet. The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk. Word sense disambiguation (WSD) is an important and challenging task for natural language processing. Performs the classic Lesk algorithm for word sense disambiguation (WSD) using the definitions of the ambiguous word. It is the essence of communication in natural language processing. This paper presents an adaptation of Lesk's dictionary-based word sense disambiguation algorithm.
The solution to this problem impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference. The human brain is quite proficient at word-sense disambiguation. A version of the Lesk algorithm in combination with WordNet has recently been reported to achieve good word sense disambiguation results (Ramakrishnan, Prithviraj, and Bhattacharyya, 2004). If we replace the word motorcar in (1) with automobile, to get (2), the meaning of the sentence stays pretty much the same. We used the National Library of Medicine's WSD (NLM-WSD) and MSH-WSD datasets to evaluate the adapted Lesk algorithm. Evaluations of the Lesk algorithm: initial evaluation by M. Lesk. A WSD method that uses word and sense embeddings to compute the similarity between the gloss of a sense and the context of the word. A comparative study of SVM and new Lesk algorithm for. I've read similar questions, like "Word sense disambiguation in NLTK Python", but they give nothing but a reference to the NLTK book, which does not go very deep into the WSD problem. This software implements a word sense disambiguation algorithm based on the simple Lesk approach, integrating distributional semantics to compute the overlap between glosses.
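The idea of replacing exact gloss overlap with a distributional similarity can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors in VECTORS are invented toy values, the additive text vector is one simple composition choice, and the names centroid, cosine, and best_sense are hypothetical. It is not the implementation of the software described above.

```python
import math

# Toy 3-dimensional word vectors (invented values). A real system would
# load vectors learned from a large unlabeled corpus.
VECTORS = {
    "money":   [0.9, 0.1, 0.0],
    "deposit": [0.8, 0.2, 0.0],
    "loan":    [0.7, 0.3, 0.1],
    "cash":    [0.9, 0.1, 0.1],
    "river":   [0.0, 0.9, 0.2],
    "water":   [0.1, 0.8, 0.3],
    "slope":   [0.0, 0.7, 0.4],
}

# Toy glosses, already tokenized and stripped of function words.
GLOSSES = {
    "bank_finance": ["money", "deposit", "loan"],
    "bank_river": ["slope", "river", "water"],
}


def centroid(words):
    """Add up the vectors of the known words (a simple additive text vector)."""
    total = [0.0, 0.0, 0.0]
    for word in words:
        for i, value in enumerate(VECTORS.get(word, [0.0, 0.0, 0.0])):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_sense(context, glosses):
    """Pick the sense whose gloss vector is closest to the context vector."""
    context_vector = centroid(context)
    return max(glosses, key=lambda s: cosine(context_vector, centroid(glosses[s])))


# "cash" occurs in neither gloss, so exact word overlap would score zero
# for both senses; the vector similarity still separates them.
print(best_sense(["cash"], GLOSSES))
print(best_sense(["water", "river"], GLOSSES))
```

The point of the sketch is the first call: a context word that never appears in a gloss can still contribute evidence, which plain word overlap cannot do.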
Banerjee and Pedersen in 2002 adapted the original Lesk algorithm using the information from WordNet and found an overall accuracy of 32%, which was double that of the original Lesk algorithm. Word sense disambiguation using WordNet and the Lesk algorithm. Web-based variant of the Lesk approach to word sense. Details about the algorithm are published in the following paper. Lesk's algorithm disambiguates a target word by selecting the sense whose dictionary gloss shares the largest number of words with the glosses of neighboring words. It disambiguates through the intersection of a set of dictionary definitions (senses) and a set of words extracted from the current context window. To address this problem, a simplified version of this algorithm was proposed, where the sense of the ambiguous word is selected by. This paper describes a new word sense disambiguation (WSD) algorithm which extends two well-known variations of the Lesk WSD method. Introduction: in the Hindi language, a single word can have different meanings. WSD simply finds the correct sense of a given word. Given a word and its context, the Lesk algorithm exploits the idea.
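The gloss-overlap selection described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the DICTIONARY entries are toy glosses for "pine" and "cone", the stopword list is ad hoc, and the glosses of all senses of each neighbor are pooled into one set instead of being compared sense by sense. The function name lesk_neighbors is hypothetical.

```python
# Ad hoc stopword list, just large enough for the toy glosses below.
STOPWORDS = {"a", "of", "or", "to", "with", "which", "this", "through"}

# Toy dictionary: word -> {sense label: gloss}.
DICTIONARY = {
    "pine": {
        "pine#1": "kinds of evergreen tree with needle-shaped leaves",
        "pine#2": "waste away through sorrow or illness",
    },
    "cone": {
        "cone#1": "solid body which narrows to a point",
        "cone#2": "something of this shape whether solid or hollow",
        "cone#3": "fruit of certain evergreen trees",
    },
}


def content_words(gloss):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def lesk_neighbors(target, neighbors):
    """Score each sense of `target` by the words its gloss shares with
    the pooled glosses of the neighboring words."""
    neighbor_words = set()
    for neighbor in neighbors:
        for gloss in DICTIONARY.get(neighbor, {}).values():
            neighbor_words |= content_words(gloss)
    scores = {
        sense: len(content_words(gloss) & neighbor_words)
        for sense, gloss in DICTIONARY[target].items()
    }
    return max(scores, key=scores.get), scores


best, scores = lesk_neighbors("pine", ["cone"])
print(best, scores)
```

In this toy data the only shared content word is "evergreen", which is enough to prefer the tree sense of "pine" when "cone" is its neighbor. Because there is no stemming, "tree" and "trees" do not match.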
Lesk: 50-70% on short samples of text (manually annotated set), with respect to the Oxford Advanced Learner's Dictionary; the set of senses is coarse-grained. Senseval conferences have shared tasks involving data for word sense disambiguation. Rather than using a standard dictionary as the source of glosses for our approach, the lexical database WordNet is employed. This paper generalizes the adapted Lesk algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. Pierpaolo Basile, Annalina Caputo, Giovanni Semeraro. This work shows some improvements for increasing this. An adapted Lesk algorithm for word sense disambiguation using Hindi WordNet, submitted in partial fulfillment of the requirement for the award of the degree of Master of Science in Computer Science from Assam University, Silchar, submitted by Arpita Mitra Mazumder, exam roll.
Ted Pedersen, Department of Computer Science, University of Minnesota, Duluth, Minnesota 55812, U.S.A. The sense definition chosen as correct is the one that has the largest number of words in common with the definitions of the surrounding words. The accuracy of their algorithm was found to range from 40% to 70%. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics. Word sense disambiguation, what you should know: word senses distinguish different meanings of the same word; sense inventories; annotation issues and annotator agreement (kappa); definition of the word sense disambiguation task; an unsupervised approach. An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model: general info.
Using measures of semantic relatedness for word sense. Unsupervised word sense disambiguation with multilingual. I have heard that POS tagging helps to improve efficiency. Can anyone tell me how to add POS tagging to the above Lesk code, and are there any methods by which I can get maximum correctness for a particular sense? Tags: python, nlp, nltk, wordnet, word-sense-disambiguation. It picks the sense of the target word whose definition has the most words in common with the definitions of other words in a given window of context. I'd be happy even with a naive implementation like the Lesk algorithm. Word sense disambiguation (WSD), an AI-complete problem, is shown to be able to solve the essential problems of artificial intelligence, and has received increasing attention due to its promising applications in the fields of sentiment analysis, information retrieval, and information extraction. Multilingual word sense disambiguation: we approach the WSD task using an unsupervised method based on the Lesk algorithm (Lesk, 1986). This algorithm depends on the overlap of the dictionary definitions of the words in a sentence. It's not quite clear whether there is something in NLTK that can help me.
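On the POS question above, one possible approach is to restrict the candidate senses to the tagged part of speech before counting overlaps. The sketch below shows that filtering step on a toy inventory; the sense labels, the glosses, and the function name overlap_lesk are all invented for illustration and are not WordNet data. NLTK's own nltk.wsd.lesk function accepts an optional pos argument that plays a similar role, though its exact behavior should be checked against the NLTK documentation.

```python
# Toy sense inventory for "book": (label, part of speech, gloss).
# Labels and glosses are illustrative, not taken from WordNet.
SENSES = [
    ("book.n.01", "n", "a written work or composition that has been published"),
    ("book.v.01", "v", "arrange for and reserve something for someone else in advance"),
    ("book.v.02", "v", "record a charge against a suspect"),
]


def overlap_lesk(context, senses, pos=None):
    """Return the label of the sense whose gloss shares the most words
    with the context, considering only senses with the given POS."""
    candidates = [s for s in senses if pos is None or s[1] == pos]
    if not candidates:
        return None
    context_words = {w.lower() for w in context}
    best = max(candidates, key=lambda s: len(context_words & set(s[2].split())))
    return best[0]


context = "they book a published work in advance".split()
print(overlap_lesk(context, SENSES))           # no POS restriction
print(overlap_lesk(context, SENSES, pos="v"))  # verbs only
```

Without the restriction, the noun gloss wins on raw overlap even though "book" is used as a verb here; with pos="v" the noun sense is never considered. In practice the tag would come from a POS tagger run on the sentence, which this sketch leaves out.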
WSD using random walk algorithms: 54% accuracy on the SemCor corpus, which has a baseline accuracy of 37%. An adapted Lesk algorithm for word sense disambiguation. It only requires large unlabeled corpora and a sense inventory such. The task of word sense disambiguation consists of associating words in context with the most suitable entry in a predefined sense inventory.
In the NLP area, ambiguity is recognized as a barrier to human language understanding. We conclude the paper with a discussion in the final section. Comparing similarity measures for original WSD Lesk algorithm. Personalizing PageRank for word sense disambiguation. Adapted Lesk algorithm based word sense disambiguation. Moving down the long tail of word sense disambiguation with gloss-informed bi-encoders. Maximizing semantic relatedness to perform word sense. The major objective of his idea is to count the number of words that are shared between two glosses. Harmony search algorithm for word sense disambiguation. University of Washington, Terra Blevins, et al.
Word sense disambiguation algorithm in Python (Stack Overflow). The principal statistical WSD approaches are supervised and unsupervised learning. Word sense disambiguation using WordNet: the concept of sense ambiguity means that a word which has more than one meaning is used in a context, and it needs to be clarified which sense is actually referred to. Algorithm accuracy: WSD using selectional restrictions, 44% on the Brown corpus; Lesk's algorithm, 50-60% on short samples of Pride and Prejudice and some news stories. More precisely, for each sense of the word a sense bag is formed using the WordNet definition and the definitions of all the hypernyms associated with the nouns and verbs in the sense's definition. Depending on their nature, WSD systems are divided into two main groups. Knowledge-based word sense disambiguation using topic. Word sense disambiguation using cosine similarity collaborates with word2vec and WordNet. The Lesk method is an example of unsupervised disambiguation.
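The sense bag construction mentioned above might look roughly like the following Python sketch. It rests on several assumptions: HYPERNYM_DEFINITIONS is a hypothetical toy lookup from a word to the definitions of its hypernyms, standing in for WordNet; words are looked up by surface form with no POS tagging, so the restriction to nouns and verbs is only implied by which words have entries; and sense_bag is an invented name.

```python
from collections import Counter

# Hypothetical lookup: word -> definitions of that word's hypernyms.
# In the described approach this information would come from WordNet.
HYPERNYM_DEFINITIONS = {
    "institution": ["an organization founded and united for a specific purpose"],
    "land": ["the solid part of the earth's surface"],
}


def sense_bag(definition):
    """Build a bag of words from a sense definition plus the definitions
    of the hypernyms of the words that occur in that definition."""
    tokens = definition.lower().split()
    bag = Counter(tokens)
    for word in tokens:
        for hypernym_definition in HYPERNYM_DEFINITIONS.get(word, []):
            bag.update(hypernym_definition.lower().split())
    return bag


bag = sense_bag("a financial institution that accepts deposits")
print(sorted(bag.items()))
```

Here "organization" and "purpose" enter the bag only through the hypernym definition of "institution", which gives later overlap or similarity scoring more material to work with than the short definition alone.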
Supervised WSD systems are the best performing in public evaluations (Palmer et al.). Many subsequent knowledge-based systems are based on the Lesk algorithm. A sentence is considered ambiguous if it contains ambiguous words. However, the simplified Lesk algorithm has low performance. Word sense disambiguation by using simplified and extended Lesk algorithm. Details of the suggested algorithm are presented in Section III. Word sense disambiguation for Arabic language using the variants of the Lesk algorithm, conference paper, January 2011. Abstract: word sense disambiguation (WSD) is the task of selecting the meaning of a word based on the context in which the word occurs. Practically, any sentence that has been classified as ambiguous usually has multiple interpretations, but just one of them presents.
WordNet, Lesk algorithm, preprocessing. Senses and synonyms: consider the sentence in (1). Word sense disambiguation (WSD) is the concept of identifying which sense of a word is used in a sentence or context. Knowledge-based biomedical word sense disambiguation. This is possible since Lesk's original algorithm (1986) is based on gloss overlaps which can. In computational linguistics, word-sense disambiguation (WSD) is an open problem concerned with identifying which sense of a word is used in a sentence. Word sense disambiguation (WSD) is the task of determining which sense of an ambiguous word (a word with multiple meanings) is chosen in a particular use of that word, by considering its context.
In [4], the authors used Hindi WordNet for word sense disambiguation in the Hindi language. Semeraro, An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model, in COLING, pp. Semantic relatedness to perform word sense disambiguation is measured by an algorithm. Word sense disambiguation through associative dictionaries. In this approach [24, 25], first of all a short phrase containing an ambiguous word is selected from the sentence. Given an ambiguous word and the context in which the word occurs, Lesk returns the synset with the highest number of overlapping words between the context sentence and the different definitions from each synset.
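The last sentence above describes the simplified variant, in which each definition is compared directly with the context sentence. A minimal Python sketch of that comparison follows; the SYNSETS dictionary holds two toy definitions for "bank" with invented labels, and simplified_lesk is a hypothetical name, so this is an illustration and not a reproduction of any library's code.

```python
# Toy synset definitions for "bank" (illustrative labels and glosses).
SYNSETS = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits and lends money",
}


def simplified_lesk(context_sentence, synsets):
    """Return the synset label whose definition shares the most words
    with the context sentence (whitespace tokens, lowercased)."""
    context = set(context_sentence.lower().split())

    def overlap(label):
        return len(context & set(synsets[label].lower().split()))

    return max(synsets, key=overlap)


print(simplified_lesk("he sat on the bank beside the water", SYNSETS))
print(simplified_lesk("she deposits money at the bank", SYNSETS))
```

Ties are broken by dictionary order here, and no stopwords are removed, so function words can inflate the counts on real glosses. Both are design choices a fuller implementation would need to revisit.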
Keywords: support vector machine, NLP, word sense disambiguation, new Lesk approach, comparison. In cases where no appropriate UMLS concept existed. Adapting the Lesk algorithm for word sense disambiguation to WordNet, by Satanjeev Banerjee, December 2002, submitted in partial fulfillment. In what follows we summarize the current state of these two types of approach. It covers major algorithms, techniques, performance measures, results, philosophical issues and applications. The evaluation performed on SemEval-2013 Multilingual Word Sense Disambiguation shows that our algorithm goes beyond the most frequent sense baseline and the simplified version of the Lesk algorithm. It finds its root in the original Lesk algorithm, which disambiguates a polysemous word. The main disadvantage of the Lesk algorithm is its exponential complexity.
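One common reading of the exponential complexity noted above is that disambiguating all words of a sentence jointly means scoring every combination of senses, and the number of combinations is the product of the per-word sense counts. The short Python sketch below makes that arithmetic concrete; the sense counts and labels are made-up examples, and the function names are hypothetical.

```python
from itertools import product
from math import prod


def count_combinations(sense_counts):
    """Number of joint sense assignments for words with these sense counts."""
    return prod(sense_counts)


def enumerate_assignments(senses_per_word):
    """List every joint assignment, one sense per word."""
    return list(product(*senses_per_word))


# Three words with 2, 3, and 4 senses give 2 * 3 * 4 joint assignments.
print(count_combinations([2, 3, 4]))

# Ten words with 5 senses each already give 5 ** 10 assignments.
print(count_combinations([5] * 10))

print(enumerate_assignments([["a1", "a2"], ["b1", "b2", "b3"]]))
```

With n words of k senses each the count is k to the power n, which is why variants that score one target word at a time against its context are attractive in practice.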