Improving Word Similarity Using PPMIC with Estimates of Word Polysemy
Measuring the semantic similarity between words is an important component of many web tasks, such as relation extraction, community mining, document clustering, and automatic metadata extraction, yet accurately measuring the semantic similarity between two words or entities remains challenging. Pointwise mutual information (PMI) is a widely used word similarity measure, but it treats each word as if it had a single sense and lacks a clear explanation of how it works. PMI also differs from distributional similarity. A novel metric, PMImax, augments PMI with information about a word’s number of senses. PMImax estimates the maximum correlation between two words, i.e., the correlation between their closest senses. The existing system computes PMImax and also provides an empirical method for estimating the semantic similarity of two words from page counts and text snippets retrieved from a web search engine. However, PMImax finds only synonymous and “sibling” concepts (e.g., “train” and “truck”) and misses “cousin” concepts. The proposed system, PPMIC (Positive Pointwise Mutual Information Cousins), captures cousin concepts and also generates the top 50 most similar words for a given noun. By taking word polysemy into account, PPMIC improves word similarity.
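The abstract builds on plain PMI and its positive variant. The sketch below shows only those standard building blocks, not PMImax or the authors' PPMIC method, whose details are not given here. It computes PMI from raw counts, clips it to positive PMI (PPMI) vectors from a word-by-context co-occurrence table, and ranks neighbours by cosine similarity. The function names (`pmi`, `ppmi_vectors`, `top_k_similar`), the base-2 logarithm, and the toy co-occurrence table are illustrative assumptions. The default `k=50` simply mirrors the "top 50 most similar words" mentioned above. The same `pmi` function also accepts web page counts if `total` is taken to be the (assumed) number of indexed pages.

```python
import math
from collections import defaultdict


def pmi(count_xy, count_x, count_y, total):
    """PMI = log2(P(x,y) / (P(x) * P(y))) computed from raw counts.

    Returns -inf when x and y never co-occur (count_xy == 0).
    """
    if count_xy == 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))


def ppmi_vectors(cooc):
    """Convert a {word: {context: count}} table into sparse PPMI vectors.

    Negative or undefined PMI scores are clipped to 0, i.e. simply omitted.
    """
    word_totals = {w: sum(ctx.values()) for w, ctx in cooc.items()}
    ctx_totals = defaultdict(int)
    for ctx in cooc.values():
        for c, n in ctx.items():
            ctx_totals[c] += n
    total = sum(word_totals.values())

    vectors = {}
    for w, ctx in cooc.items():
        vec = {}
        for c, n in ctx.items():
            score = pmi(n, word_totals[w], ctx_totals[c], total)
            if score > 0:  # keep only positive associations (PPMI)
                vec[c] = score
        vectors[w] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(word, vectors, k=50):
    """Rank every other word by cosine similarity of PPMI vectors."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical toy co-occurrence counts, for illustration only.
cooc = {
    "train": {"ride": 4, "track": 5, "engine": 3},
    "truck": {"ride": 3, "road": 4, "engine": 4},
    "apple": {"eat": 6, "tree": 3},
}
vectors = ppmi_vectors(cooc)
print(top_k_similar("train", vectors, k=2))
```

On this toy table, "truck" ranks above "apple" for "train" because the two share the contexts "ride" and "engine". Single-sense vectors like these are exactly what PMImax and PPMIC try to improve on for polysemous words.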
Nagajothi P, Hemalatha L, Kumari K, Jeevarathinam S