ISSN ONLINE (2319-8753) PRINT (2347-6710)


Sentiment Classification in Multi-Domain Adaptation Using Sentiment Sensitive Thesaurus

Nancy S1, Uma maheswaran S2, Askerunisa A3
  1. P.G Scholar, Vickram College of Engineering, Enathi, India
  2. Assistant Professor, Vickram College of Engineering, Enathi, India
  3. Head of the Department, Vickram College of Engineering, Enathi, India

Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology

Abstract

Sentiment analysis is an emerging field in Natural Language Processing (NLP) with applications such as opinion mining, opinion summarization, and market analysis. A sentiment classifier trained on a single domain performs poorly when it is used to classify reviews from a different domain. To overcome this limitation of single-domain classification, a sentiment sensitive distributional thesaurus (SST) is created using unlabeled data from both the source and target domains. A binary classifier is constructed from reviews to label entries in the SST as positive or negative, and the created thesaurus is used to expand the feature vectors at train and test time using cosine similarity and pointwise mutual information (PMI). The performance of multi-domain and single-domain sentiment classification is compared, and the results show that multi-domain adaptation outperforms numerous baselines.

Index Terms

Opinion Mining, Natural Language Processing (NLP), Cosine Similarity, Pointwise Mutual Information (PMI), Domain Adaptation

I. INTRODUCTION

As an emerging communication platform, Web 2.0 has made the Internet increasingly interactive. People can express and share their opinions and concerns in cyberspace. Opinions are important because decisions are often made after hearing the opinions of others. One can express opinions on anything in reviews, forums, discussion groups, and blogs. Sentiment classification is an important task in applications such as opinion mining, opinion summarization, and contextual advertising. Sentiment analysis has been used to help political strategists gauge public opinion on the Internet, as Yahoo News shows (Weber). In an opinion summarization system, it is useful to first classify all reviews into positive and negative sentiment and then create a summary for each sentiment for a particular domain. Sentiment is expressed differently in different domains, and it is costly to collect a corpus for each new domain in which we would like to apply a sentiment classifier. For example, in the movie domain the words “entertainment” and “enjoyable” express positive sentiment, whereas in the book domain the words “exciting” and “thriller” express positive sentiment. A classifier trained on one domain might not perform well on a different domain because it fails to learn the sentiment of the unseen words. Multi-domain sentiment classification focuses on the challenge of training a classifier on one domain and applying the trained classifier to different domains by measuring the relatedness between the source and target domains. Higher performance was achieved when multiple domains were used together than when those domains were used individually.

II. LITERATURE SURVEY

Chihli Hung et al. [23] proposed a word-of-mouth sentiment classification approach that re-evaluates the objective sentiment words in the SentiWordNet sentiment lexicon and improves performance. Bollegala D et al. [3] proposed a novel pattern extraction algorithm and a pattern clustering algorithm to identify the numerous semantic relations that exist between two given words. Chenghua Lin et al. [4] proposed a novel probabilistic modelling framework called the joint sentiment-topic (JST) model, based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text.
Danushka Bollegala et al. [9] used a domain adaptation method for different domains based on pointwise mutual information (PMI). Alexandre Trilla et al. [5] evaluated a text-to-speech (TTS) scenario with different combinations of textual features and classifiers to determine the most appropriate adaptation procedure. Erik Cambria et al. [16] discussed the International Survey of Emotion Antecedents and Reactions (ISEAR), an emotion-related dataset, and ISEAR distance-based measures, including pointwise mutual information and emotional affinity.
Xueke et al. [12] used a novel generative topic model, the Joint Aspect/Sentiment (JAS) model, to jointly extract aspects and aspect-dependent sentiment lexicons from online customer reviews. Rui Xia et al. [13] introduced the feature ensemble plus sample selection (SS-FE) approach to learn a new labeling function in a feature reweighting manner.
Malandrakis N et al. [22] presented an affective text analysis model that can directly estimate and combine affective ratings of multi-word terms. Krcadinac U et al. [20] proposed a recognition approach that works at the sentence level and uses the standard Ekman emotion classification. Chien-Liang Liu et al. [11] proposed a novel approach based on latent semantic analysis (LSA) to identify product features; their rating and review-summarization system can be extended to other product-review domains easily.
Xiaohui Yu et al. [19] used Sentiment PLSA (SPLSA), in which a review is considered as a document generated by a number of hidden sentiment factors, in order to capture the complex nature of sentiments.

III. PROPOSED ARCHITECTURE

A sentiment classifier trained using labeled data from a particular domain often performs poorly when it is used to classify the sentiment of user reviews from a different domain, and developing corpora for each possible domain of interest is costly. The multi-domain adaptation system is proposed to overcome these limitations of the existing system. A thesaurus is created that is sensitive to the sentiment of words expressed in different domains, and the created thesaurus is used to expand feature vectors at train and test time in a binary classifier (feature expansion). Sentiment classification performance was improved by reducing the execution time of the proposed method relative to the single-domain case. Corpora do not need to be developed for each domain, which reduces the cost. The multi-domain sentiment classification system consists of the sentiment sensitive thesaurus, the binary classifier, feature expansion, and the evaluation process. Fig. 1 shows the block diagram of the proposed multi-domain sentiment classification system with its major processing steps. The sentiment sensitive thesaurus is created from unlabeled reviews taken from Amazon.
Unigrams and bigrams were selected to train the sentiment classifier. The binary classifier was used to classify the reviews and convert the unlabeled reviews into labeled reviews. Labeled reviews for the four domains were selected to expand the feature vectors during feature expansion using PMI and cosine similarity. The choice of relatedness measure is an important decision in a thesaurus-based approach. The effect of the relatedness measure on the performance of the proposed method was evaluated on the four sentiment sensitive thesauri under the No Adaptation setting.
Feature expansion for all four sentiment sensitive thesauri was then conducted with domain adaptation using numerous relatedness measures. Expansion candidates have higher relatedness under domain adaptation than under No Adaptation. No Adaptation and domain adaptation were compared using the results for sentiment similarity and document similarity. The accuracy obtained from multiple source domains is always greater than the accuracy obtained from using those domains individually.
The number of labeled examples required is also reduced. A single domain requires 800 positive and 800 negative labeled sentiment elements; when multiple domains are used, this can be reduced to 400 positive and 400 negative labeled sentiment elements.

IV. MODULE DESCRIPTION

A. SST Creation
The sentiment sensitive thesaurus (SST) is created from unlabeled reviews taken from Amazon. Unlabeled reviews were collected for four domains: Book, Movie, Electronics, and Kitchen Appliances. Each unlabeled review is split into individual words to form unigrams and bigrams. The data preparation step performs the necessary preprocessing and cleaning to reduce the dataset size. Preprocessing steps include removing bag of words, stemming, and stop-word removal.
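The preprocessing described above can be sketched as follows. This is a minimal illustration only: the stop-word list and the `crude_stem` suffix stripper are hypothetical stand-ins (the paper uses NLTK but does not say which stemmer or stop-word list), and the function name `extract_features` is not from the original.

```python
import re

# Tiny illustrative stop-word list (an assumption; the paper does not give its list).
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "i"}


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(review):
    """Lowercase, tokenize, drop stop words, stem, then emit unigrams and bigrams."""
    tokens = re.findall(r"[a-z']+", review.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    bigrams = [f"{a}+{b}" for a, b in zip(stems, stems[1:])]
    return stems + bigrams


features = extract_features("This book was exciting and the ending surprised me")
print(features)
```

In practice a library stemmer and stop-word list would replace the two stand-ins; the structure (clean, stem, then build unigrams and bigrams) stays the same.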
B. Binary Classifier
Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. The positive score and the negative score (both ranging from 0 to 1) of each remaining word are compared. If the positive score is greater than the negative score, the entry is labeled with polarity value 1 in the SST and declared positive. Otherwise, it is labeled with polarity value 0 in the SST and declared negative.
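The labeling rule can be written in a few lines. The sketch below assumes each word already has a positive and a negative score in [0, 1]; the example words and scores are invented for illustration and the function name `polarity_label` is hypothetical.

```python
def polarity_label(positive_score, negative_score):
    """Return 1 (positive) if the positive score is strictly greater, else 0 (negative)."""
    return 1 if positive_score > negative_score else 0


# Hypothetical (positive, negative) scores, both in the range 0 to 1.
scores = {
    "excellent": (0.875, 0.0),
    "broken": (0.0, 0.75),
    "plastic": (0.125, 0.125),
}

labels = {word: polarity_label(pos, neg) for word, (pos, neg) in scores.items()}
print(labels)
```

Note that under the rule as stated, a tie between the two scores falls into the "otherwise" branch and is labeled 0.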
C. Feature Expansion
A classifier trained using labeled data from the source domains cannot readily be used to classify reviews from the target domains. The feature expansion method augments a feature vector with additional related features selected from the SST. Pointwise mutual information (PMI) is known to be biased toward infrequent elements and features, but it accurately captures words that express similar sentiments. PMI is therefore used to expand the feature vectors. The statistical measure f(u, w) captures the interdependence between two words and is given by Equation 1.
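A simplified sketch of feature expansion is given below. It assumes the thesaurus is available as a mapping from a feature to a list of (related feature, relatedness score) pairs, and it ranks candidates by summing those scores over the features present in the review. The data, the ranking rule, and the name `expand_features` are illustrative assumptions, not the exact procedure of the proposed system.

```python
from collections import defaultdict


def expand_features(doc_features, thesaurus, k=2):
    """Append the k highest-scoring thesaurus neighbours of the review's features."""
    scores = defaultdict(float)
    for feature in doc_features:
        for neighbour, relatedness in thesaurus.get(feature, []):
            # Only features not already in the review are expansion candidates.
            if neighbour not in doc_features:
                scores[neighbour] += relatedness
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return list(doc_features) + [name for name, _ in ranked[:k]]


# Hypothetical thesaurus entries with made-up relatedness scores.
thesaurus = {
    "excellent": [("great", 0.9), ("superb", 0.6)],
    "enjoyable": [("great", 0.5), ("fun", 0.7)],
}

expanded = expand_features(["excellent", "enjoyable", "plot"], thesaurus, k=2)
print(expanded)
```

The same expansion is applied at train time and at test time, so a target-domain review that shares no words with the source-domain training data can still activate features the classifier has seen.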
D. Performance Analysis
The effect of the relatedness measure on the performance of the proposed method was evaluated on the four sentiment sensitive thesauri under No Adaptation, and domain adaptation was conducted using numerous relatedness measures. Expansion candidates under domain adaptation yield higher accuracy than under No Adaptation, as computed using Equation 2.

V. IMPLEMENTATION

In Equation 1, f(u, w) is the pointwise mutual information between a sentiment element u and a feature w, and c(u, w) is the number of review sentences in which the sentiment element u and the feature w co-occur. Sentiment sensitivity was achieved in the thesaurus by incorporating document-level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words.
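As an illustration of how f(u, w) could be computed from the co-occurrence counts c(u, w), the sketch below assumes the standard pointwise mutual information form, f(u, w) = log[(c(u, w)/N) / ((c(u, ·)/N)(c(·, w)/N))], where N is the sum of all counts. The normalization terms are an assumption, since only c(u, w) is defined in the text, and the counts (with sentiment labels "POS" and "NEG" used as features) are invented.

```python
import math
from collections import Counter


def pmi(cooccurrence, u, w):
    """Pointwise mutual information between element u and feature w."""
    total = sum(cooccurrence.values())
    joint = cooccurrence.get((u, w), 0)
    if joint == 0:
        return float("-inf")  # u and w never co-occur
    count_u = sum(c for (elem, _), c in cooccurrence.items() if elem == u)
    count_w = sum(c for (_, feat), c in cooccurrence.items() if feat == w)
    return math.log((joint / total) / ((count_u / total) * (count_w / total)))


# Hypothetical counts: sentences in which a word co-occurs with a sentiment label.
counts = Counter({
    ("excellent", "POS"): 8,
    ("excellent", "NEG"): 2,
    ("broken", "POS"): 1,
    ("broken", "NEG"): 9,
})

print(pmi(counts, "excellent", "POS"))  # positive: co-occur more often than chance
print(pmi(counts, "excellent", "NEG"))  # negative: co-occur less often than chance
```

A positive value means the pair co-occurs more often than independence would predict, which is why a word's PMI with the sentiment labels carries information about its polarity.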
Cosine similarity is widely used as a measure of relatedness among documents in numerous sentiment classification tasks. In this section, the datasets and experimental results are discussed in detail.
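For reference, cosine similarity between two sparse feature vectors can be computed as in the sketch below. The vectors are represented as dictionaries from feature to weight; the example weights are invented and the function name is illustrative.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[key] * vec_b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in vec_a.values()))
    norm_b = math.sqrt(sum(value * value for value in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Hypothetical feature weights for two reviews from different domains.
book_review = {"excit": 2.0, "plot": 1.0}
electronics_review = {"excit": 1.0, "batteri": 2.0}

similarity = cosine_similarity(book_review, electronics_review)
print(similarity)
```

Because the measure depends only on the angle between the vectors, long and short reviews with the same distribution of features receive the same score.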
A. Experimental Setup
a) Hardware: An Intel(R) Core 2 Duo CPU T5550 with 512MB RAM and 150GB disk space was used.
b) Software: The multi-domain adaptation system was implemented using Python 2.7 with NLTK 3.0 (Natural Language Toolkit).
c) Datasets: Unlabeled reviews were selected for the four domains from Amazon.com. Book reviews were taken for story books, and movie reviews were taken for comic movies. The electronics reviews are for the Pyrus Electronics 4gb Mp3 / mp4 /mp5 Player with 2.8 Inch Touch Screen and All Stainless Steel Casing (Electronics), and the kitchen appliances reviews are for the Zojirushi ECBD15BA Fresh Brew Thermal Carafe Coffee Maker (Kitchen).
Table I and Table II describe the sentiment similarity between the source domain and the target domain for the Book, Electronics, and Kitchen Appliances domains.
Taken as single domains, Electronics and Book reviews have cosine similarity values of 32.95% and 50.58% respectively, whereas combining the two (Book + Electronics) gives a cosine similarity value of 68.92%, an increase of approximately 30 percent.
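The size of the gain depends on which single-domain value is taken as the baseline. The short calculation below uses only the three reported figures; comparing against the mean of the two single-domain values is one possible reading of the "approximately 30 percent" figure, not something the text states.

```python
# Reported cosine similarity values, in percent.
electronics, book, combined = 32.95, 50.58, 68.92

# Gain of the combined domains over each single domain, in percentage points.
gain_over_electronics = round(combined - electronics, 2)
gain_over_book = round(combined - book, 2)

# Gain over the mean of the two single-domain values (an assumed baseline).
gain_over_mean = round(combined - (electronics + book) / 2, 1)

print(gain_over_electronics, gain_over_book, gain_over_mean)
```

The gain is about 36 points over Electronics alone, about 18 points over Book alone, and about 27 points over their mean.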

VI. CONCLUSION AND FUTURE WORK

Sentiment sensitivity was achieved in the thesaurus by incorporating document-level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. The performance of multi-domain and single-domain classification was compared using numerous similarity measures, and the results improved under domain adaptation. The accuracy obtained from multiple domains was always greater than the accuracy obtained from using those domains individually. In future work, more combinations of domains can be considered and their relatedness measures analyzed. Other relatedness measures, such as Lin’s similarity, can also be calculated and analyzed.

References

  1. Bao Shenghua, Xu Shengliang, Zhang Li, Yan Rong, “Mining Social Emotions from Affective Text,” IEEE Transactions on Knowledge and Data Engineering, Volume: 24, Issue: 9, Publication Year: 2011, Page(s): 1658-1670
  2. Ghose A., Ipeirotis P.G., “Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics,” IEEE Transactions on Knowledge and Data Engineering, Volume: 23, Issue: 10, Publication Year: 2011, Page(s): 1498-1512
  3. Bollegala D., Matsuo Y., Ishizuka M., “A Web Search Engine-Based Approach to Measure Semantic Similarity between Words,” IEEE Transactions on Knowledge and Data Engineering, Volume: 23, Issue: 7, Publication Year: 2011, Page(s): 977-990
  4. Chenghua Lin, Yulan He, Everson R., Ruger S., “Weakly Supervised Joint Sentiment-Topic Detection from Text,” IEEE Transactions on Knowledge and Data Engineering, Volume: 24, Issue: 6, Publication Year: 2012, Page(s): 1134-1145
  5. Alexandre Trilla and Francesc Alías, “Sentence-Based Sentiment Analysis for Expressive Text-to-Speech,” IEEE Transactions on Audio, Speech, and Language Processing, Volume: 21, Issue: 2, February 2013, Page(s): 223-233
  6. Xuan-Hieu Phan, Cam Nguyen, Dieu-Thu Le, Le-Minh Nguyen, Horiguchi, “A Hidden Topic-Based Framework toward Building Applications with Short Web Documents,” IEEE Transactions on Knowledge and Data Engineering, Volume: 23, Issue: 7, Publication Year: 2011, Page(s): 961-976
  7. Fuzhen Zhuang, Ping Luo, Zhiyong Shen, Qing He, “Mining Distinction and Commonality across Multiple Domains Using Generative Model for Text Classification,” IEEE Transactions on Knowledge and Data Engineering, Volume: 24, Issue: 11, Publication Year: 2012, Page(s): 2025-2039
  8. Chen Bo, Lam Wai, Tsang Ivor, Wong Tak-Lam, “Discovering Low-Rank Shared Concept Space for Adapting Text Mining Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume: 35, Issue: 6, Publication Year: 2013, Page(s): 1284-1297
  9. Danushka Bollegala, David Weir, and John Carroll, “Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus,” IEEE Transactions on Knowledge and Data Engineering, Volume: 25, Issue: 8, August 2013
  10. Nikolaos Malandrakis, Alexandros Potamianos, Elias Iosif, and Shrikanth Narayanan, “Distributional Semantic Models for Affective Text Analysis,” IEEE Transactions on Audio, Speech, and Language Processing, Volume: 21, Issue: 11, November 2013, 23-29
  11. Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Gen-chi, and Emery Jou, “Movie Rating and Summarization Mobile Environment,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Volume: 42, Issue: 3, May 2012
  12. Xu Xueke, Cheng Xueqi, Tan Songbo, Liu Yue, Shen Huawei, “Aspect-level opinion mining of online customer reviews,” IEEE Transactions on Intelligent Systems, Volume: 10, Issue: 3, Publication Year: 2013, Page(s): 25-41
  13. Rui Xia, Chengqing Zong, Xuelei Hu, Cambria E., “Feature Ensemble Plus Sample Selection: Domain Adaptation for Sentiment Classification,” IEEE Transactions on Intelligent Systems, Volume: 28, Issue: 3, Publication Year: 2013, Page(s): 10-18
  14. Erik Cambria, Björn Schuller, Bing Liu, Haixun Wang, Catherine Havasi, “Statistical Approaches to Concept-Level Sentiment Analysis,” IEEE Transactions on Intelligent Systems, Publication Year: 2013, Page(s): 12-15
  15. Krcadinac U., Pasquier P., Jovanovich J., Davidic V., “Synesketch: An Open Source Library for Sentence-Based Emotion Recognition,” IEEE Transactions on Affective Computing, Volume: 4, Issue: 3, Publication Year: 2013, Page(s): 312-325
  16. Erik Cambria, Björn Schuller, Bing Liu, Haixun Wang, Catherine Havasi, “Knowledge-Based Approaches to Concept-Level Sentiment Analysis,” IEEE Transactions on Intelligent Systems, Publication Year: 2013, Page(s): 12-15
  17. Yan Dang, Yulei Zhang, and Hsinchun Chen, “A Lexicon-Enhanced Method for Sentiment Classification: An Experiment on Online Product Reviews,” IEEE Transaction.