ISSN: Online (2320-9801), Print (2320-9798)
K. Dhanalakshmi, A.Anitha, G. Michael, K.G.S. Venkatesan
International Journal of Innovative Research in Computer and Communication Engineering
Cluster-based recommendation is best thought of as a variant of user-based recommendation. Instead of recommending items to individual users, items are recommended to clusters of similar users. This entails a preprocessing phase in which all users are partitioned into clusters. Recommendations are then produced for each cluster, such that the recommended items are most interesting to the largest number of users. The upside of this approach is that recommendation is fast at runtime because almost everything is precomputed. In this paper, we describe the problem of recommending conference sessions to attendees and show how novel extensions to traditional model-based recommender systems, as suggested by Adomavicius and Tuzhilin, can address this problem. We introduce the Recommendation Engine by Conjoint Decomposition of Items and Users (RECONDITUS), a technique that extends preference-based recommender systems to recommend items from a new disjoint set to users from a new disjoint set.
I. INTRODUCTION
Wal-Mart claimed to have the largest data warehouse, with 500 terabytes of storage (equivalent to 50 printed collections of the US Library of Congress). In 2009, eBay storage amounted to eight petabytes (think of 104 years of HD-TV video). Two years later, the Yahoo warehouse totalled 170 petabytes (8.5 times all the hard disk drives created in 1995). Since the rise of digitisation, enterprises from various verticals have amassed burgeoning amounts of digital data, capturing trillions of bytes of information about their customers, suppliers and operations. Data volume is also growing exponentially due to the explosion of machine-generated data (data records, web-log files, sensor data) and growing human engagement within social networks [2]. The growth of data will never stop. According to the 2011 IDC Digital Universe Study, 130 exabytes of data were created and stored in 2005. The amount grew to 1,227 exabytes in 2010 and is projected to grow at 45.2% to 7,910 exabytes in 2015. The growth of data constitutes the “Big Data” phenomenon: a technological phenomenon brought about by the rapid rate of data growth and parallel advancements in technology that have given rise to an ecosystem of software and hardware products enabling users to analyse this data to produce new and more granular levels of insight [5].
II. EXISTING SYSTEM |
The most fundamental challenge for Big Data applications is to explore the large volumes of data and extract useful information or knowledge for future actions. The basic assumption of user-based CF is that people who agreed in the past tend to agree again in the future [6].
Unlike user-based CF, the item-based CF algorithm recommends to a user the items that are similar to what he/she has preferred before. In traditional CF algorithms, computing the similarity between every pair of users or services may take too much time, or even exceed the processing capability of current RSs. Consequently, service recommendation based on similar users or similar services would either lose its timeliness or could not be done at all [7].
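For illustration, the following is a minimal sketch of user-based CF in Java, assuming ratings are held in a nested map keyed by user and then item; the class and method names (UserBasedCF, cosineSimilarity, predict) and the choice of cosine similarity over co-rated items are assumptions for exposition, not the exact algorithm of the systems cited above. A rating is predicted as a similarity-weighted average of the ratings other users gave to the same item.

// Minimal user-based CF sketch (illustrative assumptions, see above).
import java.util.Map;

public class UserBasedCF {

    // ratings.get(user).get(item) -> rating that user gave to item
    private final Map<String, Map<String, Double>> ratings;

    public UserBasedCF(Map<String, Map<String, Double>> ratings) {
        this.ratings = ratings;
    }

    // Cosine similarity computed over the items co-rated by both users.
    double cosineSimilarity(String u, String v) {
        Map<String, Double> ru = ratings.get(u), rv = ratings.get(v);
        double dot = 0, nu = 0, nv = 0;
        for (Map.Entry<String, Double> e : ru.entrySet()) {
            Double r = rv.get(e.getKey());
            if (r != null) {
                dot += e.getValue() * r;
                nu += e.getValue() * e.getValue();
                nv += r * r;
            }
        }
        return (nu == 0 || nv == 0) ? 0 : dot / (Math.sqrt(nu) * Math.sqrt(nv));
    }

    // Predict user u's rating for an item as a similarity-weighted
    // average of the ratings other users gave to that item.
    double predict(String u, String item) {
        double num = 0, den = 0;
        for (String v : ratings.keySet()) {
            if (v.equals(u)) continue;
            Double r = ratings.get(v).get(item);
            if (r == null) continue;
            double s = cosineSimilarity(u, v);
            num += s * r;
            den += Math.abs(s);
        }
        return den == 0 ? 0 : num / den;
    }
}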
III. PROPOSED SYSTEM |
We propose an Agglomerative Hierarchical Clustering (AHC) approach, also known as Hierarchical Agglomerative Clustering. Clustering is a technique that can reduce the data size by a large factor by grouping similar services together. A cluster contains some similar services, just as a club contains some like-minded users; this, besides abbreviation, is the reason we call this approach ClubCF [10]. The approach is carried out in two stages. In the first stage, the available services are logically divided into small-scale clusters for further processing. In the second stage, a collaborative filtering algorithm is applied within one of the clusters. The similarity metric computes the Euclidean distance d between two such user points. This value alone does not constitute a valid similarity metric, because larger values would mean more-distant, and therefore less similar, users; the value should be smaller when users are more similar [12].
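As a rough illustration of the first stage, the sketch below performs agglomerative (bottom-up) hierarchical clustering with single-linkage Euclidean distance, assuming each service is represented as a double[] feature vector. The feature representation, the single-linkage criterion, and stopping at a target number of clusters k are assumptions made for exposition, not necessarily the paper's exact configuration.

// Minimal agglomerative hierarchical clustering sketch (assumptions above).
import java.util.ArrayList;
import java.util.List;

public class AgglomerativeClustering {

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Single linkage: distance between two clusters is the distance
    // between their closest members.
    static double singleLinkage(List<double[]> c1, List<double[]> c2) {
        double best = Double.MAX_VALUE;
        for (double[] a : c1)
            for (double[] b : c2)
                best = Math.min(best, euclidean(a, b));
        return best;
    }

    // Start with one cluster per point and repeatedly merge the two
    // closest clusters until only k clusters remain.
    static List<List<double[]>> cluster(List<double[]> points, int k) {
        List<List<double[]>> clusters = new ArrayList<>();
        for (double[] p : points) {
            List<double[]> c = new ArrayList<>();
            c.add(p);
            clusters.add(c);
        }
        while (clusters.size() > k) {
            int bi = 0, bj = 1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = singleLinkage(clusters.get(i), clusters.get(j));
                    if (d < best) { best = d; bi = i; bj = j; }
                }
            }
            clusters.get(bi).addAll(clusters.remove(bj));
        }
        return clusters;
    }
}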
IV. ARCHITECTURE DIAGRAM |
V. MODULES |
A. LOGIN AND ADD MOVIE DETAILS |
The Login Form module presents site visitors with a form containing username and password fields. If the user enters a valid username/password combination, they are granted access to additional resources in the application. Which additional resources they can access can be configured separately [15].
In this module, the admin can add new movie titles along with their release dates and genre details. These details are added to the existing records. Users can select a movie added in this module and rate it based on their reviews. These ratings are used to cluster the data [19].
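A minimal sketch of the records such a module might store is given below; every field name here is an illustrative assumption rather than the system's actual schema.

// Hypothetical movie and rating records for the admin/user module above.
public class MovieEntry {
    String movieId;
    String title;
    String releaseDate;   // e.g. "2015-03-21" (format assumed)
    String genre;

    public MovieEntry(String movieId, String title,
                      String releaseDate, String genre) {
        this.movieId = movieId;
        this.title = title;
        this.releaseDate = releaseDate;
        this.genre = genre;
    }
}

class MovieRating {
    String userId;
    String movieId;
    double rating;        // score the user gave in their review
}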
B. DATA PREPROCESSING: |
In the training data, we are given a list of tuples (u, m, r, t), where u is a user ID, m is a movie ID, r is the rating u gave to m, and t is the date. After training, we output predictions for a list of user-movie pairs, and we measure error using the root mean squared error (RMSE). After preprocessing, we output the movie IDs with the corresponding users and their ratings as semicolon-separated files [20].
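The sketch below illustrates this step under the assumption that each preprocessed line has the layout movieId;userId;rating (the text above only states that fields are semicolon separated), and shows how RMSE would be computed over paired predicted and actual ratings.

// Preprocessing/evaluation sketch (file layout is an assumption, see above).
import java.util.List;

public class Preprocessing {

    // One parsed record: movie ID, user ID, and the rating the user gave.
    static class Record {
        final String movieId, userId;
        final double rating;
        Record(String line) {
            String[] parts = line.split(";");
            this.movieId = parts[0];
            this.userId = parts[1];
            this.rating = Double.parseDouble(parts[2]);
        }
    }

    // Root mean squared error over paired predicted and actual ratings.
    static double rmse(List<Double> predicted, List<Double> actual) {
        double sum = 0;
        for (int i = 0; i < predicted.size(); i++) {
            double e = predicted.get(i) - actual.get(i);
            sum += e * e;
        }
        return Math.sqrt(sum / predicted.size());
    }
}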
C. DATA CLUSTERING: |
We cluster the people based on the movies they watched and then cluster the movies based on the people that watched them. The people can then be re-clustered based on the number of movies in each movie cluster they watched [25]. Movies can similarly be re-clustered based on the number of people in each person cluster that watched them. On the first pass, people are clustered based on movies and movies based on people. On the second and subsequent passes, people are clustered based on movie clusters, and movies based on people clusters. A cluster contains some similar services, just as a club contains some like-minded users [30].
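To make the second pass concrete, the sketch below shows one way a person could be re-represented before re-clustering: a vector of counts, one entry per movie cluster, recording how many movies from that cluster the person watched. The method and variable names are assumptions for illustration only.

// Feature vector for re-clustering people by movie clusters (illustrative).
import java.util.List;
import java.util.Map;

public class ReclusterFeatures {

    // movieCluster maps a movie ID to the index of the movie cluster it
    // belongs to; numMovieClusters is the total number of movie clusters.
    static double[] personFeatures(List<String> watchedMovies,
                                   Map<String, Integer> movieCluster,
                                   int numMovieClusters) {
        double[] counts = new double[numMovieClusters];
        for (String movie : watchedMovies) {
            Integer c = movieCluster.get(movie);
            if (c != null) counts[c] += 1.0;
        }
        return counts;   // people are then clustered on these count vectors
    }
}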
D. RECOMMENDATION: |
This similarity metric computes the Euclidean distance d between two such user points. This value alone does not constitute a valid similarity metric, because larger values would mean more-distant, and therefore less similar, users [33]. The value should be smaller when users are more similar; therefore, the implementation actually returns 1 / (1 + d). The upside of this approach is that recommendation is fast at runtime because almost everything is precomputed. One could argue that the recommendations are less personal this way, because recommendations are computed for a group rather than an individual. However, this approach may be more effective at producing recommendations for new users, who have little preference data available [29].
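A minimal sketch of this similarity, assuming each user is represented by a double[] rating vector (an assumed representation):

// Euclidean-distance-based similarity mapped through 1 / (1 + d).
public class EuclideanSimilarity {

    static double similarity(double[] userA, double[] userB) {
        double sumSq = 0;
        for (int i = 0; i < userA.length; i++) {
            double diff = userA[i] - userB[i];
            sumSq += diff * diff;
        }
        double d = Math.sqrt(sumSq);
        // Score lies in (0, 1]: 1 for identical users, decaying toward 0
        // as the distance between their rating vectors grows.
        return 1.0 / (1.0 + d);
    }
}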
VI. ALGORITHM |
A. COLLABORATIVE FILTERING: |
Collaborative filtering is a technique used by some recommender systems. It has two senses, a narrow one and a more general one. In the general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. [35]. Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data, including: sensing and monitoring data, such as in mineral exploration or environmental sensing over large areas or with multiple sensors; financial data, such as financial service institutions that integrate many financial sources; and electronic commerce and web applications where the focus is on user data. The remainder of this discussion focuses on collaborative filtering for user data, although some of the methods and approaches may apply to the other major applications as well [36].
B. K-MEAN CLUSTERING: |
Clustering is the process of partitioning a group of data points into a small number of clusters. For instance, the items in a supermarket are clustered into categories. Of course, this is a qualitative kind of partitioning. A quantitative approach would be to measure certain features of the products, say the percentage of milk, and to group together products with a high percentage of milk. In general, we have n data points x_i, i = 1...n, that have to be partitioned into k clusters. The goal is to assign a cluster to each data point. K-means is a clustering method that aims to find the positions μ_i, i = 1...k, of the cluster centers that minimize the distance from the data points to their clusters. K-means clustering solves
argmin_c Σ_{i=1..k} Σ_{x∈c_i} d(x, μ_i) = argmin_c Σ_{i=1..k} Σ_{x∈c_i} ‖x − μ_i‖²
where c_i is the set of points that belong to cluster i. K-means clustering uses the squared Euclidean distance d(x, μ_i) = ‖x − μ_i‖². This problem is not trivial (in fact it is NP-hard), so the K-means algorithm only hopes to find the global minimum and may get stuck in a different, locally optimal solution [37].
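The sketch below is a straightforward implementation of the standard K-means iteration for this objective: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat. Initialising centroids from the first k points is an assumption made for brevity; random initialisation is more common.

// Minimal K-means sketch for the objective above (initialisation assumed).
public class KMeans {

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Returns the cluster index assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIter) {
        int n = points.length, dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) centroids[i] = points[i].clone();
        int[] assignment = new int[n];

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int p = 0; p < n; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assignment[p] = best;
            }
            // Update step: each centroid becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < n; p++) {
                counts[assignment[p]]++;
                for (int j = 0; j < dim; j++) sums[assignment[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;  // keep an empty cluster's old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }
}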
VII. EXPERIMENTAL SETUP AND RESULT |
A hard drive of 20 GB and 512 MB of RAM (minimum) are used for the implementation. Java JDK 1.7 is used as the front end, and MySQL 5.0 is used as the back end [22].
A. SCREENSHOTS |
VIII. CONCLUSION
In this paper, we present a ClubCF approach for big data applications relevant to service recommendation. Before applying the CF technique, services are merged into clusters via an AHC algorithm. Then the rating similarities between services within the same cluster are computed. As the number of services in a cluster is much smaller than the number in the whole system, ClubCF costs less online computation time. Moreover, as the ratings of services in the same cluster are more relevant to each other than to those in other clusters, prediction based on the ratings of services in the same cluster is more accurate than prediction based on the ratings of all similar or dissimilar services across all clusters. These two advantages of ClubCF have been verified by experiments on a real-world data set.
IX. ACKNOWLEDGEMENT
The author would like to thank the Vice Chancellor, Dean-Engineering, Director, Secretary, Correspondent, and the HOD of Computer Science & Engineering, Dr. K.P. Kaliyamurthie, Bharath University, Chennai, for their motivation and constant encouragement. The author would like to specially thank Dr. A. Kumaravel, Dean, School of Computing, for his guidance, critical review of this manuscript, valuable input and fruitful discussions in completing the work, and the Faculty Members of the Department of Computer Science & Engineering. He also takes the privilege of extending gratitude to his parents and family members, who rendered their support throughout this research work.
References |
|