A Comparative Study of Issues in Big Data Clustering Algorithm with Constraint Based Genetic Algorithm for Associative Clustering
Clustering is the process of partitioning a set of patterns into disjoint, homogeneous, and meaningful groups called clusters. The growing need for distributed clustering algorithms stems from the very large databases that are now common. Extracting knowledge from large databases in the form of clustering rules has attracted considerable attention. Distributed clustering algorithms embrace this trend of merging computation with communication and explore all facets of distributed computing environments. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. An important feature of the proposed technique is that it automatically finds the optimal number of clusters (i.e., the number of clusters does not have to be known in advance), even for very high-dimensional data sets, where determining the number of clusters may be practically impossible. The proposed Optimal Associative Clustering algorithm, which uses a genetic algorithm and the Bayes factor for precision, outperforms two other state-of-the-art clustering algorithms in a statistically meaningful way on a majority of the benchmark data sets. The results of the proposed optimal associative clustering algorithm are compared with those of one existing algorithm on two multidimensional data sets. The experimental results demonstrate that the proposed method achieves a better clustering solution than the existing algorithms.
B. Kranthi Kiran, Dr. A. Vinaya Babu