ISSN ONLINE (2320-9801) PRINT (2320-9798)
D. Sharmila Rani and V.T.Shenbagamuthu Sri Krishna College of Engg & Tech, Coimbatore, Tamilnadu, India 
Visit for more related articles at International Journal of Innovative Research in Computer and Communication Engineering
Clustering is one of the main analytical methods in data mining. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. In the existing system, the K-means algorithm randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. The proposed system requires only a simple data structure that stores some information in every iteration for use in the next iteration. The improved method avoids repeatedly computing the distance from each data object to the cluster centers, which saves running time. Experimental results show that the improved method increases both the speed and the accuracy of clustering and reduces the computational complexity of K-means.
Keywords 
Clustering analysis, K-means algorithm, distance, computational complexity 
INTRODUCTION 
Clustering classifies raw data in a reasonable way and searches for hidden patterns that may exist in datasets. It is a process of grouping data objects into disjoint clusters so that the data in the same cluster are similar, while data belonging to different clusters differ. The demand for organizing rapidly growing data and for learning valuable information from it has led to clustering techniques being widely applied in many areas, such as artificial intelligence, biology, customer relationship management, data compression, data mining, information retrieval, image processing, machine learning, marketing, medicine, pattern recognition, psychology and statistics. 
K-means is a numerical, unsupervised, non-deterministic, iterative method. It is simple and very fast, and in many practical applications it has proved to be an effective way of producing good clustering results. However, it is best suited to producing globular clusters. The K-means algorithm is effective in producing clusters for many practical applications in emerging areas such as bioinformatics, but the computational complexity of the original K-means algorithm is very high. Moreover, the algorithm produces different clusters depending on the random choice of initial centroids. This paper deals with a heuristic method based on sorting and partitioning the input data to find better initial centroids, thereby improving the accuracy of the K-means algorithm. 
THE K-MEANS CLUSTERING ALGORITHM 
Existing K-means algorithm 
In the existing system, the K-means algorithm randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. 
K-means is a typical clustering algorithm in data mining and is widely used for clustering large sets of data. MacQueen first proposed the K-means algorithm in 1967; it is one of the simplest unsupervised learning algorithms applied to the well-known clustering problem. It is a partitioning clustering algorithm: it classifies the given data objects into k different clusters iteratively, converging to a local minimum, so that the generated clusters are compact and independent. The algorithm consists of two separate phases. The first phase selects k centers randomly, where the value of k is fixed in advance. The next phase assigns each data object to the nearest center. Euclidean distance is generally used to measure the distance between each data object and the cluster centers. When all the data objects have been included in some cluster, the first step is complete and an early grouping is done. The mean of each early-formed cluster is then recalculated. This iterative process continues until the criterion function reaches its minimum. Supposing that the target object is x and that \bar{x}_i denotes the mean of cluster C_i, the criterion function is defined as follows: 
$$ E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \bar{x}_i \rVert^2 $$
E is the sum of the squared errors of all objects in the database. The distance used in the criterion function is the Euclidean distance, which determines the nearest cluster center for each data object. The Euclidean distance between one vector x = (x1, x2, …, xn) and another vector y = (y1, y2, …, yn) can be obtained as follows: 
$$ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$
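The Euclidean distance and the criterion function E described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `euclidean_distance`, `cluster_mean` and `sum_squared_error` are hypothetical, and clusters are assumed to be given as non-empty lists of numeric vectors.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cluster_mean(points):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def sum_squared_error(clusters):
    """Criterion function E: squared distance of every object to the mean
    of its own cluster, summed over all clusters."""
    total = 0.0
    for points in clusters:
        mean = cluster_mean(points)
        total += sum(euclidean_distance(p, mean) ** 2 for p in points)
    return total
```

A smaller value of E means that the objects lie closer to their cluster means, which is why the iteration tries to drive E down.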
The K-means algorithm proceeds as follows: 
Input: 
Number of desired clusters, k, and a database D = {d1, d2, …, dn} containing n data objects. 
Output: 
A set of k clusters. 
Steps: 
1. Randomly select k data objects from dataset D as initial cluster centers. 
2. Repeat: 
3. Calculate the distance between each data object di (1 <= i <= n) and all k cluster centers cj (1 <= j <= k), and assign data object di to the nearest cluster. 
4. For each cluster j (1 <= j <= k), recalculate the cluster center. 
5. Until the cluster centers no longer change. 
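The steps above can be sketched in plain Python as follows. This is a minimal sketch rather than the authors' implementation: the function name `kmeans`, the `max_iter` safety cap, the `seed` parameter and the handling of empty clusters are all assumptions added for illustration. It relies on `math.dist`, which is available from Python 3.8.

```python
import math
import random


def kmeans(data, k, max_iter=100, seed=None):
    """Standard k-means with randomly chosen initial centers."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = []
    for _ in range(max_iter):  # Step 2: repeat (capped by max_iter, an assumption)
        # Step 3: assign every object to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in data
        ]
        # Step 4: recalculate each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 5: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(points, k=2, seed=0)
```

Because the initial centers are drawn at random, different seeds can lead to different numbers of iterations and, on less well-separated data, to different final clusters.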
The K-means clustering algorithm always converges to a local minimum. Before it converges, the distance and cluster-center calculations are repeated in a loop that executes t times, where the positive integer t is known as the number of K-means iterations. The precise value of t varies depending on the initial cluster centers. 
The cluster centers are recalculated from the distribution of the data points in every iteration, and each iteration computes the distance from each of the n data objects to each of the k cluster centers, so the computational time complexity of the K-means algorithm is O(nkt). Here n is the number of data objects, k is the number of clusters, and t is the number of iterations. Usually k << n and t << n. 
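To make the O(nkt) bound concrete, the short sketch below counts the distance evaluations performed by the standard algorithm. The function name `distance_evaluations` and the sample values of n, k and t are hypothetical and chosen only for illustration.

```python
def distance_evaluations(n, k, t):
    """Number of object-to-center distances computed by standard k-means:
    n objects times k centers in each of t iterations."""
    return n * k * t


# Hypothetical example: 10,000 objects, 5 clusters, 20 iterations.
count = distance_evaluations(10_000, 5, 20)  # 1,000,000 distance evaluations
```

Since k and t are usually much smaller than n, the cost grows roughly linearly with the size of the data set.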
PROPOSED SYSTEM 
The K-means clustering algorithm consists of two separate phases. The first phase defines k centroids, one for each cluster. The next phase takes each point in the given data set and associates it with the nearest centroid. When all the points have been included in some cluster, this step is complete and an early grouping is done. At this point the centroids must be recalculated, because the inclusion of new points may change the cluster centroids. Once the k new centroids are found, a new binding is created between the same data points and the nearest new centroid, which generates a loop. As a result of this loop, the k centroids may change their positions step by step. Eventually, a situation is reached in which the centroids no longer move. 
Algorithm 
Input: 
D = {d1, d2, …, dn} // set of n data items. 
k // Number of desired clusters. 
Output: 
A set of k clusters. 
Steps: 
1. For each column of the data set, determine the range as the difference between the maximum and the minimum element; 
2. Identify the column having the maximum range; 
3. Sort the entire data set in non-decreasing order based on the column having the maximum range; 
4. Partition the sorted data set into k equal parts; 
5. Determine the arithmetic mean of each part obtained in Step 4 as c1, c2, …, ck; take these mean values as the initial centroids. 
6. Repeat 
6.2 Assign each data item di to the cluster which has the closest centroid; 
6.3 Calculate the new mean of each cluster; 
Until the convergence criterion is met. 
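Steps 1 to 5 of the algorithm above, the selection of the initial centroids, can be sketched in Python as follows. This is a sketch under stated assumptions: the function name `initial_centroids` is hypothetical, the data items are assumed to be equal-length numeric rows, and when n is not divisible by k the parts are allowed to differ in size by one item, a detail the steps above do not specify.

```python
def initial_centroids(data, k):
    """Sort on the column with the largest range, split the sorted data
    into k parts, and return the mean of each part."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of data items")
    n_cols = len(data[0])
    # Steps 1-2: find the column whose (max - min) range is largest.
    ranges = [
        max(row[c] for row in data) - min(row[c] for row in data)
        for c in range(n_cols)
    ]
    widest = ranges.index(max(ranges))
    # Step 3: sort the whole data set in non-decreasing order on that column.
    ordered = sorted(data, key=lambda row: row[widest])
    # Step 4: partition into k parts (sizes differ by at most one; assumption).
    bounds = [i * n // k for i in range(k + 1)]
    parts = [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]
    # Step 5: the arithmetic mean of each part is an initial centroid.
    return [[sum(col) / len(part) for col in zip(*part)] for part in parts]


sample = [[9.0, 5.5], [1.0, 5.0], [10.0, 5.0], [2.0, 6.0]]
centroids = initial_centroids(sample, 2)  # [[1.5, 5.5], [9.5, 5.25]]
```

The resulting centroids depend only on the data and on k, so repeated runs start from the same point, unlike random initialization. Step 6 then continues with the usual assignment and mean-update loop from these centroids.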
EXPERIMENTAL RESULTS 
The original K-means and the enhanced K-means algorithms require the values of the initial centroids as input, in addition to the input data values and the value of k. The experiment is conducted for different sets of randomly selected initial centroids. For the proposed algorithm, the data values and the value of k are the only inputs required. The accuracy of clustering is determined by comparing the clusters obtained in the experiments with the predetermined clusters already available in the UCI data set. The percentage accuracy and the time taken for each experiment are computed and tabulated. 
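One possible way to compute such a percentage accuracy is sketched below. The text does not state how the obtained clusters are matched to the predetermined ones, so this sketch assumes a majority-vote mapping: each obtained cluster is identified with the predetermined class that occurs most often in it. The function name `clustering_accuracy` is hypothetical.

```python
from collections import Counter


def clustering_accuracy(predicted, actual):
    """Percentage of items whose class matches the majority class of the
    cluster they were assigned to (majority-vote mapping; an assumption)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    classes_by_cluster = {}
    for cluster, true_class in zip(predicted, actual):
        classes_by_cluster.setdefault(cluster, []).append(true_class)
    # In each cluster, the items belonging to the majority class count as correct.
    correct = sum(
        Counter(classes).most_common(1)[0][1]
        for classes in classes_by_cluster.values()
    )
    return 100.0 * correct / len(actual)
```

The running time of an experiment could be measured in the same spirit by reading `time.perf_counter()` before and after the clustering call.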
CONCLUSION 
K-means is a typical clustering algorithm and is widely used for clustering large sets of data. Our project elaborates on the K-means algorithm and analyses the shortcomings of the standard K-means clustering algorithm. The computational complexity of the standard K-means algorithm is objectionably high, owing to the need to reassign the data points a number of times during every iteration, which lowers the efficiency of standard K-means clustering. Our project presents a simple and efficient way of assigning data points to clusters. The proposed method ensures that the entire clustering process runs in O(nk) time without sacrificing the accuracy of the clusters. 