Keywords 
Data stream, Data stream clustering, Outlier detection, CLARANS, ECLARANS 
INTRODUCTION 
A data stream is a continuous, real-time flow of a sequence of items. It is not possible to control the order in which the data items arrive, and it is not feasible to store all of them. Application areas that generate data streams include sensor networks, traffic management, call detail records, blogging and Twitter posts [1]. These volumes of data are huge and the available resources are limited, so conventional data mining systems are not equipped to deal with them. Data stream clustering is a well-known task in data stream mining; clustering is the process of grouping related objects into clusters. Data stream clustering methods [2] can also be used to detect outliers, where an outlier is an object that does not conform to the behaviour of normal data objects. Applications of outlier detection include web logs, fraud detection, click streams, telecommunications and web documents. Clustering-based outlier mining methods [14] are unsupervised in nature, and their main objective is to find the outliers in a data stream using a partitioning clustering method. An object that does not belong to any cluster, or that belongs to a small cluster, is declared an outlier, so the outlier detection process depends heavily on the clustering technique.
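The small-cluster rule just described can be illustrated with a minimal sketch, assuming cluster labels have already been produced by some partitioning algorithm. Points labelled `None` are treated as belonging to no cluster. The function name `small_cluster_outliers` and the `min_cluster_size` threshold are hypothetical; the paper does not state what size counts as "small".

```python
from collections import Counter

def small_cluster_outliers(labels, min_cluster_size):
    """Return indices of points that are in no cluster or in a small cluster.

    labels: one cluster label per point, or None if the point was not
    assigned to any cluster. A cluster is "small" (an assumption) when it
    has fewer than min_cluster_size members.
    """
    sizes = Counter(label for label in labels if label is not None)
    return [
        i for i, label in enumerate(labels)
        if label is None or sizes[label] < min_cluster_size
    ]

labels = [0, 0, 0, 1, 1, 1, 1, 2, None]
print(small_cluster_outliers(labels, min_cluster_size=2))  # [7, 8]
```

Point 7 is the only member of cluster 2, and point 8 has no cluster, so both are flagged.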
The remainder of this paper is organized as follows. Section 2 reviews the related literature. Section 3 describes how the CLARANS and ECLARANS clustering algorithms are used to detect outliers in data streams. Section 4 discusses the experimental results, and Section 5 gives the conclusions.
RELATED WORK 
In [8], the authors presented CLARANS, a clustering algorithm based on randomized search. They also developed two spatial data mining algorithms built on it, SD (CLARANS) and NSD (CLARANS). Their experimental results and analysis indicated that both algorithms are effective and can lead to discoveries that are difficult to obtain with existing spatial data mining algorithms. Finally, their experiments showed that CLARANS is more efficient than existing clustering methods.
The paper [4] reviewed several clustering procedures and multivariate outlier detection procedures. It also discussed the features of multivariate outliers and highlighted their applications. Finally, the authors discussed open research challenges concerning multivariate outliers.
In [5], the authors discussed partitioning-clustering-based outlier detection for data streams. In their approach, incoming data enter a window of fixed size, and all data in the window are initially reported as potential outliers and stored. Using the k-means algorithm, they then find small clusters that lie far from the other clusters and label them as outliers.
In [9], the authors compared two partitioning clustering approaches, CLARANS and Fuzzy C-Means. Measured by clustering accuracy and outlier accuracy, CLARANS performed better at both clustering and outlier detection.
METHODOLOGY 
In data streams, clustering is applied both to group the data items and to detect outliers. Clustering and outlier detection are among the most important problems in data stream mining. The main objective of this research work is to analyse the performance of two partitioning clustering algorithms, CLARANS and ECLARANS, for detecting outliers. The system architecture of the research work is as follows.
A. DATASET 
The dataset used in this research work is the Pima Indian dataset, which contains 768 instances and 8 attributes and is taken from the UCI Machine Learning Repository [3]. A data stream is a potentially unbounded, continuous sequence of data, and it is not possible to store the complete stream; for this reason, the data are divided into chunks of equal size held in different windows.
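The division of the stream into equal-size windows can be sketched as a generator that holds only one chunk in memory at a time, so the stream itself never needs to be stored. This is a minimal illustration, not the authors' MATLAB code; the function name `windows` is hypothetical, and `window_size=5` in the usage line is chosen only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def windows(stream: Iterable[T], window_size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping chunks of window_size items.

    Works on unbounded iterators as well as finite sequences. The final
    chunk is shorter when the stream length is not a multiple of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    it = iter(stream)
    while True:
        chunk = list(islice(it, window_size))
        if not chunk:
            return
        yield chunk

# 768 records (the size of the dataset above) split into windows of 5 items
chunks = list(windows(range(768), 5))
print(len(chunks), len(chunks[-1]))  # 154 3
```

Because `islice` pulls items lazily, the same function works unchanged on a live stream source; each yielded chunk would then be clustered independently.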
B. CLUSTERING 
Cluster analysis is used in many applications, such as stock market analysis, data analysis, image processing and financial market analysis [14]. In data streams, clustering is one of the core sub-processes: it groups the objects, it can be used to detect outliers efficiently, and it is an unsupervised task. Data stream clustering approaches are of different types, including distance-based, grid-based, partition-based and hierarchical approaches.
C. OUTLIER DETECTION 
Outlier detection over streaming data is an active research area in data stream mining that aims to detect objects whose behaviour differs from, or is exceptional compared with, that of normal objects [5]. An outlier is an item that is notably dissimilar to, or inconsistent with, the other data objects. Application areas of outlier detection in data streams include web logs, click streams, telecommunications, fraud detection and web documents. Outliers are also referred to as noise, anomalies, discordant objects (objects not consistent with related objects) and unknowns. Clustering-based outlier detection is an effective technique for this problem; in this research we use the partitioning-clustering-based outlier detection algorithms CLARANS and ECLARANS.
D. CLARANS
CLARANS is a partitioning clustering algorithm, applied here to data streams [9]. First, the data are split into chunks of equal size held in different windows. Each database s is then treated as a set of data points (dp) and partitioned into chunks of size s/p, with max neighbor k = 3. The counters are initialized to i = 1 and j = 1, and the minimum cost of each data point is used to identify its neighbour value. Next, the distance for each data point is calculated and the maximum distance (n) for each data point is chosen. A random neighbour s of the current node is examined: if s has a lower cost, current is set to s; otherwise j is incremented by 1. When j > max neighbor, the cost of current is compared with the minimum cost, and if it is lower, the minimum cost is set to the cost of current. Finally, the clusters are formed so that the threshold condition (threshold value ≤ min cost) is satisfied, the nodes are clustered and the outliers are identified.
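The CLARANS steps above can be sketched in Python. This is a minimal sketch, not the authors' MATLAB implementation. It assumes Euclidean distance and a cost equal to the sum of each point's distance to its nearest medoid, as in Ng and Han's formulation [8], and uses illustrative defaults `numlocal=2` and `maxneighbor=50`. The paper does not say how its threshold is applied, so the hypothetical helper `distance_outliers` simply flags points whose nearest medoid is farther away than a chosen threshold.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of Euclidean distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def clarans(points, k, numlocal=2, maxneighbor=50, seed=None):
    """Randomized medoid search in the style of CLARANS.

    A node is a list of k distinct medoid indices; a neighbour of a node
    differs from it in exactly one medoid. Returns (best_medoids, mincost).
    """
    rng = random.Random(seed)
    n = len(points)
    best, mincost = None, math.inf
    for _ in range(numlocal):                      # i = 1 .. numlocal
        current = rng.sample(range(n), k)          # arbitrary starting node
        current_cost = total_cost(points, current)
        j = 1
        while j <= maxneighbor:
            # Random neighbour s: replace one medoid with a random non-medoid.
            s = current.copy()
            s[rng.randrange(k)] = rng.choice([x for x in range(n) if x not in current])
            s_cost = total_cost(points, s)
            if s_cost < current_cost:              # move to s and restart j
                current, current_cost, j = s, s_cost, 1
            else:
                j += 1
        if current_cost < mincost:                 # j > maxneighbor: local minimum
            best, mincost = current, current_cost
    return best, mincost

def distance_outliers(points, medoids, threshold):
    """Hypothetical rule: indices of points farther than threshold from every medoid."""
    return [i for i, p in enumerate(points)
            if min(math.dist(p, points[m]) for m in medoids) > threshold]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
medoids, cost = clarans(pts, k=2, seed=0)
print(medoids, round(cost, 2), distance_outliers(pts, medoids, threshold=5.0))
```

Restarting `j` at 1 after every improving move means a local minimum is declared only after `maxneighbor` consecutive neighbours fail to lower the cost.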

E. ECLARANS (Enhanced CLARANS)
In ECLARANS, the data are first split into chunks of equal size held in different windows. Each database S is treated as a set of data points (dp) and partitioned into chunks of size s/p, with max neighbor k = 3. The counters are initialized to i = 1 and j = 1, and the minimum cost of each data point is used to identify its neighbour value. The distance for each data point is then calculated and the maximum distance (n) for each data point is chosen. Current is then set to an arbitrary node in n : k, that is, a node drawn from the maximum-distance points n. For each data point, j is set to 1, a random neighbour s of current is selected, and the cost differential of the two nodes is calculated. If s has a lower cost, current is set to s; otherwise j is incremented by 1. When j > max neighbor, the cost of current is compared with the minimum cost, and if it is lower, the minimum cost is set to the cost of current. Finally, the clusters are formed so that the threshold condition (threshold value ≤ min cost) is satisfied, the nodes are clustered and the outliers are detected.
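The ECLARANS description differs from CLARANS mainly in how the starting node is chosen: it is taken from the maximum-distance points rather than fully at random. The sketch below is one possible reading of that step, not the authors' exact method. As an assumption, the hypothetical `max_distance_init` picks k mutually distant points by farthest-first traversal, and the search that follows is the same randomized neighbour search as CLARANS, with Euclidean distance and illustrative `numlocal`/`maxneighbor` defaults.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of Euclidean distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def max_distance_init(points, k):
    """Hypothetical ECLARANS seeding: choose k mutually distant points.

    The first medoid is the point with the largest summed distance to all
    others; each further medoid is the point farthest from its nearest
    already-chosen medoid (farthest-first traversal).
    """
    n = len(points)
    first = max(range(n), key=lambda i: sum(math.dist(points[i], q) for q in points))
    medoids = [first]
    while len(medoids) < k:
        nxt = max((i for i in range(n) if i not in medoids),
                  key=lambda i: min(math.dist(points[i], points[m]) for m in medoids))
        medoids.append(nxt)
    return medoids

def eclarans(points, k, numlocal=2, maxneighbor=50, seed=None):
    """CLARANS-style neighbour search that starts from max_distance_init."""
    rng = random.Random(seed)
    n = len(points)
    start = max_distance_init(points, k)
    best, mincost = None, math.inf
    for _ in range(numlocal):                      # i = 1 .. numlocal
        current = start.copy()                     # deterministic starting node
        current_cost = total_cost(points, current)
        j = 1
        while j <= maxneighbor:
            s = current.copy()                     # random neighbour s of current
            s[rng.randrange(k)] = rng.choice([x for x in range(n) if x not in current])
            s_cost = total_cost(points, s)
            if s_cost < current_cost:              # negative cost differential
                current, current_cost, j = s, s_cost, 1
            else:
                j += 1
        if current_cost < mincost:
            best, mincost = current, current_cost
    return best, mincost

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(max_distance_init(pts, 2))  # [6, 0]
medoids, cost = eclarans(pts, k=2, seed=0)
```

Starting from well-separated medoids gives the local search a deterministic, spread-out initial node, so the result depends less on a lucky random start; whether this matches the authors' enhancement cannot be confirmed from the description alone.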

EXPERIMENTAL RESULTS 
We implemented the two partitioning clustering algorithms in MATLAB 7.10 (R2010a). Two measures, clustering accuracy and outlier accuracy, are used to evaluate the performance of the algorithms, with window sizes of 3 and 5.
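The paper does not define either measure. As an assumption for illustration only, the sketch below uses two common definitions: clustering accuracy as purity (each cluster is credited with its majority class) and outlier accuracy as the fraction of known outliers that the detector flagged. Both function names are hypothetical, and the authors' formulas may differ.

```python
from collections import Counter

def clustering_accuracy(cluster_labels, true_labels):
    """Purity: fraction of points whose class is the majority class of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, []).append(t)
    correct = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return correct / len(true_labels)

def outlier_accuracy(detected, true_outliers):
    """Fraction of known outliers that appear among the detected ones.

    Returns 1.0 by convention when there are no known outliers.
    """
    true_set = set(true_outliers)
    if not true_set:
        return 1.0
    return len(true_set & set(detected)) / len(true_set)

print(clustering_accuracy([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))  # 0.8
print(outlier_accuracy(detected=[4, 7], true_outliers=[4, 9]))  # 0.5
```

Purity rewards clusters that are dominated by one class, which suits a labelled dataset such as Pima; it does not penalise a large number of clusters, so it should be compared only at a fixed k.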
A. CLUSTERING ACCURACY 
Figure 2 shows that the proposed ECLARANS clustering algorithm achieves higher clustering accuracy than the CLARANS clustering algorithm.
B. OUTLIER ACCURACY 
Figure 3 shows that the proposed ECLARANS clustering algorithm achieves higher outlier accuracy than the CLARANS clustering algorithm.
CONCLUSION 
Data streams are fast, unbounded arrivals of ordered or unordered data, and data stream clustering techniques can be used to handle such data. Detecting outliers in data streams is a challenging research problem. In this paper, we analysed the performance of the CLARANS and ECLARANS clustering algorithms for detecting outliers, using two performance measures to identify the better clustering algorithm for outlier detection. The experimental results show that the proposed ECLARANS achieves higher outlier detection and clustering accuracy than CLARANS.
Table 1 
Table 2 

Figure 1 
Figure 2 
Figure 3 

References 
[1] C. Aggarwal, Ed., "Data Streams: Models and Algorithms", Springer, 2007.
[2] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, "A framework for clustering evolving data streams", in Proc. of VLDB, pp. 81-92, 2003.
[3] C. J. Merz and P. M. Murphy, UCI Repository of Machine Learning Databases, Univ. of CA, Dept. of CIS, Irvine.
[4] G. S. David Sam Jayakumar and Bejoy John Thomas, "A New Procedure of Clustering Based on Multivariate Outlier Detection", Journal of Data Science, 11, 2013.
[5] Hossein Moradi Koupaie, Suhaimi Ibrahim, Javad Hosseinkhani, "Outlier Detection in Stream Data by Clustering Method", International Journal of Advanced Computer Science and Information Technology (IJACSIT), Vol. 2, No. 3, pp. 25-34, 2013.
[6] J. Chandrika, K. R. Ananda Kumar, "Dynamic Clustering of High Speed Data Streams", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No. 1, March 2012.
[7] Rajendra Pamula, Jatindra Kumar Deka, Sukumar Nandi, "An Outlier Detection Method Based on Clustering", Second International Conference on Emerging Applications of Information Technology, 2011.
[8] Raymond T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining", VLDB '94.
[9] S. Vijayarani, P. Jothi, "A New Approach for Detecting Outliers in Data Streams", International Journal of Engineering Sciences & Research Technology, ISSN: 2277-9655, pp. 3128-3133, November 2013.
[10] Shifei Ding, Fulin Wu, Jun Qian, Hongjie Jia, "Research on data stream clustering algorithms", Artificial Intelligence Review, Springer, 2013.
[11] Sudipto Guha, Adam Meyerson, Nina Mishra and Rajeev Motwani, "Clustering Data Streams: Theory and Practice", IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 3, pp. 515-528, May/June 2003.
[12] T. Soni Madhulatha, "Overview of Streaming-Data Algorithms", Advanced Computing: An International Journal (ACIJ), Vol. 2, No. 6, November 2011.
[13] Yihong Lu, Yan Huang, "Mining Data Streams Using Clustering", Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Vol. 4, pp. 1821, 2005.
[14] Yogita, Durga Toshniwal, "Clustering Techniques for Streaming Data: A Survey", in Proc. of the IEEE, 2012.
