Impact of Similarity Measures on Causal Relation Based Feature Selection Method for Clustering Maritime Accident Reports
Unsupervised document clustering is an automated process in which documents are grouped according to their similarity. In this paper, we propose a new feature selection method based on causal relations to classify maritime accident reports in an unsupervised manner. We also compare the impact of different similarity measures on the proposed feature selection method. Based on the analysis, we conclude that the proposed feature selection method performs better than the conventional method, owing to the effect of the curse of dimensionality. The performance of the similarity measures also improves with the proposed feature selection method. In the analysis, we compared the Correlation, Cosine, Spearman, Bray-Curtis, Euclidean, City-block, Squared Euclidean, Standardized Euclidean, and Chebyshev similarity measures. Correlation and Cosine produced the best results, followed by Spearman and Bray-Curtis. The remaining measures did not produce good results on the maritime accident reports used in our analysis. Interestingly, Chi-Square gave good results with the proposed method in our analysis.
Santosh Tirunagari, Maria Hanninen, Guggilla Abhishek, Kaarle Stahlberg, and Pentti Kujala
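As an illustration of how the similarity measures named in the abstract can differ on the same pair of documents, the following is a minimal sketch in plain Python. It is not the authors' implementation: the three term-count vectors are hypothetical, the function names are chosen for this example, and each measure is written in its common distance form (0 means identical under that measure).

```python
import math

def cosine_distance(u, v):
    # 1 - cosine of the angle between u and v; ignores vector length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def correlation_distance(u, v):
    # 1 - Pearson correlation: cosine distance of the mean-centered vectors.
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cityblock_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev_distance(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def braycurtis_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))

MEASURES = {
    "correlation": correlation_distance,
    "cosine": cosine_distance,
    "bray-curtis": braycurtis_distance,
    "euclidean": euclidean_distance,
    "city-block": cityblock_distance,
    "chebyshev": chebyshev_distance,
}

def compare(u, v):
    """Return every distance in MEASURES for the pair (u, v)."""
    return {name: fn(u, v) for name, fn in MEASURES.items()}

# Hypothetical term-count vectors (assumed data, not from the paper).
report_a = [2, 0, 1, 3]
report_b = [4, 0, 2, 6]  # same term proportions as report_a, twice as long
report_c = [0, 3, 1, 0]  # mostly different terms

if __name__ == "__main__":
    for label, other in (("a vs b", report_b), ("a vs c", report_c)):
        print(label, compare(report_a, other))
```

In this toy data, report_b uses the same terms in the same proportions as report_a but is twice as long. Correlation and Cosine treat the two as identical, while Euclidean, City-block, Chebyshev, and Bray-Curtis report a nonzero distance driven by document length. This shows a general property of the measures only and should not be read as the paper's explanation of its results.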