Semantic Analysis based Dirichlet Clustering Scheme for Text Documents
Clustering is one of the most important techniques in machine learning and data mining; it groups similar data items together. Similarity measures are used to estimate the relationships between transactions. Hierarchical clustering produces tree-structured results. Partitioned clustering produces results in a grid format. Text documents are unstructured data with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) as an input to the document grouping process. Clustering accuracy degrades sharply when an unsuitable cluster count is chosen. Document features are automatically divided into two groups: discriminative words and non-discriminative words. Only the discriminative words are useful for grouping documents. The identification of discriminative words is improved by a labeled document analysis mechanism. Concept relationships are analyzed with ontology support. The system improves scalability by using labels and concept relations in the dimensionality reduction process.
C.Selvarathi, M.E, K.Karthika