A New Hybrid Model of K-Means and Naïve Bayes Algorithms for Feature Selection in Text Documents Categorization

Allahverdipour, Ali; Soleimanian Gharehchopogh, Farhad

Article number: JACR-1609-1474 (R5); Pages: 73-86

Manuscript type: Research article

Ali Allahverdipour ¹ , Farhad Soleimanian Gharehchopogh ²

1 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran
2 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran

Received: Sunday, 17 Dhu al-Hijjah 1437 AH; Accepted: Friday, 25 Safar 1438 AH; Published: Wednesday, 12 Safar 1439 AH

Keywords: Machine Learning, Feature Selection, Text Categorization, K-Means Algorithm, Naïve Bayes Algorithm

Abstract:

With the rapid growth of information and documents on the Web, the need to classify them into different categories and clusters has become evident. Clustering tries to find related structures in datasets that have not yet been categorized. To address this need, this paper presents a new approach to text document categorization that consists of three phases: document pre-processing and feature selection, K-Means clustering, and Naïve Bayes (NB) optimization. The proposed model combines the two algorithms: K-Means finds the minimum distances between features and the cluster centers, while NB computes the probability of each feature in the documents, and each result is used separately to cluster the features. The proposed model improves the performance of K-Means by exploiting NB properties during clustering. The model therefore addresses both the challenge of labeling different documents and the inherent limitation of K-Means, which categorizes text documents as an unsupervised model. Finally, the experimental results of the proposed algorithm and of K-Means are evaluated with standard evaluation measures and compared on validated datasets.
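The abstract describes the three phases only at a high level. The following Python sketch illustrates one plausible reading of such a K-Means plus NB pipeline; it is not the authors' published method. The stop-word list, the document-frequency feature selection, seeding K-Means with the first k documents, and the choice to train a multinomial NB model on the K-Means assignments (used as pseudo-labels) and then reassign each document by maximum posterior are all assumptions. The sketch also clusters document vectors, whereas the paper's model may operate on individual features. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (assumption; the paper's pre-processing is not specified here).
STOP_WORDS = {"the", "a", "and", "of", "in", "is", "on", "to"}


def preprocess(doc):
    """Phase 1a: lowercase, split into alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOP_WORDS]


def select_features(token_lists, top_k):
    """Phase 1b: keep the top_k terms by document frequency (assumed criterion), ties broken alphabetically."""
    df = Counter(term for tokens in token_lists for term in set(tokens))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def vectorize(tokens, vocab):
    """Term-count vector over the selected vocabulary."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Phase 2: plain K-Means; the first k vectors seed the centroids so runs are deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def nb_refine(vectors, labels, k, alpha=1.0):
    """Phase 3 (assumed form): multinomial NB with Laplace smoothing, trained on the
    K-Means labels, then used to reassign every document to its most probable cluster."""
    n_feat = len(vectors[0])
    log_prior, log_lik = [], []
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        log_prior.append(math.log((len(members) + alpha) / (len(vectors) + alpha * k)))
        totals = [sum(col) for col in zip(*members)] if members else [0.0] * n_feat
        denom = sum(totals) + alpha * n_feat
        log_lik.append([math.log((t + alpha) / denom) for t in totals])
    return [
        max(range(k), key=lambda c: log_prior[c] + sum(x * lp for x, lp in zip(v, log_lik[c])))
        for v in vectors
    ]


if __name__ == "__main__":
    docs = [
        "football match goal team",
        "election vote party government",
        "team wins the football league",
        "government passes the vote",
    ]
    tokens = [preprocess(d) for d in docs]
    vocab = select_features(tokens, top_k=6)
    vecs = [vectorize(t, vocab) for t in tokens]
    km_labels = kmeans(vecs, k=2)
    final = nb_refine(vecs, km_labels, k=2)
    print(km_labels, final)  # → [0, 1, 0, 1] [0, 1, 0, 1]
```

On this toy input the NB pass agrees with K-Means; the refinement step only changes assignments when a document's term distribution fits another cluster's smoothed word probabilities better than its distance-based assignment suggests.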
