An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering
Subject areas: Data Mining
Farhad Soleimanian Gharehchopogh 1 , Sevda Haggi 2
1 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, IRAN
2 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, IRAN
Keywords: Clustering, Crime Clustering, K-modes, Elephant Herding Optimization Algorithm
Abstract:
In past decades, detecting and preventing crime required years of research and analysis. Today, smart systems based on data mining techniques make it possible to detect and prevent crime in considerably less time. Classification- and clustering-based techniques can classify and cluster crime-related samples. The most important factors in clustering are locating the center of each cluster and measuring the distance between each sample and its cluster center. The weakness of clustering techniques such as k-modes is that they fail to locate cluster centers precisely. In this paper, therefore, the Elephant Herding Optimization (EHO) algorithm is combined with k-modes to cluster crimes by detecting their similarity to one another. The proposed model consists of two basic steps. First, the EHO algorithm is used to find the cluster centers for optimized clustering. Second, k-modes is used to group crimes that are similar according to a distance-based criterion. The proposed model was evaluated on the Community and Crime dataset, which consists of 1994 samples with 128 attributes. The results show that the proposed model reaches a purity of 91.45% over 400 replicates.
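The second step and the purity measure can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common simple-matching dissimilarity for categorical attributes (the count of attributes on which two samples differ), and it takes the initial cluster centers as an argument, where the paper would supply centers found by EHO. The function names and the toy crime records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(samples, initial_modes, max_iter=100):
    """Plain k-modes: assign each sample to its nearest mode, then recompute modes.

    initial_modes is assumed to come from elsewhere (in the paper, from EHO).
    """
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_distance(s, modes[j]))
            for s in samples
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


def purity(labels, classes):
    """Fraction of samples that belong to the majority class of their cluster."""
    correct = 0
    for j in set(labels):
        members = [c for lab, c in zip(labels, classes) if lab == j]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical toy records: (offence, time of day, location type).
samples = [
    ("theft", "night", "urban"),
    ("theft", "night", "rural"),
    ("theft", "day", "urban"),
    ("fraud", "day", "online"),
    ("fraud", "night", "online"),
    ("fraud", "day", "online"),
]
classes = ["property", "property", "property",
           "financial", "financial", "financial"]

# Hand-picked starting centers, standing in for centers an optimizer would return.
labels, modes = k_modes(samples, initial_modes=[samples[0], samples[3]])
print(labels, modes, purity(labels, classes))
```

Because k-modes only refines the centers it is given, the quality of the final clusters depends heavily on the starting modes, which is the motivation the abstract gives for choosing them with an optimizer rather than at random.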