Journal of Applied Science and Engineering

Published by Tamkang University Press

Impact Factor: 1.30 | CiteScore: 2.10

Fancai Kong

School of Business Administration, Zhengzhou University of Science and Technology, Zhengzhou 450000, China


Received: February 16, 2024
Accepted: March 15, 2024
Publication Date: May 4, 2024

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202503_28(3).0005


Big data clustering plays an important role in business administration data processing, but existing big data clustering methods suffer from poor clustering quality and low Jaccard coefficients. This paper therefore proposes a business administration big data clustering optimization method based on active density peaks and the salp swarm algorithm. The method combines principal component analysis (PCA) with information-entropy-based dimensionality reduction to preprocess the data, which reduces clustering time, and clusters the reduced data with an intuitionistic fuzzy kernel clustering algorithm. A Logistic-Tent chaotic sequence makes the initial population of the salp swarm algorithm (SSA) more uniformly distributed. Attenuation factors that restrain excessive searching by the leader balance the algorithm's global exploration and local exploitation, and an adaptive inertia weight in the follower update keeps the algorithm from converging to local optima. A fast update strategy is designed to update labels efficiently. An active clustering ensemble framework that combines local and global uncertainty is proposed, and a weighted-voting consensus method is introduced to improve the integration of clustering results. Experimental results on data sets show that the proposed method overcomes the shortcomings of traditional methods and achieves good clustering performance.


Keywords: Big data clustering, active density peak, salp swarm algorithm, fast update strategy, adaptive inertia weight
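The abstract does not say exactly how information entropy is used to reduce dimensionality. A minimal sketch, assuming the classic entropy weight method: each feature is min-max normalised and turned into a distribution, its Shannon entropy is computed, and features whose values are less evenly spread (lower entropy) receive larger weights; the top-k features are then kept. The helper names `entropy_weights` and `select_features` are hypothetical, and the PCA projection that the method combines with this step is not shown.

```python
import math


def entropy_weights(data):
    """Entropy weight of each feature (column) in `data`, a list of numeric rows.

    Assumption: the classic entropy weight method, used as a stand-in for
    the paper's information-entropy reduction step.
    """
    n, m = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two samples")
    divergence = []
    for j in range(m):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant feature carries no information.
            divergence.append(0.0)
            continue
        # Min-max normalise, then turn the column into a probability distribution.
        norm = [(v - lo) / (hi - lo) for v in col]
        total = sum(norm)
        probs = [v / total for v in norm]
        # Shannon entropy scaled to [0, 1] by ln(n); 0 * ln(0) counts as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergence.append(1.0 - entropy)
    s = sum(divergence)
    return [d / s if s > 0 else 0.0 for d in divergence]


def select_features(data, k):
    """Keep the k features with the largest entropy weight, in original column order."""
    w = entropy_weights(data)
    keep = sorted(sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:k])
    return keep, [[row[j] for j in keep] for row in data]
```

For example, a constant column always gets weight 0 and is the first to be dropped.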
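The chaotic initialisation of the SSA population can be sketched with the Logistic-Tent map. The map below is one commonly cited form of the Logistic-Tent system, with control parameter r in (0, 4]; the paper's exact variant, the value of r, the seed x0 and the helper names are all assumptions. Each new chaotic value in [0, 1) is scaled into the search range of one dimension.

```python
def logistic_tent(x, r=3.9):
    """One iteration of a common form of the Logistic-Tent map (r in (0, 4])."""
    if x < 0.5:
        return (r * x * (1 - x) + (4 - r) * x / 2) % 1.0
    return (r * x * (1 - x) + (4 - r) * (1 - x) / 2) % 1.0


def chaotic_population(n_salps, lb, ub, x0=0.37, r=3.9):
    """Map one chaotic sequence onto n_salps positions inside the box [lb, ub].

    x0 and r are illustrative defaults, not values taken from the paper.
    """
    x = x0
    population = []
    for _ in range(n_salps):
        salp = []
        for low, high in zip(lb, ub):
            x = logistic_tent(x, r)              # next chaotic value in [0, 1)
            salp.append(low + x * (high - low))  # scale into this dimension's range
        population.append(salp)
    return population
```

Compared with independent uniform draws, a chaotic sequence is deterministic given x0, which makes runs reproducible without a random seed.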
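The leader and follower updates can be sketched on top of the standard SSA, in which a single leader moves randomly around the best position found so far (the food source) with coefficient c1 = 2 exp(-(4t/T)^2), and each follower moves toward the salp ahead of it. The abstract does not define the attenuation factor or the adaptive inertia weight, so the linear forms used here (`atten = 1 - t/T`, and `w` falling from `w_max` to `w_min`) are hypothetical placeholders rather than the paper's formulas. The initial population is drawn uniformly to keep the example self-contained.

```python
import math
import random


def ssa_minimize(f, lb, ub, n_salps=30, iters=200, w_max=0.9, w_min=0.4, seed=0):
    """Minimise f over the box [lb, ub] with a modified salp swarm algorithm.

    Leader/follower structure and the c1 schedule follow the standard SSA.
    The attenuation factor and inertia-weight formulas are hypothetical.
    """
    rng = random.Random(seed)
    dim = len(lb)

    def clamp(x):
        return [min(max(v, lb[j]), ub[j]) for j, v in enumerate(x)]

    pop = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(n_salps)]
    food = list(min(pop, key=f))  # best position found so far
    food_fit = f(food)

    for t in range(1, iters + 1):
        c1 = 2 * math.exp(-((4 * t / iters) ** 2))  # standard SSA coefficient
        atten = 1 - t / iters                        # hypothetical attenuation factor
        w = w_max - (w_max - w_min) * t / iters      # hypothetical adaptive inertia weight
        for i in range(n_salps):
            if i == 0:
                # Leader: random step around the food source, damped by atten.
                new = []
                for j in range(dim):
                    step = c1 * atten * ((ub[j] - lb[j]) * rng.random() + lb[j])
                    new.append(food[j] + step if rng.random() >= 0.5 else food[j] - step)
            else:
                # Follower: inertia-weighted blend of its own position and its predecessor.
                new = [w * pop[i][j] + (1 - w) * pop[i - 1][j] for j in range(dim)]
            pop[i] = clamp(new)
        for salp in pop:
            fit = f(salp)
            if fit < food_fit:
                food, food_fit = list(salp), fit
    return food, food_fit
```

With w = 0.5 the follower rule reduces to the standard SSA average of a salp and its predecessor; a larger early w keeps followers spread out, and a smaller late w pulls the chain toward the leader.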
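The density-peak step that an active density peak approach builds on can be sketched as plain density-peak clustering in the style of Rodriguez and Laio: each point gets a local density rho and a distance delta to the nearest denser point, the points with the largest rho * delta become centres, and every other point inherits the label of its nearest denser neighbour. The active part of the method (choosing which points to query for labels) is not shown, and the Gaussian kernel, the cutoff dc and the function name are assumptions.

```python
import math


def density_peaks(points, dc, n_clusters):
    """Plain density-peak clustering; the active-query part is not implemented.

    rho: Gaussian-kernel local density; delta: distance to the nearest point
    of higher density. Centres are the points with the largest rho * delta.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    parent = [-1] * n  # nearest higher-density neighbour
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: gamma[i], reverse=True)
    # The densest point has no parent, so it is always made a centre.
    centres = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Non-centres inherit their parent's label, visited in decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centres
```

Because points are labelled in decreasing density order, a point's parent always has its label before the point itself is visited.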
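A weighted-voting consensus over several base clusterings can be sketched as follows. Cluster labels are arbitrary, so each base result is first relabelled to agree as much as possible with a reference clustering (a brute-force search over label permutations, practical only for small k); each sample then takes the label with the largest total weight. The abstract does not state how the paper derives its weights, so here they are simply passed in by the caller, and the function names are hypothetical.

```python
from itertools import permutations


def align_labels(ref, labels, k):
    """Relabel `labels` with the permutation of 0..k-1 that best agrees with `ref`."""
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, lab in zip(ref, labels) if perm[lab] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[lab] for lab in labels]


def weighted_vote(clusterings, weights, k):
    """Weighted-majority consensus of base clusterings with labels in 0..k-1.

    `weights` are supplied by the caller (assumption: e.g. per-clustering quality scores).
    """
    ref = clusterings[0]  # the first clustering fixes the label names
    aligned = [align_labels(ref, labels, k) for labels in clusterings]
    consensus = []
    for i in range(len(ref)):
        score = [0.0] * k
        for labels, w in zip(aligned, weights):
            score[labels[i]] += w
        consensus.append(max(range(k), key=lambda c: score[c]))
    return consensus
```

With base clusterings [[0, 0, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]], weights [0.6, 0.2, 0.2] give [0, 0, 1, 1], while equal weights give [0, 1, 1, 1], so the weighting decides disputed samples.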
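The Jaccard coefficient used to judge clustering quality is usually computed by pair counting. A minimal sketch, assuming that definition: over all pairs of samples, let a be the pairs grouped together in both the predicted and the true partition, b the pairs grouped together only in the prediction, and c the pairs grouped together only in the ground truth; then J = a / (a + b + c). The function name is illustrative.

```python
from itertools import combinations


def jaccard_coefficient(pred, truth):
    """Pair-counting Jaccard coefficient between two labelings (assumed definition)."""
    a = b = c = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            a += 1  # together in both partitions
        elif same_pred:
            b += 1  # together only in the prediction
        elif same_true:
            c += 1  # together only in the ground truth
    return a / (a + b + c) if (a + b + c) else 1.0
```

For example, jaccard_coefficient([0, 0, 1, 1], [0, 0, 0, 1]) gives 0.25 (a = 1, b = 1, c = 2). Because only co-membership is compared, renaming clusters (swapping 0 and 1, say) does not change the score.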

