Journal of Applied Science and Engineering

Published by Tamkang University Press

1.30

Impact Factor

1.60

CiteScore

Hwei-Jen Lin This email address is being protected from spambots. You need JavaScript enabled to view it.1, Fu-Wen Yang1 and Yang-Ta Kao1

1Department of Computer Science and Information Engineering, Tamkang University Tamsui, Taiwan 251, R.O.C.


 

Received: March 4, 2005
Accepted: April 27, 2005
Publication Date: June 1, 2005

Download Citation: ||https://doi.org/10.6180/jase.2005.8.2.04  


ABSTRACT


In this paper, we propose a GA-based unsupervised clustering technique that selects cluster centers directly from the data set, allowing it to speed up the fitness evaluation by constructing a look-up table in advance, saving the distances between all pairs of data points, and by using binary representation rather than string representation to encode a variable number of cluster centers. More effective versions of operators for reproduction, crossover, and mutation are introduced. Finally, the Davies-Bouldin index is employed to measure the validity of clusters. The development of our algorithm has demonstrated an ability to properly cluster a variety of data sets. The experimental results show that the proposed algorithm provides a more stable clustering performance in terms of number of clusters and clustering results. This results in considerable less computational time required, when compared to other GA-based clustering algorithms.


Keywords: Unsupervised Clustering, Genetic Algorithms, Reproduction, Crossover, Mutation, Fitness, Cluster Validity


REFERENCES


  1. [1] Theodoridis, S. and Koutroumbas, K., Pattern Recognition, Academic Press, New York, NY, U.S.A. (1999).
  2. [2] Duda, R. O., Hart, P. E. and Stork, D. G., Pattern Classification, 2nd ed., John-Wiley, New York, NY, U.S.A. (2001).
  3. [3] Arabie, P., Hubert, L. J. and Soete, D. De, Clustering and Classification, World Scientific, Singapore (1996).
  4. [4] Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, New York, NY, U.S.A. (1989).
  5. [5] Michalewicz, Z., Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, New York, NY, U.S.A. (1992).
  6. [6] Mitchell, M., An Introduction to Genetic Algorithms, The MIT press, London, England (1996).
  7. [7] Raghavan, V. V. and Birchand, K., “A Clustering Strategy Based on a Formalism of the Reproductive Process in a Natural System,” in Proceedings of the 2nd International Conference on Information Storage and Retrieval, pp. 1022 (1979).
  8. [8] Jones, D. and Beltramo, M. A., “Solving Partitioning Problems with Genetic Algorithms,” in Proceedings of the 4th International Conference on Genetic Algorithms, pp. 442449 (1991).
  9. [9] Babu, G. P. and Murty, M. N., “Clustering with Evolution Strategies,” Pattern Recognition, Vol. 27, pp. 321 329 (1994).
  10. [10] Maulik, U. and Bandyopadhyay, S., “Genetic Algorithm-Based Clustering Technique,” Pattern Recognition, Vol. 33, pp. 1455465 (2000).
  11. [11] Bandyopadhyay, S. and Maulik, U., “Genetic Clustering for Automatic Evolution of Clusters and Application to Image Classification,” Pattern Recognition, Vol. 35, pp. 11971208 (2002).
  12. [12] Davis, D. L. and Bouldin, D. W., “A Cluster Separation Measure,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 1, pp. 224227 (1979).
  13. [13] Bezdek, J. C., “Some New Indexes of Cluster Validity,” IEEE Trans. Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 28, pp. 301315 (1998).
  14. [14] Bäck, T., Evolutionary Algorithms in Theory and Practice, Oxford Univ. Press, New York, NY, U.S.A. (1996).
  15. [15] Schaffer, J. D., Caruna, R. A., Eshelman, L. J. and Das, R., “A Study of Control Parameters Affecting Online Performance of Genetic Algorithms for Function Optimization,” in Proceedings of the 3rd International Conference on Genetic Algorithms, Schaffer, J. D., Ed. San Mateo, CA: Morgan Kaufman, pp. 859866 (1989).
  16. [16] Bandyopadhyay, S. and Maulik, U., “Nonparametric Genetic Clustering: Comparison of Validity Indices,” in IEEE Trans. Systems, Man, and Cybernetics - Part C: Application and reviews, Vol. 31, pp. 120125 (2001).
  17. [17] Su, M. C. and Chou, C. H., “A Modified Version of the K-Means Algorithm with a Distance Based on Cluster Symmetry,” in IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 23, pp. 674680 (2001).