Privacy-Preserving Classification of Data Streams

Ching-Ming Chao; Po-Zung Chen; Chu-Hao Sun

doi:10.6180/jase.2009.12.3.11

Computer Science and Information Engineering

Ching-Ming Chao¹, Po-Zung Chen² and Chu-Hao Sun²

¹Department of Computer Science and Information Management, Soochow University, Taipei, Taiwan 100, R.O.C.
²Department of Computer Science and Information Engineering, Tamkang University, Tamsui, Taiwan 251, R.O.C.

Received: January 3, 2008
Accepted: April 1, 2009
Publication Date: September 1, 2009

ABSTRACT

Data mining extracts valuable knowledge from large amounts of data. With the emergence of data streams as a new type of data, data stream mining has recently become an important and popular research topic, and many studies have proposed efficient mining algorithms for data streams. At the same time, data mining can pose a serious threat to data privacy, which has motivated research on privacy-preserving data mining. In this paper, we propose a method for privacy-preserving classification of data streams, called the PCDS method, which extends the data stream classification process to achieve privacy preservation. The PCDS method consists of two stages: data stream preprocessing and data stream mining. The preprocessing stage uses the data splitting and perturbation algorithm to perturb confidential data. Users can flexibly choose which data attributes to perturb according to their security needs, which reduces the threats and risks of releasing data. The mining stage uses the weighted average sliding window algorithm to mine the perturbed data streams. When the classification error rate exceeds a predetermined threshold, the classification model is reconstructed to maintain classification accuracy. Experimental results show that the PCDS method both preserves data privacy and mines data streams accurately.

Keywords: Data Streams, Data Mining, Classification, Privacy Preservation, Incremental Mining
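
The two stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the PCDS algorithm itself, since the abstract does not specify the details of the data splitting and perturbation algorithm or the weighted average sliding window algorithm. The sketch assumes additive Gaussian noise as the perturbation, a linearly weighted average of recent 0/1 prediction errors as the monitored error rate, and hypothetical names (`perturb`, `WeightedErrorWindow`, `scale`, `size`, `threshold`) chosen only for illustration.

```python
import random
from collections import deque


def perturb(record, confidential, rng, scale=1.0):
    """Return a copy of `record` with noise added to the chosen attributes.

    Assumption: additive Gaussian noise stands in for the paper's
    perturbation step, whose exact form the abstract does not give.
    """
    return {
        key: (value + rng.gauss(0.0, scale)) if key in confidential else value
        for key, value in record.items()
    }


class WeightedErrorWindow:
    """Sliding window over 0/1 prediction errors; newer entries weigh more.

    Assumption: linear weights (oldest = 1, newest = n) are used here
    purely as an example of a weighted average.
    """

    def __init__(self, size, threshold):
        self.errors = deque(maxlen=size)  # old entries drop out automatically
        self.threshold = threshold

    def add(self, is_error):
        self.errors.append(1 if is_error else 0)

    def error_rate(self):
        if not self.errors:
            return 0.0
        weights = range(1, len(self.errors) + 1)
        return sum(w * e for w, e in zip(weights, self.errors)) / sum(weights)

    def needs_rebuild(self):
        # Signal that the classification model should be reconstructed.
        return self.error_rate() > self.threshold
```

In such a setup, a stream consumer would call `perturb` on each incoming record with the user-selected set of confidential attributes, classify the perturbed record, feed the outcome to `add`, and rebuild the classifier whenever `needs_rebuild()` returns true.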
