Events Detection for Audio Based Surveillance by Variable-Sized Decision Windows Using Fuzzy Logic Control

Ing-Jr Ding

doi:10.6180/jase.2009.12.3.09

Electrical Engineering

Ing-Jr Ding¹

¹Department of Electrical Engineering, National Formosa University, Yunlin County, Taiwan 632, R.O.C.

Received: July 20, 2007
Accepted: March 6, 2009
Publication Date: September 1, 2009

ABSTRACT

In contrast to the fixed-length decision window that many audio event detection applications use to analyze the stream of audio frames, this paper proposes a variable-sized decision window. A fuzzy logic controller (FLC) governs the window size: it estimates the difference between the likelihood of a targeted audio event and that of the normal acoustic background, and adjusts the window size accordingly. The FLC is designed to stretch the window while the monitored environment remains “aurally hot”, collecting more audio frames to ensure reliable and correct detection, and to shrink it when the context becomes “aurally calm”. Such situation-dependent behavior is essential in applications where reliable, real-time response is the major concern and a fixed-length decision window may not suffice.
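
The window-control idea can be illustrated with a minimal sketch. It is not the controller from the paper: the abstract gives no rule base, so the membership breakpoints (±2), the step sizes (±4 frames), the window bounds (10 to 100 frames), and the function names `window_step` and `update_window` are all hypothetical. The input `diff` stands for the likelihood difference between the targeted event model and the background model, assumed here to be a log-likelihood difference.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def window_step(diff):
    """Map a likelihood difference to a window-size change in frames.

    Zero-order Sugeno-style inference with three assumed rules:
    calm -> shrink, neutral -> hold, hot -> stretch.
    """
    # Shoulder memberships (assumed): fully "calm" at diff <= -2,
    # fully "hot" at diff >= 2.
    calm = min(1.0, max(0.0, -diff / 2.0))
    hot = min(1.0, max(0.0, diff / 2.0))
    neutral = tri(diff, -2.0, 0.0, 2.0)

    weights = (calm, neutral, hot)
    steps = (-4.0, 0.0, 4.0)  # assumed rule consequents, in frames

    # Weighted average of the rule consequents. The total is never zero
    # because the three memberships cover the whole input range.
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, steps)) / total


def update_window(size, diff, lo=10, hi=100):
    """Apply the fuzzy step and clamp the window to [lo, hi] frames."""
    new_size = round(size + window_step(diff))
    return max(lo, min(hi, new_size))
```

Under these assumptions, a strongly positive difference stretches the window by the full step, a strongly negative one shrinks it, and values in between blend the two smoothly, which is the situation-dependent behavior the abstract describes.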

Keywords: Audio Event Detection, Decision Window, Fuzzy Logic Controller, Gaussian Mixture Model, Feature Extraction
