Journal of Applied Science and Engineering

Published by Tamkang University Press


Wan-Chen Chen1,2, Ching-Tang Hsieh2 and Chih-Hsu Hsu3

1Department of Electronic Engineering, St. John’s University, Taipei, Taiwan 251, R.O.C.
2Department of Electrical Engineering, Tamkang University, Tamsui, Taiwan 251, R.O.C.
3Department of Information Technology, Ching Kuo Institute of Management and Health, Keelung, Taiwan 203, R.O.C.


 

Received: July 8, 2007
Accepted: March 10, 2008
Publication Date: December 1, 2008

DOI: https://doi.org/10.6180/jase.2008.11.4.05


ABSTRACT


This paper presents an effective method for speaker identification systems. Based on the wavelet transform, the input speech signal is decomposed into several frequency bands, and the linear predictive cepstral coefficients (LPCC) of each band are then calculated. Furthermore, cepstral mean normalization is applied to all computed features so that their statistics remain similar across acoustic environments. To exploit these multi-band speech features effectively, we propose multi-band 2-stage vector quantization (VQ) as the recognition model: a separate 2-stage VQ classifier is applied independently to each band, and the errors of all 2-stage VQ classifiers are combined into a total error, from which a global recognition decision is made. Finally, the KING speech database is used to evaluate the proposed method for text-independent speaker identification. The experimental results show that the proposed method outperforms previously proposed recognition models in both clean and noisy environments.
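The recognition pipeline described in the abstract (per-band features, cepstral mean normalization, one 2-stage VQ classifier per band, and fusion of the per-band errors into a global decision) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses plain k-means codebook training as a stand-in for the usual LBG procedure, and synthetic feature vectors in place of wavelet-band LPCCs; all function names and parameter values are illustrative.

```python
import numpy as np

def cepstral_mean_normalization(feats):
    """Subtract the per-dimension mean of the utterance (CMN)."""
    return feats - feats.mean(axis=0, keepdims=True)

def train_codebook(feats, k, iters=20, seed=0):
    """Train a VQ codebook with plain k-means (a stand-in for LBG training)."""
    rng = np.random.default_rng(seed)
    codebook = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters unchanged
                codebook[j] = feats[labels == j].mean(axis=0)
    return codebook

def train_two_stage_vq(feats, k1=8, k2=8):
    """Stage 1 quantizes the features; stage 2 quantizes stage-1 residuals."""
    cb1 = train_codebook(feats, k1, seed=0)
    idx = np.linalg.norm(feats[:, None, :] - cb1[None, :, :], axis=2).argmin(axis=1)
    cb2 = train_codebook(feats - cb1[idx], k2, seed=1)
    return cb1, cb2

def two_stage_error(feats, model):
    """Accumulated 2-stage quantization error of an utterance under one band model."""
    cb1, cb2 = model
    idx = np.linalg.norm(feats[:, None, :] - cb1[None, :, :], axis=2).argmin(axis=1)
    residual = feats - cb1[idx]
    return np.linalg.norm(residual[:, None, :] - cb2[None, :, :], axis=2).min(axis=1).sum()

def identify(band_feats, speaker_models):
    """Sum the per-band 2-stage VQ errors and pick the closest speaker."""
    totals = {
        spk: sum(two_stage_error(cepstral_mean_normalization(f), m)
                 for f, m in zip(band_feats, models))
        for spk, models in speaker_models.items()
    }
    return min(totals, key=totals.get)

# Toy demo with synthetic "band features" (2 bands, 4-dimensional vectors);
# the two speakers differ in feature spread, a property that survives CMN.
rng = np.random.default_rng(42)
train = {"A": [rng.normal(0.0, 0.3, (200, 4)) for _ in range(2)],
         "B": [rng.normal(0.0, 2.0, (200, 4)) for _ in range(2)]}
models = {spk: [train_two_stage_vq(cepstral_mean_normalization(f)) for f in bands]
          for spk, bands in train.items()}
test_utterance = [rng.normal(0.0, 0.3, (50, 4)) for _ in range(2)]
print(identify(test_utterance, models))
```

Fusing per-band errors, rather than concatenating all bands into one feature vector, means that noise confined to one frequency band corrupts only that band's classifier score, which is the usual motivation for multi-band approaches.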


Keywords: Speaker Identification, Wavelet Transform, Linear Predictive Cepstral Coefficient (LPCC), 2-Stage Vector Quantization




    



 
