Automatic Voice and Speech Recognition System for the German Language with Deep Learning Methods

Authors

  • Cigdem Bakir

DOI:

https://doi.org/10.18100/ijamec.280579

Keywords:

Boltzmann Machines

Abstract

Technological developments bring certain problems with them, and security is foremost among these. Biometric systems, such as those used for authentication, account for a significant share of security concerns. One reason is that sound recordings connected with various crimes must be analyzed for forensic purposes. Authentication systems require biometric data to be transmitted, designed and classified securely. In this study, German, a language widely used in the economy, industry and trade, was analyzed. The aim was to implement an automatic voice and speech recognition system using Mel Frequency Cepstral Coefficients (MFCC), Mel-Frequency Discrete Wavelet Coefficients (MFDWC) and Linear Prediction Cepstral Coefficients (LPCC), taking the sound forms and properties of German into account. Approximately 2658 German voice samples of words and clauses of differing lengths were collected from 50 male and 50 female speakers. Features of these voice samples were obtained using the wavelet transform. The resulting feature vectors were used to train Boltzmann Machines and Deep Belief Networks. In the test phase, the speaker of a given voice sample was identified on the basis of the trained voice samples. The classification results and performance of the algorithms employed in the study are also compared.
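To make the MFCC feature step concrete, here is a minimal pure-Python sketch of the commonly described MFCC pipeline: pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, log, and DCT. It is not the author's implementation. The function names and every parameter value (25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, pre-emphasis 0.97) are illustrative assumptions, not settings reported in the study, and a naive DFT stands in for an FFT to keep the sketch dependency-free.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    # HTK-style mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters evenly spaced on the mel scale; shape n_filters x (n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * n_bins
        for k in range(left, center):   # rising edge of the triangle
            row[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            row[k] = (right - k) / max(right - center, 1)
        bank.append(row)
    return bank


def dft_power(frame: List[float], n_fft: int) -> List[float]:
    """Power spectrum |X[k]|^2 / n_fft for k = 0..n_fft/2 via a naive O(n_fft^2) DFT."""
    padded = (list(frame) + [0.0] * n_fft)[:n_fft]  # zero-pad to n_fft samples
    power = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = -sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        power.append((re * re + im * im) / n_fft)
    return power


def dct_ii(values: List[float], n_out: int) -> List[float]:
    """Unnormalized DCT-II; keeps the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc(signal: List[float], sample_rate: int = 16000, frame_ms: float = 25.0,
         hop_ms: float = 10.0, n_fft: int = 512, n_filters: int = 26,
         n_ceps: int = 13, pre_emphasis: float = 0.97) -> List[List[float]]:
    """Return one n_ceps-dimensional MFCC vector per frame (hypothetical helper)."""
    if not signal:
        return []
    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1] boosts high frequencies
    emphasized = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1]
                                for i in range(1, len(signal))]
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len > n_fft:
        raise ValueError("frame length must not exceed n_fft")
    if len(emphasized) < frame_len:  # pad very short inputs to one full frame
        emphasized += [0.0] * (frame_len - len(emphasized))
    # 2. Hamming window applied to each frame to reduce spectral leakage
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(emphasized[start:start + frame_len], window)]
        power = dft_power(frame, n_fft)  # 3. power spectrum
        # 4. Log mel-band energies; the floor avoids log(0)
        energies = [math.log(max(sum(f * p for f, p in zip(row, power)), 1e-10)) for row in bank]
        features.append(dct_ii(energies, n_ceps))  # 5. DCT -> cepstral coefficients
    return features
```

Calling `mfcc(samples, sample_rate=16000)` yields one 13-value vector per 10 ms hop; per-frame vectors of this kind are the sort of input a classifier such as a Boltzmann Machine or Deep Belief Network would be trained on. MFDWC, as usually described, replaces the final DCT with a discrete wavelet transform of the same log mel-band energies.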




Published

01-12-2016


Section

Research Articles

How to Cite

[1]
“Automatic Voice and Speech Recognition System for the German Language with Deep Learning Methods”, J. Appl. Methods Electron. Comput., pp. 399–403, Dec. 2016, doi: 10.18100/ijamec.280579.
