Abstract

Learning representations from audio data has shown advantages over hand-crafted features such as Mel-Frequency Cepstral Coefficients (MFCCs) in many audio applications. In most representation-learning approaches, connectionist systems have been used to learn and extract latent features from fixed-length data. In this paper, we propose an approach that combines learned features and MFCC features for the speaker recognition task and can be applied to audio scripts of different lengths. In particular, we study the use of features from different levels of a Deep Belief Network (DBN) for quantizing audio data into vectors of audio-word counts. These vectors represent audio scripts of different lengths in a fixed-length form, which makes it easier to train a classifier. Our experiments show that the audio-word count vectors generated from a mixture of DBN features at different layers outperform the MFCC features. Further improvement is achieved by combining the audio-word count vector with the MFCC features.

Keywords: Deep Belief Networks, Deep Learning, Mel-Frequency Cepstral Coefficients

PDF Online: Final version available at link.springer.com

Cite as: Ali, H., Tran, S.N., Benetos, E. et al. Neural Comput & Applic (2016). doi:10.1007/s00521-016-2501-7

Disclaimer: Copyright with publisher. Use allowed for academic, non-commercial purposes only.
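The quantization step described in the abstract can be sketched as a bag-of-audio-words histogram: each frame-level feature vector (MFCCs or DBN activations) is assigned to its nearest entry in a learned codebook, and the per-word counts form a fixed-length vector regardless of the recording's length. The following is a minimal illustrative sketch, not the paper's implementation; the function name, the random "features", and the codebook (which in practice would be learned offline, e.g. with k-means) are all assumptions.

```python
import numpy as np

def audio_word_counts(frames, codebook):
    """Quantize per-frame features to their nearest codebook entry
    (audio word) and return a fixed-length count histogram.

    frames:   (n_frames, dim) array of frame-level features; n_frames
              may differ from one recording to another.
    codebook: (n_words, dim) array of centroids (learned offline).
    """
    # Squared Euclidean distance from every frame to every audio word.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)  # nearest audio word per frame
    # Count occurrences of each word; minlength keeps the vector size fixed.
    return np.bincount(words, minlength=len(codebook))

# Two recordings of different lengths map to same-size count vectors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 13))   # 8 audio words, 13-dim features
short = audio_word_counts(rng.normal(size=(50, 13)), codebook)
long_ = audio_word_counts(rng.normal(size=(300, 13)), codebook)
print(short.shape, long_.shape)  # both (8,)
```

Because the histogram length depends only on the codebook size, recordings of any duration yield vectors of the same dimensionality, which is what makes them straightforward to feed to a standard classifier.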
Hazrat Ali
Hazrat Ali is Assistant Professor at the Department of Electrical Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan.