Finally, the speech signals were represented as frames and used as the input for the CRNN architecture. At the feature extraction stage, it compares the Gammatone Cepstral Coefficient (GTCC) feature and Mel Frequency Cepstral Coefficient (MFCC) feature, as well as a combination of both. The proposed system exploits the advantages of both CNN and RNN architectures to construct the CRNN architecture. The proposed system uses a hybrid Convolutional Recurrent Neural Network (CRNN), which combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN) network, for spoken language identification on seven languages, including Arabic, chosen from subsets of the Mozilla Common Voice (MCV) corpus. In this article, we propose a spoken language identification system that depends on the sequence of feature vectors. Following recent advancements in deep learning and artificial intelligence, spoken language identification applications are playing an increasingly significant role in our day-to-day lives, especially in the domain of multi-lingual speech recognition.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |