Sequence-based Multi-lingual Low Resource Speech Recognition.

Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We show that training on multiple languages is important for very low resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large well prepared datasets.
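
For readers unfamiliar with CTC, the following is a minimal sketch of the standard CTC collapsing rule, which maps a frame-level output path to a label sequence by merging consecutive repeats and then removing blanks. It is not taken from the paper: the function name `ctc_collapse`, the blank symbol `"_"`, and the phoneme labels in the example are all assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real systems usually reserve an index


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level CTC path to its label sequence.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: drop the blank symbol.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


# Illustrative frame-level path over made-up context-independent phoneme labels.
frames = ["_", "k", "k", "_", "ae", "ae", "ae", "_", "t", "t"]
print(ctc_collapse(frames))
```

Note that a blank between two identical labels keeps them distinct, so a path such as `["l", "_", "l"]` collapses to two labels rather than one.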
