Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.
The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to deal with multiple accents involves pooling data from several accents during training and building a single model in multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternate model where we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline. We obtain a 5.94% relative improvement in word error rate on British English, and a 9.47% relative improvement on American English. This illustrates that jointly modeling with accent information improves acoustic model performance.
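The abstract describes jointly learning an accent classifier alongside a multi-task acoustic model. A minimal sketch of that idea is a shared encoder with two output heads, trained with an interpolated loss. The dimensions, the single-layer encoder, and the interpolation weight `lam` below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true labels
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# Hypothetical dimensions: 40-dim acoustic features, 10 phone classes, 2 accents
feat_dim, hidden, n_phones, n_accents = 40, 32, 10, 2
W_shared = rng.normal(scale=0.1, size=(feat_dim, hidden))
W_acoustic = rng.normal(scale=0.1, size=(hidden, n_phones))
W_accent = rng.normal(scale=0.1, size=(hidden, n_accents))

# Random stand-in batch of frames with phone and accent labels
x = rng.normal(size=(8, feat_dim))
phone_labels = rng.integers(0, n_phones, size=8)
accent_labels = rng.integers(0, n_accents, size=8)

# Shared encoder feeds both heads, so accent supervision shapes the
# representation used by the acoustic model
h = np.tanh(x @ W_shared)
acoustic_loss = cross_entropy(softmax(h @ W_acoustic), phone_labels)
accent_loss = cross_entropy(softmax(h @ W_accent), accent_labels)

lam = 0.1  # assumed interpolation weight for the accent objective
joint_loss = acoustic_loss + lam * accent_loss
print(float(joint_loss))
```

Training would minimize `joint_loss` with respect to all three weight matrices; the key design choice is that gradients from the accent head flow back into the shared encoder.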