ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

Authors

Tolulope Ogunremi, Kola Tubosun, Anuoluwapo Aremu, Iroro Orife, David Ifeoluwa Adelani

We introduce the ÌròyìnSpeech corpus, a new dataset motivated by the need for more high-quality, freely available, contemporary Yorùbá speech. The dataset is multi-purpose and can be used for both text-to-speech (TTS) and automatic speech recognition (ASR) tasks. We curated text sentences from the news and creative writing domains under an open license (CC-BY-4.0) and had multiple speakers record each sentence. We provide 5,000 of our utterances to the Common Voice platform to crowdsource transcriptions online. In total, the dataset contains 38.5 hours of data, recorded by 80 volunteers.
