Alp Öktem, Milena Haykowska

CLEAR Global

Topic: Enabling Speech Technology for All with TWB Voice

Nearly half of the world’s people, about 3.7 billion, do not speak a major language, which leaves most speech technologies out of their reach. Building such tools requires annotated voice data, which is scarce for the majority of languages. To address this, CLEAR Global launched TWB Voice in December 2024. It is a community-based platform connected to Translators without Borders’ network of more than 100,000 linguists, and it lets contributors record, validate, and transcribe speech to support both speech recognition and text-to-speech systems. During its pilot in northeast Nigeria, the project collected the first open humanitarian-domain voice datasets in Hausa, Kanuri, and Shuwa Arabic, totaling more than 135 hours of recordings, and achieved strong accuracy with fine-tuned Whisper and Wav2Vec models. The pilot showed the need for ongoing engagement, local incentives such as data bundles, and attention to gender gaps in digital access. The platform now allows new projects to be set up rapidly and is expanding to 10 more under-resourced languages, with the aim of empowering communities and partners to build voice datasets that make speech technology more inclusive worldwide.