The African Languages Lab: A Collaborative Approach to Advancing Low-Resource African NLP

¹UC Los Angeles, ²Georgia Institute of Technology, ³University of Wisconsin - Madison, ⁴University of Cape Coast, ⁵Carleton University, ⁶Stetson University, ⁷Northwestern University in Qatar, ⁸Cornell University, ⁹Soka University of America, ¹⁰Columbia University
The 64th Annual Meeting of the Association for Computational Linguistics

Abstract

Despite representing nearly one-third of the world's languages, African languages remain critically underserved by modern NLP technologies, with 88% classified as severely underrepresented or completely ignored in computational linguistics. We present the African Languages Lab (All Lab), a comprehensive research initiative that addresses this technological gap through systematic data collection, model development, and capacity building. Our contributions include: (1) a quality-controlled data collection pipeline, yielding the largest validated African multimodal speech and text dataset, spanning 40 languages with 19 billion tokens of monolingual text and 12,628 hours of aligned speech data; and (2) extensive experimental validation demonstrating significant performance gains (+23.69 ChrF++, +15.34 BLEU). Our work shows that systematic collaboration and local participation can bridge one of AI's most urgent linguistic gaps.

BibTeX

@misc{issaka2025africanlanguageslabcollaborative,
      title={The African Languages Lab: A Collaborative Approach to Advancing Low-Resource African NLP},
      author={Sheriff Issaka and Keyi Wang and Yinka Ajibola and Oluwatumininu Samuel-Ipaye and Zhaoyi Zhang and Nicte Aguillon Jimenez and Evans Kofi Agyei and Abraham Lin and Rohan Ramachandran and Sadick Abdul Mumin and Faith Nchifor and Mohammed Shuraim and Lieqi Liu and Erick Rosas Gonzalez and Sylvester Kpei and Jemimah Osei and Carlene Ajeneza and Persis Boateng and Prisca Adwoa Dufie Yeboah and Saadia Gabriel},
      year={2025},
      eprint={2510.05644},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.05644},
}