Although Africa is home to a huge proportion of the world's languages (well over a quarter, according to some estimates), many are missing when it comes to the development of artificial intelligence (AI).
This is an issue of both a lack of investment and a lack of readily available data. Most AI tools in use today, such as ChatGPT, are trained on English, as well as other European languages and Chinese languages. These have vast quantities of online text to draw from.
But as many African languages are mostly spoken rather than written down, there is a lack of text to train AI on to make it useful for speakers of those languages. For millions across the continent this means being left out.
Researchers who have been trying to address this issue have recently released what is thought to be the largest known dataset of African languages. "We think in our own languages, dream in them and interpret the world through them. If technology doesn't reflect that, a whole group risks being left behind," says Prof Vukosi Marivate from the University of Pretoria.
The African Next Voices project was born out of the need to create AI-ready datasets in 18 African languages, initiating a movement towards inclusivity in AI. Although those 18 represent only a small fraction of Africa's languages, the initiative recorded 9,000 hours of speech from speakers across diverse demographics to make the data broadly representative.
Farmer Kelebogile Mosime uses the AI-Farmer app to get valuable assistance in her native Setswana. Her experience highlights the disparity in access to technology and underscores that language should not be a barrier to essential services.
Lelapa AI, a South African company, is also working on AI tools in African languages, emphasizing that language is not merely a business issue but one of access to culture, history, and imagination.
As initiatives like these expand, they pave the way towards a future where technology serves as a bridge to understanding, identity, and opportunity for all Africans.