Why AI Needs to Be Able to Understand All the World's Languages

The benefits of mobile technology are not accessible to most of the world’s 700 million illiterate people

When we asked Aissatou, our new friend from a rural village in Guinea, West Africa, to add our phone numbers to her phone so we could stay in touch, she replied in Susu: “M’mou noma. M’mou kharankhi.” (“I can’t, because I did not go to school.”) Lacking a formal education, Aissatou does not read or write in French. But we believe Aissatou’s lack of schooling should not keep her from accessing basic services on her phone. The problem, as we see it, is that Aissatou’s phone does not understand her local language.

Computer systems should adapt to the ways people—all people—use language. West Africans have spoken their languages for thousands of years, creating rich oral traditions that keep ancestral stories and historical perspectives alive and pass down knowledge and morals. Computers could easily support this oral tradition. While computers are typically designed for use with written languages, speech-based technology does exist. Speech technology, however, does not “speak” any of the 2,000 languages and dialects spoken by Africans. Apple’s Siri, Google Assistant, and Amazon’s Alexa collectively support zero African languages.

In fact, the benefits of mobile technology are not accessible to most of the world’s 700 million illiterate people, who, beyond simple use cases such as answering a phone call, cannot use functions as basic as contact management or text messaging. Because illiteracy tends to correlate with lack of schooling, and thus with the inability to speak a common world language, speech technology is not available to those who need it most. For them, speech recognition technology could help bridge the gap between illiteracy and access to valuable information and services, from agricultural information to medical care.

Why aren’t speech technology products available in African and other local languages? Languages spoken by smaller populations are often casualties of commercial prioritization. Furthermore, the groups with power over technological goods and services tend to speak the same few languages, making it easy to overlook people from different backgrounds. Speakers of languages such as those widely spoken in West Africa are grossly underrepresented in the research labs, companies, and universities that have historically developed speech recognition technologies. It is well known that digital technologies can have different consequences for people of different races. Technological systems can fail to provide the same quality of service for diverse users, treating some groups as if they do not exist.

Commercial prioritization, power, and underrepresentation all exacerbate another critical challenge: lack of data. The development of speech recognition technology requires large annotated data sets. Languages spoken by illiterate people who would most benefit from voice recognition technology [ … ]
