The Missing Voices

Once a specialist assistive technology, text-to-speech (TTS) can now be found nearly everywhere. It comes as standard on smart speakers, phones, vehicles, and computers, and it continues to be one of the most important tools for people with print disabilities to access written content. Screen readers, reading software, and often accessible publications all rely on synthetic voices that convert text into speech. It is safe to say that many of us have become used to having this technology at hand.

Artificial intelligence (AI) has significantly improved the quality of these voices, giving them a more natural feel. For many widely spoken languages, AI-generated TTS can sound so natural that it is difficult to distinguish from recorded human speech. In many instances this makes for a more pleasurable experience: the voices are easier to listen to, and the audible artifacts that would otherwise be a distraction are gone. The ability to quickly generate high-quality audio promises to be a game changer for accessibility.

But this progress is not shared equally across languages, and a significant number of people around the world are being excluded from this technological revolution.

Languages outside of those most commonly spoken, and especially those used in smaller communities, remain severely underserved. Some of these languages have limited synthetic voice support, often with voices that sound robotic and are prone to inaccuracies. Many others have no TTS support at all.

When we talk about missing voices in technology, it can be easy to focus on the technical challenge. But for the people who rely on accessible reading, the impact is deeply personal. What does it mean to have to study in a language that is not your first, and maybe not even your second?

A Snapshot of the Problem

Some of the leading AI-driven TTS services support around 140 languages, which is impressive and a very welcome development. However, there are approximately 7,100 languages in use globally, so even these services cover only about 2% of them. Studies indicate that 44% of these languages are used by small populations, often fewer than 1,000 active users, and are classified as endangered as a result.

Even if we look only at the non-endangered languages, approximately 4,000 remain, and the vast majority of them have no text-to-speech support at all.

To illustrate the problem, we spoke to a few organizations around the world about the issues they face in their regions.

Africa

South Africa officially recognizes 12 languages, yet not all of them are supported equally in digital technology. Some languages have limited TTS options, while others have no support at all.

Libraries and accessible content providers do their best to support these languages, but in some cases the technology itself does not exist or does not work well enough to deliver anything meaningful.

Even when TTS voices are available, the quality can be disappointing. Voices may sound robotic or unnatural, and sometimes background noises can appear in the audio. Technical compatibility is also an issue. Some software struggles to process the linguistic structures of local languages, leading to errors or failures during production.

Another challenge is the lack of digital source material. When very little content exists in digital format, producing accessible books becomes even more difficult.

South America

A similar situation can be seen in Paraguay, where Guaraní is one of the country’s official languages, yet its presence in text-to-speech technology is minimal. A synthetic Guaraní voice does exist, but it remains very basic, sounds robotic and unnatural, and is not suitable for continuous reading. As a result, people with visual impairments often rely on Spanish voices, even when Guaraní is their primary language, which creates an additional barrier.

In education, this gap becomes even more visible: while Braille materials in Guaraní are available and used, audio resources are limited, often restricted to short excerpts rather than full books. Access to information frequently depends on teachers adapting content manually. Access to technology itself is still limited in many parts of the country, but the need is clear: without a natural and reliable synthetic voice in Guaraní, digital accessibility remains incomplete.

Europe

Across northern Europe, another language community faces similar challenges. The Sámi languages are spoken across parts of Finland, Sweden, Norway and Russia. Several Sámi languages exist, including Northern Sámi, Inari Sámi, and Skolt Sámi. Among them, Northern Sámi is the most widely spoken. Yet even for this language, accessible content remains extremely limited.

Today, about 60 audiobooks exist in Northern Sámi. Thirty-nine of these were produced between 2010 and 2025 by Celia Library. The remaining titles come from Norway and Sweden. Production is challenging and remains slow. In recent years, only about two new talking books have been produced annually.

One reason is the lack of narrators. Only two Northern Sámi narrators are currently available in production studios.

Technology does not yet solve the problem. There are currently no widely usable text-to-speech voices for Sámi languages in accessible book production. Without TTS, every book must be recorded manually.

Metadata systems also create barriers. Sámi languages are not always handled consistently in cataloguing systems, making it difficult for readers to search and discover titles in their own language.

Why Are Some Voices Missing?

The absence of TTS voices is often attributed to the number of speakers, but the reality is more complex. Several factors contribute to this gap:

Market economics influence which languages receive investment. Technology companies prioritize languages with large global markets.
Training data is another barrier. AI voices require large amounts of digitized language data. Many local languages simply do not have enough digital material available.
AI training bias also plays a role. Many models are trained primarily on dominant languages.

When a language does not have a digital voice, the consequences go beyond technology. Students study in another language. Communities lose the ability to access information independently. Technology, which should remove barriers, sometimes reinforces them.

To close the digital voice gap, we need coordinated action, funding, and collaboration with linguistic communities as well as disability organizations.

Bridging the Gap

Text-to-speech has transformed access to information for many people. However, when languages remain underrepresented, this access is not equally shared. The examples from South Africa, Finland, and Paraguay highlight how missing voices limit independent access to information and affect whole communities.

When a language cannot be heard through technology, its speakers face a barrier to the digital space. Ensuring that every language has a voice is not only a technical goal, but a step towards more inclusive and equitable access.

For many languages, technology is the barrier to access, but it also holds the potential to provide equal access to information. More work is currently taking place on synthetic speech development, and specifically on AI-supported speech, than ever before. The technology is advancing rapidly, and lessons from commonly used languages have the potential to improve support for lesser-used and historically underserved languages.

At DAISY we’re working with technology companies to help highlight and begin to address this issue. We also work closely with our partner organizations, including WBU and ICEVI, who recently issued a joint agreement stating that the lack of access to high-quality synthetic speech is a human rights issue. This point continues to be raised with technology companies.

Many thanks to the DAISY Members and Friends who contributed to this article.
