A recent academic review of speech recognition failures contains a small but telling detail: when a man with dysarthria says “drink fresh water,” the software hears “available now.” This is not a typo, and it is not an ordinary mishearing. The machine invented something completely different, a phrase that fit the patterns it had learned and sounded plausible. Millions of people experience this peculiar form of failure every day.
It’s easy to forget how new and limited voice-activated technology is because it has become so integrated into daily life. Whisper-based transcription tools, Google Assistant, Alexa, and Siri were all trained to listen to a relatively narrow segment of humanity: young speakers of standardized English, primarily American or British, recorded speaking into phones or reading audiobooks. These systems were raised in that acoustic environment. Anyone whose voice exists outside of that world, whether a person with cerebral palsy, a person who stutters, someone recovering from a stroke, or an elderly person with a weakening voice, runs into a wall that the rest of us never see.
More than 3 million Americans stutter, and approximately 7.5 million struggle to communicate and be understood. These are not insignificant figures, but voice assistants continue to treat fluent, “normal” speech as the default and everything else as noise. Researchers examining this disparity report that error rates for non-normative speech can be five to ten times higher than for typical speakers. One popular model reportedly yielded an average word error rate of roughly 6.5 percent for standard speech but almost 62 percent for dysarthric speech. That gap is not a rounding error. Exclusion is a feature of the architecture.
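For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the transcript into what was actually said, divided by the number of words actually spoken. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in the research cited here; the function name `word_error_rate` and the simple lowercase-and-split tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is a plain lowercase split on
    whitespace, which is an assumption, not a standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# The example from the opening of this article: every spoken word is lost.
print(word_error_rate("drink fresh water", "available now"))  # 1.0
```

By this measure the “drink fresh water” transcription scores 100 percent: two words were replaced and one was dropped, out of three spoken. The reported averages of roughly 6.5 percent and almost 62 percent differ by a factor of about nine and a half.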
The way the failures manifest is what makes this especially unsettling. It’s not always a blank stare or a courteous “I didn’t catch that.” Occasionally, the system creates a completely different sentence and presents it with complete confidence. Researchers refer to this phenomenon as hallucination. A pause is filled with made-up words. A stutter is sometimes rendered as something more akin to mockery than misunderstanding. There is something almost eerie about a machine that refuses to acknowledge uncertainty when it ought to.

A few smaller businesses have made an effort to bridge the gap. Voiceitt, founded by a man whose grandmother suffered a stroke and lost most of her speech, allows users to create a personal voice dictionary that the app gradually learns to translate. By acting as a go-between for Alexa, Tecla enables individuals with speech impairments to tap pre-programmed commands rather than depending on the assistant to directly interpret their voice. These are practical, even transformative, solutions. However, they are workarounds applied to systems that were not designed with this population in mind: patches rather than fixes.
It’s difficult to ignore how often the pattern recurs. Large tech firms scale convenience for the typical user remarkably well, but they scale inclusion for everyone else remarkably slowly. Therapists surveyed in a UK study reported real benefits when voice tech truly helped their clients, such as increased independence, greater confidence, and decreased reliance on caregivers. However, the same study found that almost 80% of speech-language pathologists had never even attempted to use these tools in a professional setting, frequently because of financial constraints or a lack of training rather than the technology itself.
Reading the research gives the impression that the industry views this as a data issue rather than a design philosophy issue. Of course, more varied training data would be beneficial. The deeper problem, however, is that “normal speech” was never a neutral baseline; rather, it was a decision made by engineers who prioritized the simplest scenario. It’s unclear if that decision will be reexamined or if millions of people will continue to adjust to machines that won’t change.
