New Speech-Enhancing Software Ensures Loudspeaker Announcements Are Always Understandable


When subway stations, train stations, and airports get crammed full of people, it can be nearly impossible to hear loudspeaker announcements over all the noise. So researchers at the Fraunhofer Institute developed a new system that listens for when a venue gets loud and automatically adjusts announcements so they’re always audible.


The new ADAPT DRC software, created by the Project Group Hearing, Speech, and Audio Technology at the Fraunhofer Institute for Digital Media Technology IDMT, uses microphones strategically placed around a venue to constantly monitor the ambient noise. When the din gets especially loud, the system doesn’t just boost the volume of loudspeaker announcements. While that can make announcements easier to hear over the noise, it doesn’t necessarily make them easier to understand.

Speakers tend to distort what they’re broadcasting when pushed too loud, so the ADAPT DRC software goes one step further, enhancing what’s being said by boosting the specific frequencies in speech that are easily muffled or misheard amid a lot of surrounding noise. For example, high-frequency consonant sounds like “P”, “T”, and “K” are often spoken quickly, but are key to understanding what’s being said.

The ADAPT DRC analyzes, detects, and boosts those frequencies specifically, so even with the roar of a subway train rolling into a station, it’s still easy to discern what a station attendant is barking over the loudspeaker. The new system hasn’t been implemented anywhere yet, but the Fraunhofer Institute says it’s ready to go, and only requires the installation of a few microphones in a venue for it to be effective.
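To make the idea concrete, here is a minimal Python sketch of the general approach described above: measure how loud the venue is, then lift the high-frequency part of the announcement in proportion. It is not Fraunhofer's algorithm, whose internals the institute has not detailed here. The function names, the dB thresholds, the 12 dB ceiling, and the simple one-pole filter are all assumptions chosen purely for illustration.

```python
import math


def rms_db(samples):
    """RMS level of a block of samples, in dB relative to full scale (1.0)."""
    if not samples:
        return -120.0
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so silence maps to -120 dB instead of log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def boost_for_noise(noise_db, quiet_db=-40.0, loud_db=-10.0, max_boost_db=12.0):
    """Map ambient noise level to a high-frequency boost in dB.

    Hypothetical mapping: no boost at or below quiet_db, full boost at or
    above loud_db, linear in between.
    """
    fraction = (noise_db - quiet_db) / (loud_db - quiet_db)
    return max_boost_db * min(1.0, max(0.0, fraction))


def emphasize_highs(speech, boost_db, smoothing=0.9):
    """Scale the fast-changing (high-frequency) part of speech by boost_db.

    A one-pole low-pass filter tracks the slow part of the signal; the
    remainder is treated as high-frequency content and amplified.
    """
    gain = 10.0 ** (boost_db / 20.0)
    low = 0.0
    out = []
    for x in speech:
        low = smoothing * low + (1.0 - smoothing) * x
        high = x - low
        out.append(low + gain * high)
    return out


def adapt_announcement(speech, mic_block):
    """Boost the announcement's highs according to the microphone's noise level."""
    return emphasize_highs(speech, boost_for_noise(rms_db(mic_block)))
```

With a silent microphone block the announcement passes through unchanged, while a loud block applies the full boost to rapidly changing sounds and leaves slowly varying ones nearly untouched. A real system would also need to limit the output so the boosted signal does not overdrive the speakers, which this sketch leaves out.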




I don’t understand why some public transit systems insist on using computer-generated voices, or unnecessarily fragmenting together announcements even from real voices.

For example, the light rail here in Phoenix uses a public address system at the platform that sounds like a woman’s voice (hard to tell if it’s actually computer generated, but she sounds a lot like Siri), but the announcements obnoxiously go:

“The NEXT —- EAST —— bound train —- will be —- ARRIVING —- in —- FIVE —- minutes.”

Thing is, there are only two directions on our single-line system, East and West; they only ever announce “arriving”; and they only ever use 5 min and 2 min warnings. So there are only four different possible announcements! Why can’t they get someone to actually record the full sentence, “The next westbound train will be arriving in two minutes” as a complete, real sentence?

For that matter, even in complex systems where there might be dozens or even hundreds of possible standard announcements, why can’t a professional voice actor get all those on tape? And sure, if something really is unique, use a computer generated voice. But when a standardized phrase gets repeated hundreds of times a day, it would be a lot more pleasant to listen to an actual recording of someone saying it.