As smart speakers have become a ubiquitous part of our day-to-day lives, so too has it become clear these devices may not be as private as they appear at first blush—that they may be listening in even when they shouldn’t be, that recordings may contain sensitive information that we didn’t realize was being uploaded to the cloud, and that those recordings may be accessible by human workers.
In new study findings published this month as part of ongoing research into smart speakers, researchers with the Mon(IoT)r Research Group at Northeastern University and Imperial College London examined the conditions under which a smart speaker might be eavesdropping after being accidentally activated by a source other than users themselves, specifically by popular dialogue-heavy TV series streamable on Netflix, including Gilmore Girls, Grey’s Anatomy, The L Word, The Office, Greenleaf, Dear White People, Riverdale, Jane the Virgin, Friday Night Tykes, Big Bang Theory, The West Wing, and Narcos.
In their experiments, the researchers monitored activation cues such as on-device lights, self-reported cloud recordings, and network traffic to observe the circumstances under which non-wake words resulted in an activation for a Google Home Mini (1st generation), an Apple HomePod (1st generation), Amazon Echo Dots (2nd and 3rd generation), and a Harman Kardon Invoke by Microsoft. (The Echo Dots were tested with the wake words “Alexa,” “Amazon,” “Echo,” and “computer,” while the other devices were tested with their standard wake words, including “hey,” “hi,” and “OK” followed by “Google” for the Home Mini.)
Excluding any instances in which a device was activated by its actual wake word, and using 125 hours of audio from a dozen Netflix series, the researchers found that the smart speakers were activated between 1.5 and 19 times per day. Apple’s HomePod and the Cortana-powered Invoke were the most likely to experience unintended activations, though the researchers noted that the non-wake words causing them were inconsistent. After the HomePod and the Invoke, accidental activations occurred most frequently on the Echo Dot (2nd generation), then the Home Mini, and then the Echo Dot (3rd generation).
As you might guess, the researchers found that especially dialogue-heavy shows caused more activations, with Gilmore Girls and The Office summoning the devices most. And while almost every series activated multiple devices, each device had a different show for which it was most frequently cued.
The Echo Dot (2nd generation) and Invoke smart speakers were the two devices found to record for the longest—between 20 and 43 seconds, according to the researchers. For activations lasting at least five seconds, the researchers identified patterns for each device. Cortana, for example, was most frequently summoned by words starting with “co,” such as “consider” and “coming up.” The HomePod was activated by word combinations that sounded like “Hey Siri,” including “He clearly,” “I’m sorry,” “historians,” and “okay, yeah.”
The Echo Dots were summoned by words that sounded similar to “Alexa” and contained “k” sounds, as well as by words rhyming with or resembling “Amazon” and “Echo.” “Computer,” however, was triggered by words and phrases that included “cash transfers” and “got my GED.” The Home Mini was cued by words that rhyme with “hey” followed by something that sounded like a hard “G” or contained an “ol” sound, though not in every case: the Assistant was also summoned by phrases like “A-P girl,” “Okay, but not,” and “I don’t like the cold.”
The researchers said their continued work will examine variables such as whether an activation depends on a character’s accent or gender, whether the recordings are sent to the cloud or remain on the speaker, and whether all recordings are disclosed to users. While they said they found no evidence to support the idea that the devices are constantly recording, the fact that they are so easily duped by various Netflix series is a good reminder that wake-word detection is still far from a perfect science and that smart speakers could be picking up sensitive background information—even if it’s just a few seconds’ worth of audio.
In a statement to Gizmodo, a Microsoft spokesperson said that customer privacy is “extremely important to us and we will evaluate the study and its findings, and continue to inform our products from a number of valuable sources. We are committed to remaining transparent about data collection and ensuring our customers have control over what data is collected and stored.”
An Amazon spokesperson told Gizmodo the company has “a team of world-class scientists and engineers” that works to improve wake word detection for Amazon’s line of smart speakers, as well as on prevention against what it described as “false wakes.”
“Customers talk to Alexa billions of times a month. In rare cases, Echo devices will wake up due to a word in background conversation sounding like Alexa or one of the other available wake words,” the spokesperson said. “By design, our wake word detection and speech recognition get better every day—as customers use their devices, we optimize performance. We are continually investing in our wake word detection technology, and in the last year our wake word performance has improved by 50 percent.”
In a statement to Gizmodo, Google said its “devices are designed to wait in standby mode until activated, and when in standby, the Assistant won’t send what you are saying to Google or anyone else. By default, we don’t retain your audio recordings, and you can change your settings at any time.”
Apple did not immediately return a request for comment.
Whether it’s a couple’s private conversation being shipped to an acquaintance without their permission, human contractors having access to potentially sensitive data, or the reality that smart devices are just as susceptible to attacks as other IoT devices in your home or office, there are many, many reasons to be wary of smart speakers and the data they’re collecting—with your permission or without it—by being in a kind of “always on” state.
Is it shocking that these devices are summoned by overhearing words or phrases similar to their wake words? Of course not. But if nothing else, it’s a fresh reminder that we should behave around these devices as if they’re always listening—no matter what their manufacturers would like us to believe.
Update: Added comment from Amazon and Google.