There have been a number of incidents in which Amazon’s Alexa digital assistant has done things like misinterpret something it overheard and start sending random people recordings of private conversations, or in which audio files kept by the company wound up in the wrong hands. But according to a Bloomberg report published Wednesday, a real-life stranger could be listening to anything you tell Alexa, by design.
According to Bloomberg’s report, Amazon employs “thousands” of people across the world tasked with improving Alexa’s voice-recognition features. This team has access to voice recordings from real customers using Alexa-powered devices in their homes and workplaces (only Echo speakers are directly mentioned in the report, though Alexa also runs on mobile phones and numerous third-party devices). Those recordings are “transcribed, annotated and then fed back into the software,” Bloomberg wrote, as part of an effort to continue improving Alexa’s ability to recognize speech without human intervention.
Human review is necessary because Alexa is limited in its ability to train itself, especially when it comes to garbled phrasing, accents, slang, regional words, other languages, and the like. Last year, Wired reported that “active learning” techniques, in which the system identifies areas where it could improve with human assistance, had “helped substantially cut down on Alexa’s error rates.” Wired also wrote that adding support for “transfer learning,” where Alexa tries to apply previously learned skills to new ones, has helped developers “cut down on the grunt work they’d otherwise face.”
Newer still is “self learning,” in which Alexa tries to pick up on context clues to understand commands that aren’t issued in a hyper-specific way (e.g., “Alexa, play The Bone” rather than “Alexa, play 102.5 FM The Bone”). According to Wired, Amazon plans to eventually have Alexa recognize users’ emotions, which critics have suggested could lead to manipulative marketing tactics. In an article in Scientific American last month, Amazon director of applied science Ruhi Sarikaya argued that the volume of data requiring analysis will soon be so large that voice recognition systems will have to switch from a “supervised” learning model “toward semi-supervised, weakly supervised and unsupervised learning. Our systems need to learn how to improve themselves.”
Bloomberg interviewed seven separate sources about the program, some of whom said Amazon’s workers are expected to analyze approximately 1,000 audio clips per nine-hour shift. Most of the time the work is “mundane,” Bloomberg wrote:
One worker in Boston said he mined accumulated voice data for specific utterances such as “Taylor Swift” and annotated them to indicate the searcher meant the musical artist. Occasionally the listeners pick up things Echo owners likely would rather stay private: a woman singing badly off key in the shower, say, or a child screaming for help. The teams use internal chat rooms to share files when they need help parsing a muddled word—or come across an amusing recording.
However, on other occasions workers have heard what they thought were crimes, including what they believed to be a sexual assault. Amazon told workers in Romania that it is not the company’s job to intervene, Bloomberg wrote. Others told the news agency that each auditor may encounter as many as 100 recordings a day in which Alexa does not appear to have been deliberately activated by a user, whether with a wake word or some other trigger (such as pressing a button).
Amazon described the number of recordings that humans actually analyze as “an extremely small sample” in a statement to Bloomberg, adding that the review was solely for the purpose of “[improving] the customer experience.” It also characterized the process as low-risk:
We have strict technical and operational safeguards, and have a zero tolerance policy for the abuse of our system. Employees do not have direct access to information that can identify the person or account as part of this workflow. All information is treated with high confidentiality and we use multi-factor authentication to restrict access, service encryption and audits of our control environment to protect it.
However, Bloomberg noted that a screenshot provided by a reviewer “shows that the recordings sent to the Alexa auditors don’t provide a user’s full name and address but are associated with an account number, as well as the user’s first name and the device’s serial number.”
According to Bloomberg, an Apple white paper says its Siri voice assistant only enlists humans to analyze recordings that “lack personally identifiable information and are stored for six months tied to a random identifier,” though the recordings may later be stripped of random IDs for long-term storage. Google’s auditors can only access audio that has been distorted.