The nation’s largest financial data broker, Yodlee, holds extensive and supposedly anonymized banking and credit card transaction histories on millions of Americans. Internal documents obtained by Motherboard, however, appear to indicate that Yodlee clients could potentially de-anonymize those records by simply downloading a giant text file and poking around in it for a while.
According to Motherboard, the 2019 document explains how Yodlee obtains transaction data from partners like banks and credit card companies and what data is collected. That includes a unique identifier associated with the bank or credit card holder, amounts of transactions, dates of sale, which business the transaction was processed at, and bits of metadata, Motherboard wrote; it also includes data relating to purchases involving multiple retailers, such as a restaurant order through a delivery app. The document states that Yodlee is giving clients access to this data in the form of a large text file rather than a Yodlee-run interface.
The document also shows how Yodlee performs “data cleaning” on that text file, which means obfuscating patterns like “account numbers, phone numbers, and SSNs by redacting them with the letters “XXX,” Motherboard wrote. It also scrubs some payroll and financial transfer data, as well as the names of the banking and credit card companies involved.
But this process leaves the unique identifiers, which are shared across each entry associated with a particular account, intact. Research has repeatedly shown that taking supposedly anonymized data and reverse-engineering it to identify individuals within can be a trivial undertaking, even when no information is shared across records.
Experts told Motherboard that anyone with malicious intent would just need to verify a purchase was made by a specific individual and they might gain access to all other transactions using the same identifier.
With location and time data on just three to four purchases, an “attacker can unmask the person with a very high probability,” Rutgers University associate professor Vivek Singh told the site. “With this unmasking, the attacker would have access to all the other transactions made by that individual.”
Imperial College of London assistant professor Yves-Alexandre de Montjoye, who worked with Singh on a 2015 study that identified shoppers from metadata, wrote to Motherboard this process appeared to leave the data only “pseudonymized” and that “someone with access to the dataset and some information about you, e.g. shops you’ve been buying from and when, might be able to identify you.”
Yodlee and its owner, Envestnet, is facing serious heat from Congress. Democratic Senators Ron Wyden and Sherrod Brown, as well as Representative Anna Eshoo, recently sent a letter to the Federal Trade Commission asking it to investigate whether the sale of this kind of financial data violates federal law.
“Envestnet claims that consumers’ privacy is protected because it anonymizes their personal financial data,” the congresspeople wrote. “But for years researchers have been able to re-identify the individuals to whom the purportedly anonymized data belongs with just three or four pieces of information.”
“Consumers generally have no idea of the risks to their privacy that Envestnet is imposing on them,” they added, telling the FTC that their concerns include that Envestnet doesn’t appear to enforce any policies requiring banks and credit card companies inform customers this is happening. (As Motherboard noted, Yodlee admitted it doesn’t audit client use of data in Securities and Exchange Commission filings in 2015.
In a lengthy statement to Motherboard, Yodlee defended its practices, said it complied with the all applicable laws, and wrote it “imposes technical, administrative, and contractual measures to protect consumers’ identities, such as prohibiting analytics and insights users from attempting to re-identify any consumer from the data.” It also cited “leading privacy experts” as agreeing “Envestnet | Yodlee data analytics meet or exceed leading industry standards of de-identification processing.”