You might think of Uber as a ride-hailing company or a lawsuit-ridden self-driving car developer, but at its core, Uber is a big data company. It has to constantly crunch location coordinates, traffic data, payment information, and tax rates—and putting all that data in Uber’s hands sometimes makes users nervous.
But now Uber is debuting a differential privacy tool that it will use to analyze its vast data stores. Differential privacy allows for analysis of large data sets without revealing the identity of any individual included in the data, and is used by companies like Apple and Google to gain insight from user data without compromising privacy. Uber’s new tool will let its data analysts know the likely privacy implications of any queries they make on Uber’s data before they make them.
“Effectively, it’s a way to take a look at queries and decide how sensitive the resulting data is from that query without having to run the query,” Uber’s manager of privacy engineering Menotti Minutillo told Gizmodo.
Here’s how it’ll work: Imagine Uber data analysts want to figure out the average distance of a ride in San Francisco. They’ll need to query large swathes of data about rides in the city, but pulling on that thread could expose lots of information about individual riders and drivers. Differential privacy injects carefully calibrated random noise into the results, so the answer barely changes whether or not any single trip is included. That makes it extremely difficult to trace trip information back to a particular user.
But some queries are more sensitive than others, and therefore require more noise. “The average trip distance in a smaller city with far fewer trips is more influenced by a single trip and may require more noise to provide the same degree of privacy. Differential privacy defines the precise amount of noise required given the sensitivity,” Katie Tezapsidis, an Uber software engineer on the privacy team, explained in a blog post announcing the change.
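To make the idea concrete, here is a minimal sketch of the textbook Laplace mechanism applied to the average-trip-distance example. It is not Uber’s code. The trip distances are synthetic, and the 50 km cap, the privacy parameter epsilon = 1, and the function names `noisy_average` and `laplace_sample` are illustrative assumptions. The sketch also assumes the number of trips is public, so neighboring data sets differ only in one trip’s distance. Under that assumption, one trip can move the clipped average by at most the cap divided by the trip count.

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace(scale) sample, drawn as the difference of two
    independent exponential variables whose mean is `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_average(distances_km, max_km, epsilon, rng):
    """Return (noisy mean, noise scale) for a list of trip distances.

    Assumption: the trip count n is public and neighboring data sets differ
    in one trip's value, so after clipping every distance to [0, max_km]
    the mean's sensitivity is max_km / n.
    """
    n = len(distances_km)
    if n == 0:
        raise ValueError("need at least one trip")
    clipped = [min(max(d, 0.0), max_km) for d in distances_km]
    true_mean = sum(clipped) / n
    sensitivity = max_km / n        # one trip moves the mean by at most this
    scale = sensitivity / epsilon   # Laplace mechanism: b = sensitivity / epsilon
    return true_mean + laplace_sample(scale, rng), scale


rng = random.Random(0)
# Synthetic, made-up trip distances (km) for a large city and a small city.
big_city = [rng.uniform(1, 20) for _ in range(100_000)]
small_city = [rng.uniform(1, 20) for _ in range(200)]

for name, trips in [("big city", big_city), ("small city", small_city)]:
    estimate, scale = noisy_average(trips, max_km=50.0, epsilon=1.0, rng=rng)
    print(f"{name}: {len(trips)} trips, noise scale {scale:.4f} km, "
          f"estimate {estimate:.2f} km")
```

With these assumptions the noise scale is 50 / 100,000 = 0.0005 km for the large city but 50 / 200 = 0.25 km for the small one, which is the effect Tezapsidis describes: with fewer trips, each trip matters more, so more noise is needed for the same privacy guarantee.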
In order to calculate that sensitivity, Uber partnered with a team of security researchers from the University of California, Berkeley. The researchers worked for over a year to come up with the calculation technique, nicknamed Elastic Sensitivity, which Uber is releasing today as an open-source tool.
Elastic Sensitivity will make it possible for analysts at Uber, and elsewhere, to quickly apply differential privacy to a wide variety of queries. Previously, an analyst would have queried a database and then tried to weed out sensitive or unnecessary data after the fact. Now, the privacy protection is built into the query itself, so the data comes out clean.
“Our team is very, very interested in providing the tools and platforms so people can do their job in a privacy-appropriate way,” Minutillo said. The tool will be able to make suggestions about how much noise should be added in order to preserve privacy, or whether the query should be run at all. “In cases where you have a legitimate use—you need to retrieve data to do analysis—this is an additional layer of protection,” Minutillo added. “We can feel comfortable that the analyst can still get results that are correct, and reduce the risk of singling out any individual that’s in that set.”
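The article does not say how Uber’s tool decides how much noise to suggest or when to advise against running a query. As a purely hypothetical illustration, an advisor could work from a query’s sensitivity bound alone, before any data is touched. For Laplace noise with scale b, the chance that the noise exceeds t in magnitude is exp(-t / b), so an error bound at a chosen confidence level follows directly. The function name `suggest_noise`, the 95% confidence level, and the rule of comparing that bound against an analyst’s error tolerance are all assumptions made for this sketch.

```python
import math


def suggest_noise(sensitivity: float, epsilon: float, max_error: float,
                  confidence: float = 0.95) -> dict:
    """Hypothetical pre-query advisor (not Uber's API).

    For Laplace noise with scale b = sensitivity / epsilon,
    P(|noise| > t) = exp(-t / b), so the error bound that holds with the
    given confidence is t = b * ln(1 / (1 - confidence)).
    """
    if sensitivity < 0 or epsilon <= 0 or not 0 < confidence < 1:
        raise ValueError("need sensitivity >= 0, epsilon > 0, 0 < confidence < 1")
    scale = sensitivity / epsilon
    error_bound = scale * math.log(1.0 / (1.0 - confidence))
    return {
        "noise_scale": scale,
        "error_bound": error_bound,
        # Advise against the query if the noise would swamp the answer.
        "run_query": error_bound <= max_error,
    }


# Sensitivities from the average-trip-distance example (50 km cap, epsilon = 1).
print(suggest_noise(0.25, 1.0, max_error=0.5))    # small city
print(suggest_noise(0.0005, 1.0, max_error=0.5))  # large city
```

Here the small-city query gets an error bound of about 0.75 km, above the 0.5 km tolerance, so this advisor would suggest not running it, while the large-city bound is only about 0.0015 km. A real tool would need a principled way to set epsilon and the tolerance; both are simply inputs in this sketch.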