These days privacy online feels like an unattainable dream. Everything you do becomes data for companies, which sell that data to affiliates, which then sell your data back to you in the form of targeted ads and personalized recommendations. That is just how things are. But what if it didn’t have to be?
Last week, I met with a team from Canopy, a tech startup that’s created a software development kit that it hopes will enable companies to create personalized experiences without compromising your privacy. As a proof of concept, earlier this week, the company launched its first app, Tonic.
The idea behind Tonic isn’t exactly new. It’s one of those curated reading experiences: you get shown a bunch of articles, you pick the ones you like, and the next day you get new material to read based on your preferences. The main difference is you don’t have to sign up for an account or enter personal data, like age, gender, email, phone number, or location. Instead, it pulls data in a way that’s intended to not betray your privacy while still allowing the app to make intelligent predictions about stories you may want to read.
Theoretically speaking, if Canopy were to license its software to, say, Spotify, it would mean that you’d still get a pretty accurate Discover Weekly playlist, but neither Canopy nor Spotify would know exactly what you were listening to or when, according to the company. That could have some significant implications if you were to apply that kind of privacy-protecting tech to location data, for example.
The key is something called differential privacy, a framework that has its basis in mathematics. It’s a way to share information about a group and its behaviors while protecting the privacy of individuals within that group by obscuring data that exposes your identity.
“Differential privacy is a framework that allows you to make tradeoffs between privacy and accuracy,” Bennett Cyphers, a staff technologist at the Electronic Frontier Foundation, told me over the phone. More specifically, Cyphers told me, the basic principle is you define an epsilon parameter (math!) that controls how much noise, or confusion, gets added to obscure a data set. It’s like giving a ballpark estimate: you get a sense of something, but you don’t know the exact particulars. The higher the parameter, the less noise and the more accurate your information. A lower parameter means more noise and greater privacy.
Before your eyes cross, a real-life example Cyphers gave me is the census. The government has a lot of aggregate data about its citizens—and it probably wants to share demographic information from that set without revealing anything about any one particular individual. Let’s say you live in a small census block with only one or two people. It wouldn’t take a genius to figure out personal information about you, given the right parameters. Differential privacy would be a way to summarize that data without putting any one individual at risk.
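To make the epsilon tradeoff concrete, here is a minimal Python sketch of one textbook way to do this, the Laplace mechanism, applied to a head count for a tiny census block. It is an illustration only: the function names, the two-person block, and the epsilon values are all made up for this example, and nothing here describes how the Census Bureau or Canopy actually implements differential privacy.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two exponential draws, each with mean `scale`,
    # follows a Laplace distribution centered on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    # Adding or removing one person changes a head count by at most 1,
    # so the noise scale is 1 / epsilon: smaller epsilon, more noise.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    # Average distance between the published count and the real one.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Hypothetical census block with two residents.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: typical error {mean_abs_error(2, eps):.2f}")
```

Running it shows the pattern Cyphers described: at a low epsilon the published count for a two-person block is mostly noise, which protects the residents but tells you little, while at a high epsilon the count is nearly exact and offers much less cover.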
So, how does that translate to private yet personalized experiences online? Canopy’s head of product, Matthew Ogle, told Gizmodo the secret sauce is in your phone. Instead of creating a behavior model of each user on a server, as many apps do, Canopy does that locally on your phone. When the app does make a request of Canopy’s server for content, what it sends is an encrypted, differentially private version of your behavior. So instead of a model built on your individual preferences, you’re an indistinguishable part of an aggregate of users who like the same things you do.
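Canopy didn’t share code, so the following is not its method. It is a sketch of randomized response, a classic technique for adding the noise on the device itself, before anything is sent to a server. The function names, the "likes a topic" bit, and the numbers are assumptions chosen for illustration.

```python
import math
import random


def truth_probability(epsilon):
    # Chance of reporting the real answer; the rest of the time it is flipped.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(true_bit, epsilon, rng):
    # Runs on the phone: the server only ever sees the possibly flipped bit.
    if rng.random() < truth_probability(epsilon):
        return true_bit
    return 1 - true_bit


def estimate_fraction(reports, epsilon):
    # Runs on the server: undo the known flip rate across the whole crowd.
    # observed = p * f + (1 - p) * (1 - f), solved here for f.
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical population: 30 percent of 20,000 users like a topic.
    users = [1] * 6000 + [0] * 14000
    reports = [randomize_bit(bit, 1.0, rng) for bit in users]
    print(f"estimated share: {estimate_fraction(reports, 1.0):.3f}")
```

The server can recover a good estimate of how popular a topic is across everyone, but any single report could plausibly be a flipped answer, so it says little for certain about the person who sent it.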
For most of us, never needing to sign up for another service to reap the benefits of doing so sounds ideal. We do that now because the perks of a personally curated experience seem to outweigh the cost of giving up your privacy. It’s much easier to feel the benefits of an auto-generated playlist than vague privacy violations that you may not even know are happening. That said, it seems like a no-brainer to do this for everything. So why isn’t this more of a thing?
One reason is differential privacy hasn’t been around for that long. “It’s sort of new,” says Cyphers. “There’s not a lot of agreement on what a good parameter is—people are sort of making it up as they go. It’s important for companies to be upfront with what parameters they’re using.”
As for Canopy’s Tonic app, the stakes are low. Reading recommendations don’t carry the same risk as financial transactions or location data, though Canopy’s team did indicate that applying it to those types of data was a feasible long-term goal if things go well. Still, there are limitations as to how far differential privacy can go at the moment.
“One problem is in order to get that tradeoff between privacy and accuracy, for a lot of applications it doesn’t make sense,” Cyphers says. “To get a lot of privacy, you have to add a lot of noise, so it becomes sort of useless. It only works in very specific applications.”
For starters, differential privacy isn’t like encryption, where you can just slap it onto varying technologies and call it a day. You can’t send a differentially private email. A differentially private photo would look like static. It works in Tonic’s case because the tech is being applied to the act of discovery.
“The privacy and accuracy tradeoff is real,” Canopy founder and CEO Brian Whitman said over email. He noted that differential privacy isn’t well-suited to generalized machine learning tasks (think predicting something about a unique person’s behavior) because accuracy would take a significant hit. That said, when it comes to discovering likes and preferences, nothing about that has to be about the individual on the backend.
“The point is we’re not trying to pinpoint a single thing about a single person,” Whitman said. “That is still hard with differential privacy and federated learning. We are understanding larger populations and doing a great job of it. We never should have built recommenders that understood people individually anyway.”
Basically, something like Tonic is a baby step in the right direction. Differential privacy has been used elsewhere. Apple, for instance, said it uses the technique to improve features like QuickType and Emoji suggestions, as well as some Safari features, and disclosed the epsilon parameters used. (That said, there’s some disagreement as to how well Apple implemented the tech, leading back to the need for companies to be transparent about their parameters.) Still, even with differential privacy’s limitations, given the looming possibility of federal privacy legislation and discerning users, it wouldn’t be surprising if it starts popping up more frequently in the apps and services we all use. That’s probably a good thing.