
The spreadsheet also appears to mention a number of programs being run internally at Uber for inscrutable purposes. (If you know what any of them are, get in touch!) Among the data fields are 10 different ones whose names contain “greyball,” matching the name of a program that the New York Times revealed this March was used to “deceive authorities worldwide.” Once an account has been given a “greyball” tag, it’s used, as Uber chief security officer Joe Sullivan recently stated, “to hide the standard city app view for individual riders, enabling Uber to show that same rider a different version” of the app.

Uber says that “greyballing” can be used for a variety of innocuous purposes, such as delivering marketing to specific users. But in some places, such as Portland, Oregon, for a brief period in 2014, Uber “greyballed” the accounts of city officials so that they wouldn’t be able to catch UberX drivers who were breaking local laws by participating in the service. The Justice Department is now investigating that use.


“The origins [for Greyball] were anti-abuse but other teams have found value in it,” said Uber spokesperson Melanie Ensign.

Other apparent code names that are used as some of the over 500 different possible tags on users’ accounts include “Guardian,” “Sentinel score,” and “Honeypot.” Uber declined to explain the nature of specific tags, but Ensign said the Guardian project “is used to detect spoofing, like what’s described in this Bloomberg story from 2015.” The story details Uber’s challenges in China, where drivers were creating fake rides in order to scam the company.


Uber’s lawyers soon got the document sealed, arguing that it contained “confidential, proprietary, and private information...the very existence, content, and form of which are of extreme competitive sensitivity to defendant in that they demonstrate what data [Uber] considers important enough to capture.” They added that it “references the confidential and proprietary code names for Uber’s internally-developed software, databases, and systems.”

The spreadsheet certainly affirms that Uber knows its way around private information. It’s a vivid reminder of the extreme asymmetry between the users—who are simply interested in being able to hail a ride from Point A to Point B—and the machines that are tracking them. Uber’s automated systems gather small and seemingly insignificant details persistently over time, material that would otherwise be forgotten or bore a human surveillant to death.


Asked about the exhibit, Uber security spokeswoman Melanie Ensign explained that it’s “a catalogue of signals used by our machine learning systems to detect potentially fraudulent behavior or compromised accounts.” Despite the apparent size of the database, Ensign described the material as being based on a small set of things that are “outlined in our terms of service.”

“All of these signals are derivatives of IP address, payment info, device info, location, email, phone number, and account history,” Ensign said.


What’s staggering is that Uber can do so much with just those seven pieces of information.

For example, users give Uber access to their location and payment information; Uber then slices and dices that information in myriad ways. The company holds files on the GPS points for the trips you most frequently take; how much you’ve paid for a ride; how you’ve paid for a ride; how much you’ve paid over the past week; when you last canceled a trip; how many times you’ve canceled in the last five minutes, 10 minutes, 30 minutes, and 300 minutes; how many times you’ve changed your credit card; what email address you signed up with; whether you’ve ever changed your email address.
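Signals like these can be derived mechanically from raw account history. As a purely hypothetical sketch (none of these names or windows come from Uber’s document beyond the 5/10/30/300-minute intervals it mentions), here is how rolling-window cancellation counts might be computed:

```python
from collections import deque
import time

class CancelCounter:
    """Count a rider's trip cancellations inside sliding time windows,
    mirroring the kind of derived signal the spreadsheet describes:
    cancels in the last 5, 10, 30, and 300 minutes. Hypothetical sketch."""

    WINDOWS_MIN = (5, 10, 30, 300)

    def __init__(self):
        self.cancel_times = deque()  # cancellation timestamps, oldest first

    def record_cancel(self, ts=None):
        self.cancel_times.append(ts if ts is not None else time.time())

    def counts(self, now=None):
        now = now if now is not None else time.time()
        # Drop events older than the largest window; they can never match.
        horizon = now - max(self.WINDOWS_MIN) * 60
        while self.cancel_times and self.cancel_times[0] < horizon:
            self.cancel_times.popleft()
        return {
            f"cancels_last_{w}m": sum(1 for t in self.cancel_times
                                      if t >= now - w * 60)
            for w in self.WINDOWS_MIN
        }

c = CancelCounter()
c.record_cancel(ts=0)    # a cancel ~7 minutes ago
c.record_cancel(ts=290)  # a cancel ~2 minutes ago
# 1 cancel falls inside the 5-minute window, 2 inside each longer one.
print(c.counts(now=400))
```

A real system would feed counters like this, over many event types, into the machine learning models Ensign describes.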


And some of the tags appear to pass judgment on a given Uber user, such as the nefarious-sounding tags “suspected_clique_rider” and “potential_rider_driver_collusion.”

A key goal of all this surveillance is to identify and react to abnormal users: a fraudster, an abuser—or, as the Greyball scandal revealed, a government regulator trying to observe how Uber works. Where Uber has run into trouble in the past is when it sees as “abnormal users” those who stand in its way, even if they have good reasons.


In addition to the code names in the document—Guardian, Sentinel, and Honeypot—there are fields called “in_fraud_geofence” and “in_fraud_geofence_pickup.” Geofencing is a technique to digitally rope off a given area. Ensign says these tags would be used to, for example, flag users who are attempting to misuse a promo code. If there were a promo code to take an Uber to a sporting event, this could help detect someone trying to use the same code for a different purpose, she explained.
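The underlying technique is simple. As an illustrative sketch (not Uber’s code; the stadium coordinates and radius are assumptions), a circular geofence check needs only a center point, a radius, and a great-circle distance:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371.0 * 2 * asin(sqrt(a))

def in_geofence(lat, lon, center, radius_km):
    """True if (lat, lon) falls inside a circular geofence."""
    return haversine_km(lat, lon, center[0], center[1]) <= radius_km

# Hypothetical fence around a stadium; tag pickups that start inside it.
stadium = (45.5316, -122.6668)  # Providence Park, Portland
print(in_geofence(45.5320, -122.6660, stadium, 0.5))  # inside -> True
print(in_geofence(45.6000, -122.7000, stadium, 0.5))  # ~8 km away -> False
```

Production systems typically use polygon fences rather than circles, but the principle of testing each pickup or app-open location against a boundary is the same.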

But it evokes two reports in the New York Times about Uber and geofencing: one in March that said Uber tracked whether accounts were being accessed in government buildings (suggesting the user might be part of a government agency trying to crack down on Uber), and another in April that said Uber geofenced Apple’s headquarters so that the app performed differently for Apple employees, to keep them from discovering that Uber was “fingerprinting” iPhones.


That fingerprinting allowed Uber to keep track of users even if they erased the contents of their phones, but it violated Apple’s privacy rules for app makers. Uber couldn’t geofence the home address of every Apple employee, however, and people working outside Cupertino discovered the fingerprinting and the geofencing, leading Apple CEO Tim Cook to personally reprimand Uber CEO Travis Kalanick in 2015.

The table offers insight into how Uber’s tagging system can be used beyond fraud prevention, to present different versions of the app to different users in different places.


We asked Rob Graham, a security consultant with Errata Security who often works with large databases, to review the document and to speculate as to why Uber was so concerned about its public exposure.

“I’m sure it’ll help Lyft a lot. They’ll have the context to understand these fields,” he said by email. “Likewise, it’ll help their adversaries, the Uber-haters in the world (I’m an Uber-lover), who will be able to use these fields to figure out exactly how that notorious ‘Greyballing’ works.”


When asked whether it had seen or benefited from the document, a Lyft spokesperson declined to comment.

Uber is far from alone among technology giants in using machine learning systems to attempt to profile its users at a granular level to find the activity and users that stick out as abnormal. But Uber has a history of misusing its systems of surveillance. Years ago, it used its rider-tracking system “God View” as a party trick, and later used it to casually track a journalist who regularly reported on the company. It used the anti-abuse tool Greyball to subvert government regulators. It’s currently being sued by drivers over a program called Hell that tracked their movements through a hack of the Lyft app to find out which drivers were working for both companies.


The code-named programs and hundreds of tags in the table proffered by Spangenberg suggest there could be other, still unknown ways that Uber is aggressively tapping its data library. If anyone familiar with these tags can shed light on unexpected ways they’re being used, or has any concern about them, send us a note.

This story was produced by Gizmodo Media Group’s Special Projects Desk.