We may earn a commission from links on this page

‘Black Data’ Is the Reason Why Smart Policing Is Still Incredibly Biased


We already know we’re being watched. We may even know who’s watching us. What we don’t know is what they’ll do now that they’ve seen us.

Andrew Guthrie Ferguson’s “The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement,” out on October 9 via NYU Press, is a dense, academic, and illuminating series of essays on police departments that employ big-data tactics, from disease modeling used to predict gun violence in Chicago to face recognition, body cameras, and social media data mining. Ferguson argues that the opacity of this data is key to how big data policing functions. He calls it “black data.”

Ferguson has an incredible command of the many subjects that fall under the “big data” umbrella, and his writing is at its best when social, cultural, and technological dynamics coalesce into one story. The book is particularly strong when Ferguson takes on how classism and racism shape smart policing datasets. Those datasets epitomize how “big data” policing is held back by the many limitations of larger legal structures, even as it is presented as the solution to that very problem.

The following is an adapted excerpt from “The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement”:

Rich Data, Poor Data

My work shines light on the “black data” arising from big data policing: “black” as in opaque, because the data exists largely hidden within complex algorithms; “black” as in the next new thing, given legitimacy and prominence due to the perception that data-driven anything is cool, techno-friendly, and futuristic; and finally, “black” as distorting, creating legal shadows and constitutional gaps where the law used to see clearly. Black data matters because it has real-world impacts. Black data marks human “threats” with permanent digital suspicion and targets poor communities of color. Black data leads to aggressive use of police force, including deadly force, and new forms of invasive surveillance.

Many of the issues arising from this “black data” can be traced back to class disparities between those who are surveilled and those who are not.

Law enforcement data-collection systems create the inverse of the problem found in consumer data systems: large portions of the population do not get tracked or surveilled, even though they might be involved in criminal activity.

Class, and the policing decisions shaped by class, protects many people who break the law. Most of the targeted individuals on Chicago’s heat list are young men between the ages of 18 and 25, the same age as many young people pursuing college or graduate degrees at universities. In urban Chicago and on Ivy League campuses alike, drug use, drug dealing, thefts, threats, assaults, and sexual assaults are unfortunately common. Young people of all economic backgrounds do foolish, dangerous, and impulsive things, sometimes under the influence of drugs or alcohol, yet criminal prosecution is not applied consistently. After a drunken brawl, a theft, a threat, or even a rape, a call to campus security leads to a university disciplinary investigation, while a call to the police leads to a criminal prosecution. Only the latter ends up in the city’s criminal justice database and as part of the suspect’s permanent criminal record.

A host of other class-based protections keep people with money hidden from law enforcement’s big data collection systems. Physical barriers involving private property keep surveillance away from activities that take place behind the walls, and economic mobility allows travel away from high-surveillance, high-crime areas. Social status influences police discretion about whether to stop, search, or arrest. If affluent Charlie should be pulled over on the way to college, the automated license plate reader (ALPR) would reveal no warrants, the computer system would show no contacts, and the data trail would be nonexistent. A presumption of data-driven innocence would shape the police interaction (despite Charlie’s possession of illegal narcotics).

Apart from class discrepancies, police data is also quite fragmented. As Ronald Wright has explained, “There are 17,876 state and local law enforcement agencies operating in the United States. Only 6.1% of those agencies employ 100 or more full-time sworn officers. Seventy-four percent of the agencies employ fewer than twenty-four officers.” These smaller entities cannot do the data quality control or collection required for most big data systems. The result is that local police data sets are both incomplete and too small to create useful criminal databases. This fragmentation creates further data holes.

Equally problematic, the current criminal justice system does a notoriously poor job of collecting complete crime data. Certain crimes regularly go unreported. Intrafamily physical and sexual abuse goes unreported because of the personal relationships involved. Sexual assault remains underreported because of the social stigma and legal difficulties of reporting. Gang violence results in extrajudicial responses rather than police reports. Most drug users do not self-report. Most people illegally possessing firearms do not turn themselves in. White-collar theft remains hard to investigate. Some communities, frustrated with biased policing practices or concerned that contact with the police could have negative consequences, simply decline to report crimes. Even violent crime does not always make it into the data systems. The Bureau of Justice Statistics (BJS) found that nearly half of violent crimes (3.4 million incidents a year) went unreported. Paralleling the other reasons, the BJS study found that victims of violent crime did not report because they were too afraid, knew the perpetrator, or chose some other method to handle the situation.

The combination of class-based and crime-based gaps means that any big data policing system, at best, works on only half the crime data available. Such a distortion signifies a huge limitation of data-driven systems, and on a very basic level, the distortion challenges the reliability of the results from data-driven strategies. Police resources and technological investment go into fighting the crime we can see, not necessarily the crime that exists. For victims of sexual abuse, trafficking, drug addiction, and other less-reported, and thus less-measurable, crimes, this neglect can have real impact, because a data-driven policing system without complete data may give a false vision of success.

At another level, these data holes distort future data analytics. Data-driven systems trained on incomplete data can lead to inaccurate outcomes. Outlying data from a minority community can be seen by a computer as an error and be ignored in later algorithms. This means that data holes can create further data holes if the algorithm chooses to minimize certain data that does not fit the model.
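To make the outlier point concrete, here is a minimal, hypothetical Python sketch. It assumes a naive z-score filter, a common data-cleaning step that is not attributed to any real police system or to Ferguson, applied to invented weekly incident counts for ten areas. Area “J” stands in for a minority community whose numbers look different from the rest. The function name `drop_outliers`, the threshold of 2.0, and all of the data are illustrative assumptions.

```python
from statistics import mean, pstdev


def drop_outliers(counts, threshold=2.0):
    """Naive z-score filter (hypothetical cleaning step): keep only the areas
    whose value lies within `threshold` population standard deviations of the mean."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing can be flagged as an outlier.
        return dict(counts)
    return {area: v for area, v in counts.items() if abs(v - mu) / sigma <= threshold}


# Invented weekly incident counts per area, for illustration only.
# Area "J" stands in for a minority community whose data differs from the rest.
counts = {
    "A": 10, "B": 12, "C": 11, "D": 13, "E": 12,
    "F": 10, "G": 14, "H": 11, "I": 12, "J": 60,
}

kept = drop_outliers(counts)
discarded = set(counts) - set(kept)
print(sorted(kept))   # the areas the "cleaned" dataset retains
print(discarded)      # prints {'J'}
```

Because area J sits roughly three standard deviations from the mean, the filter treats it as noise, and any model later built on `kept` never sees that community at all.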

That said, these omissions do not apply to all crimes. As discussed, certain crimes like homicides, stranger sexual assaults, property crimes connected to insurance (such as car theft and burglary), and other assaults with serious injuries tend to be reported fairly accurately. In these cases, police reliance on data-driven systems makes sense. Data holes are not everywhere, but they do offer spaces of darkness that must be recognized, even if they cannot be illuminated.