The founders of Predictim want to be clear with me: Their product—an algorithm that scans the online footprint of a prospective babysitter to determine their “risk” levels for parents—is not racist. It is not biased.
“We take ethics and bias extremely seriously,” Sal Parsa, Predictim’s CEO, tells me warily over the phone. “In fact, in the last 18 months we trained our product, our machine, our algorithm to make sure it was ethical and not biased. We took sensitive attributes, protected classes, sex, gender, race, away from our training set. We continuously audit our model. And on top of that we added a human review process.”
At issue is the fact that I’ve used Predictim to scan a handful of people I very much trust with my own son. Our actual babysitter, Kianah Stover, was ranked a “Moderate Risk” (3 out of 5) for “Disrespectfulness” for what appear to me to be innocuous Twitter jokes. In fact, she received a worse ranking than a friend I also tested who routinely spews vulgarities. She’s black, and he’s white.
“I just want to clarify and say that Kianah was not flagged because she was African American,” says Joel Simonoff, Predictim’s CTO. “I can guarantee you 100 percent there was no bias that went into those posts being flagged. We don’t look at skin color, we don’t look at ethnicity, those aren’t even algorithmic inputs. There’s no way for us to enter that into the algorithm itself.”
I tell them I am sure that they don’t have a ‘Do Racism’ button on their program’s dashboard, but wonder if systemic bias could nonetheless have entered into their datasets. Parsa says, “I absolutely agree that it’s not perfect, it could be biased, it could flag things that are not really supposed to be flagged, and that’s why we added the human review.” But the human review let these results stand.
“I think,” Simonoff says, “that those posts have indications that someone somewhere may interpret as disrespectful.”
Predictim advertises a service that promises to vet potential babysitters by scanning their presence on social media, the web, and online criminal databases. By using machine learning (natural language processing for text and computer vision for images) to rapidly sift through a lifetime’s worth of a person’s images and posts, Predictim purports to be able to flag individuals prone to abusive behavior, drug use, and posting explicit imagery. People you might not entrust with your kids.
A recent Washington Post article about the service went viral, driven by a wave of fascination and revulsion. The notion that a private algorithm was being deployed to analyze teenagers’ and low-income workers’ musings and selfies on social media, determine their “risk level,” and deliver the results to excitable parents struck many as a grim portent for the future of the informal workforce.
“This AI sitter screening is error-prone, based on broken assumptions, and privacy invading,” Kate Crawford, a co-founder of NYU’s AI Now Institute, tweeted after the Post story dropped. “What’s worse—it’s a horrifying symptom of the growing power asymmetry between employers and job seekers. And low wage workers don’t get to opt out.”
Certainly, the product’s launch comes at a time when scrutiny over bias in machine learning is growing fast—recently, an automated recruiting tool at Amazon was scrapped because it was determined to be biased against women. Alarms have been raised over the flaws in automated systems that contribute to hiring and firing workers. And Predictim’s founders themselves say that gig work is one of the fastest-growing parts of the economy, and hint that they have plans to offer scans of contract workers more broadly.
So, I decided to take Predictim for a spin, to find out what the brave new future of automated background checking might hold.
Head to Predictim’s website, hit “Initiate a New Scan,” and you’ll be sent to a page where a happy white mother and her pudgy-cheeked baby gaze into a laptop. The overlaid text reads, “Purchase more scans. You’re one step away from making sure your family is happy and safe.” For $24.99, you can scan one individual, provided you know his or her full name, city of residence, and email address. $49.99 gets you three (that’s buy two scans, get one free).
When I entered the first person I aimed to scan into the system, Predictim returned a wealth of personal data: home addresses, names of relatives, phone numbers, alternate email addresses, the works. When I sent my son’s godfather a screenshot of his scan, he replied, “Whoa.”
The goal was to allow parents to make sure they had found the right person before proceeding with the scan, but that’s an awful lot of data.
“Initially,” Parsa says, “we allowed parents to put in the potential babysitter’s name, and then it would trigger an email to them where they would opt in and give us access to their social media profiles, and then we would scan and generate a report where both can see the report. The issue arose that when a lot of parents told us they don’t feel comfortable telling the sitter, ‘Give me access to your social media, or let me scan you or check you out on the internet.’ So what we decided to do was to focus on the babysitter; that doesn’t require that opt-in or the permission.” (Parsa also says they are in the process of moving to a system that uses only images to let parents identify babysitters.)
In all, I scanned my wife, my son’s grandmother, his godfather, two friends, our babysitter, and Sal Parsa, the CEO of Predictim. He confirmed to me that the addresses listed were his current and previous places of residence, that I now had both of his phone numbers, and that the only incorrect data were some names of relatives he didn’t know.
“All the information you saw is all accurate,” he tells me. I asked him if he would want me to publish that information in my article. “Yeah, that’s a really good point: Yes, it’s available on the internet, but you don’t want to make it easily, or too readily, available.”
After you confirm the personal details and initiate the scan, the process can take up to 48 hours. When it’s complete, you’ll get an email with a link to your personalized dashboard, which lists everyone you’ve scanned and their risk rankings. That dashboard looks a bit like the backend of a content management system, or of the website analytics service Chartbeat, for those who have the misfortune of being familiar with that infernal service.
To the left is the list of scanned profiles; center right is the risk indicator—a half-circle color wheel with an arrow pointing to the risk level. Green is Very Low Risk, red is Very High.
Simonoff says Predictim “doesn’t look at words specifically or phrases. We look at the contexts. We call it vectorizing the words in the posts into a vector that represents their context. Then we have what’s called a convolution neural net, which handles classification. So we can say, ‘Is this post aggressive, is it abusive, is it polite, is it positive, is it negative?’ And then, based on those outputs, we then have several other models on top of it which provide the risk levels and that provide the explainability.” (He and Parsa insist the system is trained on a combination of open source and proprietary data, but they refused to disclose the sources of the data.)
Potential babysitters are graded on a scale of 1-5 (5 being the riskiest) in four categories: “Bullying/Harassment,” “Disrespectful Attitude,” “Explicit Content,” and “Drug use.”
My wife and my son’s grandmother are both white and both work at universities, where they interact with research subjects and students; apart from the occasional incensed political post, they have very clean profiles. Both got the lowest risk ratings in each of the four categories. Kianah is a musician who babysits part-time. She has never been anything but kind and respectful, and she was enthusiastically referred to us by friends. Predictim flagged her as a “Moderate Risk” (3 out of 5) for “Disrespectful Attitude” and a “Low Risk” (2 out of 5) for “Bullying/Harassment.”
Her total score was “Low Risk” (2 out of 5). On paper, that might not seem that bad—but per the moms interviewed in the Post piece, anything less than perfect is enough to make clients think twice about hiring a babysitter. In other words, if Predictim saw widespread adoption, it would potentially prove a major blow to our babysitter’s business.
In fact, the biggest surprise came when Predictim cleared my son’s godfather, Nick Rutherford, giving him a close-to-perfect score. (He got only a 2 out of 5 on Disrespectful Attitude, top marks in all other categories, and a “Very Low Risk” overall.) Rutherford is a professional comic and TV writer, and his Twitter feed is full of vulgar jokes, sexual innuendo, and F-bombs. Scrolling through his feed myself, it took me about 45 seconds to find a half dozen of them.
The question is—why did Kianah get flagged as more of a risk than Nick?
Predictim’s founders note that when their algorithm determines that a person has registered as a Moderate Risk (3 out of 5) or higher in any of the categories, it automatically subjects them to a human review. Predictim then presents the posts that were flagged to the user so that parents can look over the posts themselves. Our babysitter scored worst on Disrespectful Attitude, so I looked over the flagged posts, all of which came from her Twitter account (which, I should mention, was anonymized so that any kids googling their babysitter would not find it). A few samples:
“Dint put any makeup on but I got that post-poop glow,”
“Our legal system is a fucking crazy map.”
“haven’t decided if I’m an indigo child or a narcissist”
“2018 is the year I stop talking shit”
The system also flags retweets, so it picked up a few posts like: “RT: 1 thing I like about myself is that I’ve never given a fuck about Grey’s anatomy” and this: “I wish I could manifest today as a person and beat the living fuck out of it”.
And that’s… pretty much the gist of it. The targets of this disrespect were a television drama, our legal system in general, and someone’s bad day. For this, she was pegged a Moderate Risk for Disrespectfulness.
Meanwhile, here are a few of Nick’s posts:
“No joke. I saw Tom Brady suck off @VP to completion 16 times.”
“Just saw an Army green PT Cruiser so how the fuck do you think I’m doing.”
“‘And on this day we celebrate the new Gestapo, President Turd and his army of cowardly fuck-bois.’ -Paul Ryan”
That’s not counting the RTs, either. If anything, Nick’s posts are more disrespectful, as they are aimed at actual people and public figures—yet Nick sailed through with the lower score.
Sal Parsa and Joel Simonoff met at UC Berkeley, where Parsa was pursuing an MBA and Simonoff was working on natural language processing. Their first project was called Social Filter, which aimed to let job seekers locate and delete social media posts that employers might find offensive. They took their technology to Berkeley’s SkyDeck incubator, where they received $100,000 to develop their idea. Along the way, they abandoned Social Filter for what would become Predictim.
“One day, while talking to a few moms, we realized that child abuse is a huge problem in the Western World, particularly the United States,” Parsa told me in an email. “We began interviewing all types of people and during that process we realized that the biggest impact and the biggest need for our technology was childcare, and babysitting,” Parsa says. The sharing economy, he predicts, is growing so fast that it will account for half of the world’s GDP in 10 years. “Yet there’s something missing.” We still rely on “outdated” background checks. “But there is nothing in between, that we call human trust,” he says. “If somebody came into my house or were taking care of the most important person in my life,” he continues, “I don’t know anything about that person except that maybe the background check came back clean, and maybe I interviewed them over the phone—but maybe they presented their best self. However, there is so much more information out there that can show the real, true personality of people.”
Neither Parsa nor Simonoff has children, though Parsa is married, and both insist they are passionate about protecting families from bad babysitters. Simonoff, for example, once had a babysitter who would drive him and his brother around while smoking cigarettes in the car. And Parsa points to Simonoff’s grandfather’s care provider. “Joel’s grandfather, he has an individual coming in and taking care of him—it’s kind of the elderly care—and all we know about that individual is that yes, he hasn’t done a—or he hasn’t been caught doing a crime.”
According to Parsa, the company currently has 1,000 active users, and has received some level of seed funding, though he declined to tell me how much, or who had invested.
Predictim, he stresses, was tailor-made for moms. “We interviewed 100 moms, caregivers, and pet owners,” he says. “When we explained their product to them, they loved it. They said they’d use it every time.” The four categories were determined by feedback from those workshops, as well as surveys Parsa sent out to mommy bloggers. They clearly heeded the results, and tailored their product to this famously high-strung cohort. On the website, there are blog posts with titles like “Children Will Mimic Their Child Care Provider – Who Do You Want Your Child to Act Like?” and statements like “The best way to keep your child safe is to have a caregiver evaluation done with a social media checker… If you want your child to mimic people you would be proud if they turned out like, then you want Predictim.”
It’s thus the expectations of mommy blogs that Predictim is built on, and that, if it is successful, future babysitters must live up to.
“The black woman being overly penalized—it could be the case that the algorithm learns to associate types of speech associated with black individuals, even if the speech isn’t disrespectful,” Kristian Lum tells me. Dr. Lum is the lead statistician at the Human Rights Data Analysis Group, and has published work in the prestigious journal Nature concluding that “machine-learning algorithms trained with data that encode human bias will reproduce, not eliminate, the bias.”
Lum says she isn’t familiar with Predictim’s system in particular, and cautions that her commentary should be taken with a grain of salt. But basically, a system like Predictim’s is only as good as the data it’s trained on, and such training data is often loaded with bias.
“Clearly we’re lacking some context here in the way that this is processing these results,” Lum says, “and that’s a best case scenario. A worst case is that it’s tainted with humans labeling black people as disrespectful and that’s getting passed onto the algorithm.”
To be fair, I scanned another friend of mine who is black, someone whose posts are perhaps the most overwhelmingly positive and noncontroversial of anyone on my feed, and he was rated at the lowest risk level. (If he hadn’t been, it would have been crystal clear that the thing was racist.)
And Parsa, who is Afghan, says that he has experienced a lifetime of racism himself, and even changed his name from a more overtly Muslim name because he couldn’t get prospective employers to return his calls despite having top-notch grades and a college degree. He is sensitive to racism, in other words, and says he made an effort to ensure Predictim is not racist. Parsa and Simonoff insist that their system, while not perfect, can detect nuances and avoid bias.
“We made sure to train our AI model to be completely ethical and not biased,” Parsa says. “We made sure our program, our software, our machine, as people call it, can understand sarcasm or jokes.”
“It doesn’t understand my sarcasm,” Kianah texts me. I’ve filled her in on the saga thus far, and shared with her the posts Predictim flagged. “Lol so strange. I wonder if this will become a big part of job screening in the future.”
She says she’s not surprised her social media accounts would be interpreted in a negative way, or that bias might infect such a program. “I think systems have the same prejudice that their creators do,” she writes. Still, she’s not happy about it.
“It’s deeply unsettling and makes me feel under a microscope,” she says, urging me to share her part in the story. “It has the potential to create an even more hostile world.”
Not only that, but if people actually buy into the system, she might have difficulty finding work, and so might just about anyone on Twitter who uses coarse language or a dialect outside the normalized mainstream.
“Of course it’s concerning,” Dr. Lum says. “The last thing we want is groups of people to have fewer employment opportunities based on data that encodes racial and historical bias.” (Again, she notes that she hasn’t seen this specific data set.)
Parsa and Simonoff may or may not believe in their mission, though Parsa tells me to Google cases of babysitter abuse, and that reading them nearly made him cry. (There are indeed pages full of horrifying stories of abuse and neglect, though, for perspective, a 2001 US Justice Department bulletin prepared by professors at the Crimes Against Children Research Center at the University of New Hampshire noted that babysitters account for just 4 percent of crimes committed against children, a smaller share than complete strangers account for.)
But it seems rather interesting to me that they swung from helping job applicants weed out and delete their offensive posts to helping moms and caretakers find those offensive posts, and then use them as rationale not to hire people. To me, it feels like a case of technology in search of a problem to solve, and perhaps an unregulated labor market to run rampant in. In fact, a few days into my playing with the site, a warning suddenly appeared over the dashboard:
“Predictim’s [sic] uses publicly available data to help parents decide as to who they want to trust. Predictim does not provide private investigator services or consumer reports, and is not a consumer reporting agency per the Fair Credit Reporting Act. You may not use our site or service or the information provided to make decisions about employment, admission, consumer credit, insurance, tenant screening or any other purpose that would require FCRA compliance.”
In the midst of a bad press cycle, Predictim was covering its bases, it seemed. It is illegal for employers to run third-party background checks on prospective employees during the hiring process unless they comply with the FCRA. For one thing, employers must notify prospective hires that they’re conducting the check and give them a chance to dispute the results, among other protections. Those protections, however, don’t necessarily apply to informal and contract workers like babysitters, which is one reason Parsa and Simonoff targeted this market segment. (It’s much like how Uber and Lyft argue their drivers are contractors, not employees, which allows them to provide fewer benefits and skirt employee protections.) The new warning text was likely meant to give Predictim legal cover in case employers used the service illegally on potential hires.
Facebook and Twitter also responded to the media storm, announcing that they were shutting off Predictim’s access to their platforms. Simonoff, however, says Predictim wasn’t using either company’s API and relied only on public data anyway, so the blockades are effectively meaningless.
For now, Simonoff and Parsa are undeterred. They say the press has them all wrong. “In the recent news, we have been unfairly treated by the media,” Parsa says. “I mean, we have been very professional with the press, because we trust and respect journalists. But some of them twisted our words.”
They stress over and over that Predictim has been incorrectly labeled a “black box” that spits out predictions. “Our product does not predict if someone is going to be a good or a bad babysitter,” Parsa says. “That’s not what we do.” I do not point out that the company’s name is literally Predictim. The founders insist it’s all about putting more data into parents’ hands, about helping those parents make more informed decisions, about making everyone safer. But those results, right now, are wonky and inconsistent at best. And once these systems start flashing warnings about Moderate Risks and beyond, the average user is going to be biased towards the results, no matter what kind of AI or human review is under the hood.
“We’re not in a place as a society,” Lum says, “until we get unbiased training data, to be determining a person’s level of respectfulness with AI.”
Ready or not, the use of automated assessment tools is only poised to rise. We already have to contend with employers and credit agencies combing through past posts to reach conclusions about hireability, loan-worthiness, even health care eligibility. Automating those processes renders them that much more inscrutable, and that much harder for the affected parties, such as the gig worker on the wrong end of the algorithm, to dispute.
Predictim was a lightning rod, perhaps, because it overreached in scanning a particularly sensitive cohort. I asked the founders repeatedly if they wanted to live in a world where casual posts between friends could determine employment eligibility, even among teenage babysitters, and they demurred, or spoke to the parents’ “absolute right” to know everything about sitters. But the demand for such services, and the willingness of Silicon Valley investors to back them, portend a future filled with similar, and perhaps savvier, startups. And it probably won’t be those getting graded by the algorithm who decide how and when it’s pushed out.
“We are releasing our technology in steps,” Simonoff says. “This is the first step in a long line.”
This story is part of Automaton, an ongoing investigation into the impacts of AI and automation on the human landscape. For tips, feedback, or other ideas about living with the robots, I can be reached at firstname.lastname@example.org.