If you’ve spent any time online, you’re well aware the internet is infested with bots and shit stirrers of all stripes pretending to be people they aren’t. Though bot proliferation, in particular, is difficult to measure, a 2020 report from cybersecurity firm Imperva found that more than a third (37.2%) of all internet users weren’t human. That’s a lot. Mix that with everyone else operating under aliases and you start to realize that much of the modern internet is, to some degree, fake. But what if there were a tool that could cut through that fakery and identify the author of any given post based solely on the linguistic style of their text?
Researchers at the Intelligence Advanced Research Projects Activity (IARPA), the research arm of the U.S. intelligence community, are using artificial intelligence and heaps of online text data to create just such an identity marker, NextGov notes in a recent report. The researchers hope this text “fingerprint” could one day play a significant role in identifying the individuals behind disinformation campaigns and in fighting human trafficking.
“Imagine you had machine-generated text that was being created online to conduct a disinformation campaign,” IARPA Program Manager Dr. Timothy McKinnon told NextGov. “What the technology will be able to do is it will be able to identify, potentially, the fact that a machine generated the text, and also help you understand which groups are engaged in those activities.”
The proposed text-based fingerprinting technique would reportedly work much like the way forensic experts currently identify someone from their handwriting. Just as people have small individual differences and idiosyncrasies in the way they write a word by hand, online authors have their own tells when crafting sentences.
“Think about if you had 100 different people, and you ask them to describe some simple thing—like how to open a door—in two sentences or one sentence, you’d probably get about 100 different answers, right?” McKinnon asked. “And, you know, each person sort of has their own idiosyncrasies as an author that are potentially used by authorship attribution systems.”
With enough input data, McKinnon believes, an AI tool could derive a digital fingerprint from written text alone. Armed with that technology, a government agency could potentially determine whether a bad actor was impersonating someone else online, or tell whether something posing as a human was actually just a bot spewing disinformation.
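IARPA hasn’t said how its system works under the hood, but the basic idea behind authorship attribution (stylometry) can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not IARPA’s method: it assumes an author’s “fingerprint” is just the frequency profile of overlapping three-character chunks (character n-grams) in their writing, and it attributes an unknown text to whichever known author’s profile is most similar by cosine similarity. The function names and sample texts are made up for illustration; real systems use far richer features and far more data.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, a classic stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def attribute(unknown: str, known: dict[str, str], n: int = 3) -> tuple[str, float]:
    """Return the known author whose writing sample best matches the unknown text."""
    target = char_ngrams(unknown, n)
    scores = {
        author: cosine_similarity(target, char_ngrams(sample, n))
        for author, sample in known.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical writing samples from two "known" authors.
samples = {
    "alice": "Honestly, I reckon you just push the door. Honestly, it's not rocket science.",
    "bob": "To open a door: grasp the handle, rotate it clockwise, then pull firmly.",
}
print(attribute("Honestly, I reckon you just pull it. Not rocket science.", samples))
```

The mystery post shares Alice’s verbal tics (“Honestly, I reckon…”, “not rocket science”), so its profile lands closer to hers, which is the same intuition as McKinnon’s hundred people describing how to open a door.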
Whether or not that’s a good thing probably depends on how problematic you think disinformation and misinformation campaigns are and how much you value anonymity online. Civil liberties groups and privacy advocates may shudder at the thought of a powerful new text fingerprinting tool wielded by state agencies, particularly in the wake of recently declassified documents detailing a bulk data collection program spearheaded by the CIA.
But before anyone microwaves their laptop, it’s worth noting that the particular use cases for IARPA’s text fingerprinting tools remain largely theoretical. McKinnon was quick to note that IARPA focuses mostly on exploratory research and doesn’t necessarily determine how a technology develops over the longer term or how government partners choose to deploy it.
Still, the technology could have some privacy-preserving uses. After identifying a user’s digital fingerprint, for example, someone could slightly modify their text so it no longer read like the original author’s.
Regardless of how you feel about a new AI system like this, it was inevitable that this day would come. Aside from mainstays like fingerprints and facial recognition, researchers have found ways to identify people based on their voice, gait, feces, and even their ass.