FEMA Contractor Tracing Coronavirus Deaths Uses Web Scraping, Social Media Monitoring


On December 31, few Americans had ever heard the word “coronavirus.” It was unfathomable that in cities like Pittsburgh, the National Guard would soon be deployed to work food lines stretching as far as the eye can see. Few had ever experienced the emotional strain of having to physically separate themselves from a loved one for weeks and possibly months. No one was ready to hear the economy once again being compared to the Great Depression so soon.

As Americans rang in the new year with fireworks (and the time-honored tradition of watching CNN anchors get sloshed), an algorithm designed to help anticipate the kind of ramping chaos only a rapidly spreading pathogen can bring began to notice an unusual amount of health-related chatter in China. The surge came primarily from Wuhan, one of China’s central cities, a sprawling metroplex of more than 11 million inhabitants, according to John Goolgasian, chief operating officer at Geospark Analytics.


Geospark Analytics combines machine learning and big data to analyze events in real-time and warn of potential disruptions to the businesses of high-dollar private and public clientele, FEMA and the U.S. Defense Department among them. Over the phone, Goolgasian said his firm wasn’t sure what was happening when the virus was first flagged. Then again, no one was. “We saw there was this pneumonia or SARS-like thing happening, so we ran some retrospective analysis and shot it out to our users that day,” he said.

That analysis, titled, “The 5 Things You Need to Know,” listed among other items of interest—clashes between Chilean police and protesters and a fire near India’s Kandla Port—a SARS-like virus spreading in Wuhan, sparse details of which were offered under a curious subheading: “Pneumonia outbreak?” Earlier that afternoon, Chinese authorities had confirmed 27 people were infected with the mystery “pneumonia” of an “unknown origin.”

The first death attributed to the novel coronavirus, which at the time had no name, came 11 days later.

Geospark Analytics’ product, called Hyperion, the namesake of the Titan son of Uranus (meaning, “watcher from above”), fingered Wuhan as a “hotspot,” in the company’s parlance, within hours after news of the virus first broke. “Hotspots tracks normal patterns of activity across the globe and provides a visual cue to flag disruptive events that could impact your employees, operations, and investments and result in billions of dollars in economic losses,” the company’s website says.


Whether Geospark Analytics’ private and public clients took any action based on its December 31 alert is hardly the software’s responsibility. Unlike Hyperion, many of its mortal users simply ignored the clear signs that a cataclysmic event was barreling toward them, many until after panic beset the masses.

On March 21, the Department of Homeland Security awarded Geospark Analytics a $150,000 contract to provide FEMA with “geospatial analysis in support of disaster survivors.” Goolgasian, who spent two decades at the National Geospatial-Intelligence Agency, the Pentagon’s mapmaker—and did a stint at the CIA, based on an introduction he gave during a panel in 2017—declined to say whether the contract relates specifically to FEMA’s coronavirus efforts.


“I can talk about what we do, but I don’t want to get into the details of the contract,” Goolgasian said.

FEMA did not respond to Gizmodo’s request for comment.

Geospark Analytics has been sucking up data on the virus from a variety of sources since the pandemic began as part of an effort to determine which counties are at the highest risk. This involves combing through millions of social media posts “and everything else around it,” Goolgasian said, as well as datasets from hospitals around the United States. “We created this living model or seven-day forecast of where the growth of the virus could be,” he said, “based on death rates and existing hospital infrastructure.”


“Our client base ranges from small business to large governmental organizations,” he said. “We also pride ourselves on the fact that we have released information to the public at no cost to assist in the response to this unprecedented COVID-19 crises.”

In December, Geospark Analytics received $250,000 from the Department of Defense as part of a small business research award. It had previously taken on Air Force contracts involving “global stability, threat, and operational risk forecasting,” for a total of $165,000, records show. (Geospark Analytics of Herndon, Virginia, is not to be confused with GeoSpark of North Potomac, Maryland, a company that focuses on cell phone location intelligence, another area of interest for the federal government. When we asked Goolgasian whether the two companies are related in any way, he was steadfast: “We are completely separate, not even close to doing the same thing.”)


In the last year, Geospark Analytics claims to have processed “6.8 million” sources of information, everything from tweets to economic reports. “We geo-position it, we use natural language processing, and we have deep learning models that categorize the data into event and health models,” Goolgasian said. It’s through these many millions of data points that the company creates what it calls a “baseline level of activity” for specific regions, such as Wuhan. When activity spikes around any number of security-, military-, or health-related topics, the system flags it as a potential disruption.
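
Geospark Analytics has not published how Hyperion computes its baselines, so the following is only a generic illustration of the idea, not the company's method. It is a minimal Python sketch that assumes the "baseline" is a trailing seven-day average of item counts for one region and that a "spike" is any day far above that average. The function name `flag_spikes`, the window, the threshold, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indices of days that sit far above a trailing baseline.

    The baseline for day i is the mean of the previous `window` days. A day
    is flagged when its count is more than `threshold` standard deviations
    above that mean. All parameters here are illustrative assumptions.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # A perfectly flat history: treat any increase as a spike.
            if daily_counts[i] > baseline:
                flagged.append(i)
            continue
        if (daily_counts[i] - baseline) / spread > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily counts of health-related items for a single region.
counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 60]
print(flag_spikes(counts))  # the final day (index 9) stands out
```

A real system would presumably keep separate baselines per region and per topic, and would have to account for weekly cycles and news-driven noise, which this sketch ignores.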


Amid the unrest in Hong Kong last year instigated by planned changes to the city’s extradition laws, for example, Hyperion alerted its users to a “significant increase in negative activity in Hong Kong.”

In a promotional blog post, Geospark Analytics explained that at the time, Hyperion highlighted certain areas in Hong Kong, where millions of anti-government protesters had gathered, with an “interactive icon” on the platform’s global map. “By clicking on this icon a user will be able to access all relevant articles and social media posts that Hyperion has identified,” it said, adding that the function “provides content related to the recent activity and allows users to take a historical look at the region going as far back as 90 days,” including “social media posts.”


Goolgasian, pressed on the privacy implications, said that monitoring social media is only a “small piece” of what Geospark Analytics does and that it pursues “more authoritative and validated” sources. Social media data is, after all, notoriously unreliable. A 2016 study, for example, found that Google prominently surfaced information about a much-discussed “cholera” epidemic in the United States in 2007 “as a result of Oprah Winfrey picking Love in the Time of Cholera as book of the month in her book club.”

“We rely more on traditional data sources and we don’t do anything that isn’t publicly available,” Goolgasian said, echoing a common refrain among data firms that fuel surveillance products by mining the internet itself. Earlier this year, CEO Hoan Ton-That of facial recognition firm Clearview AI defended his company’s aggressive web scraping by arguing he had a First Amendment right to data made public by users on social media. Several major companies, including Google and Facebook, have indicated they plan to take legal action.

“Whether it comes from purchasing information through APIs, through RSS feeds or web scraping, or even looking at things like state-level department of health data, we get the latest and most authoritative information,” Goolgasian said.


Goolgasian was also contacted by Senator Ron Wyden’s office on Friday. A longtime supporter of digital privacy, Wyden is working to get a handle, an aide said, on the flood of data firms approaching the government with solutions to the coronavirus. While Goolgasian did not offer any further details about Geospark Analytics’ work for Homeland Security, he was adamant that the company considers certain types of data strictly off-limits: “We DO NOT process any cell data. It has been something that we have purposefully stayed away from for the reasons you are concerned about,” he wrote in an email shared with Gizmodo.

Despite downplaying social media’s role in Hyperion’s forecasts, Geospark Analytics announced last year that it established an agreement with Twitter granting it access to an “enhanced data stream.” “Adding this real-time data source to our war chest of unique data will further enhance situational awareness and instantly notify users of breaking events in the time it takes to write a tweet,” it said.


Geospark Analytics product manager Serena Kelleher-Vergantini elaborated after the announcement that by “stream” the company meant enterprise access to Twitter’s API, also known as Firehose, which she went on to describe as completely useless without Hyperion. “Needless to say, no matter how much effort we put into building the initial rules, the results were mediocre at best,” she wrote, describing Hyperion’s filters for events like “earthquakes” or “terrorism.”

“The valuable tweets were there, but they were drowning in a sea of back-and-forth tweets between people arguing (over a terrorist event), emotional tweets of people who felt like they had an experience that ‘felt like an earthquake,’ and tweets about a drink referred to as ‘the landslide,’” she said, adding: “trust us when we say, you really don’t want the Twitter firehose. What you need is a platform like Hyperion that will filter out the noise to find the Twitter data you need.”


A Twitter spokesperson told Gizmodo: “As stated in our Developer Policy, we do not allow our data products to be used for surveillance purposes, or in any other way that would be inconsistent with people’s reasonable expectations of privacy. We consistently hold ourselves accountable to rigorous standards, including third-party audits of key products and services, and proactive enforcement of our policies.”


Twitter has a complicated history with government contractors using Firehose to monitor its users’ speech. In 2016, for example, the platform severed ties with multiple analytics firms—effectively shuttering some of them—citing a longstanding rule against the sale of user data for “surveillance” purposes. (The decision came after intense reporting by the Guardian, Daily Dot, and other outlets, along with pressure by the ACLU.) “Using Twitter’s Public APIs or data products to track or profile protesters and activists is absolutely unacceptable and prohibited,” Twitter said at the time.

Geofeedia, which quietly marketed its ability to monitor Black Lives Matter protests, was one of the companies Twitter banned then. Today, Geospark Analytics monitors protests in South America, Southeast Asia, and other areas where Twitter users reside. The differences between how they present their products may be consequential. The user data provided by Twitter in both cases is essentially the same. But where Geofeedia did little to mask that it was a “surveillance” company, Geospark Analytics seems to avoid the term entirely, even if its skill at intelligence gathering is why it’s in business with the government in the first place.


“We strictly adhere to Twitters use policies and DO NOT process any Twitter content related issues like crime, protest, or social unrest,” Goolgasian said in an email. “We do not monitor individual Twitter accounts or activity. We do utilize Twitter for breaking news around disasters, terrorism, disease outbreaks, and transportation disruptions.”

While the privatization of intelligence is nothing new, Geospark Analytics’ contract with Homeland Security comes at a chaotic moment for the agency.


Jared Kushner, presidential son-in-law and senior advisor, who has zero emergency management experience, was appointed to supervise response efforts at FEMA and muster the support of private industry assets for the White House. Politico reported Thursday that Kushner and his team of technocrats have taken an “all-of-private-sector” approach, tasking potentially unvetted outside advisors with solving problems related to producing medical supplies and a lack of covid-19 testing.

“It’s a little crazy,” one advisor, reportedly brought on to assist the government, told the reporters. “It’s all hands on deck—it’s literally, who’s got the technology and data? Who can help us?”


Update, 9:30pm: Additional context added.

Update, April 7: Statement from Twitter added.


