The Pentagon accidentally left at least 1.8 billion publicly accessible posts it had scraped from social media sites, forums and other web destinations unsecured in an Amazon S3 repository, where anyone with a free Amazon Web Services account could download the data, PC Mag reported.
The data archive was originally found by security firm UpGuard, which wrote that the postings appeared to have been assembled by “CENTCOM and PACOM, two Pentagon unified combatant commands charged with U.S. military operations across the Middle East, Asia, and the South Pacific.” While many of the postings seemed to have been made by people who were not U.S. citizens, many others were “apparently benign public internet and social media posts by Americans,” UpGuard wrote. Available evidence indicates that the Pentagon hired a now-defunct contractor named VendorX to store the files, and that VendorX was apparently assembling “an ingestion engine for the bulk collection of internet posts—organizing a mass quantity of data into a searchable form.”
According to UpGuard’s report, the data appeared to have “an emphasis on Arabic, Farsi (spoken in Iran and Afghanistan), and a number of Central and South Asian dialects spoken in Afghanistan and Pakistan,” suggesting that the program was related to U.S. military and intelligence operations in Central Asia. However, the program also appears to have swept up public posts by Americans, including posts airing their political views.
As noted by Ars Technica, the data may have been used to fuel Outpost, which former VendorX employees described in online posts as a “multi-lingual platform designed to positively influence change in high-risk youth in unstable regions of the world.”
Military officials said the program was conducted using “commercial off-the-shelf programs” and downplayed its collection of posts by U.S. citizens. That collection is yet another reminder of how casually authorities can monitor the public, and it is potentially illegal. The government also disputed that the program was particularly interesting or scandalous.
“Once alerted to the unauthorized access, CENTCOM implemented additional security measures to prevent unauthorized access,” CENTCOM spokesman Major Josh Jacques told PC Mag. “... The information you are asking about is not sensitive information. It is not collected nor processed for any intelligence purposes.”
Large data archives leaking online because of substandard security have become a major concern in 2017, especially following the recent breach that exposed the sensitive personal information of over 145 million Americans held by credit reporting agency Equifax. In this case, changing a simple privacy setting would have kept the files hidden from public view. UpGuard has previously found similarly unsecured data caches belonging to Viacom, Verizon, TigerSwan, Dow Jones, Deep Root Analytics, and Booz Allen.