Back in 2016, Microsoft built a database of more than 10 million images featuring roughly 100,000 people. Today, the Financial Times has reported Microsoft quietly deleted this database, dubbed MS Celeb, from the internet.
Before its deletion, MS Celeb was the largest public facial recognition data set in the world. It was purportedly called ‘Celeb’ to imply the faces in the data set were from public figures. The catch is, according to the Financial Times, many people featured in the set were not asked for their consent. Instead, their images were scraped from image and video searches under Creative Commons licenses. (Under the license, you can reuse photos for academic research. And it is the copyright owner, not necessarily the subject of the photos, who grants the license.) However, the Financial Times found the set contained faces of private citizens and security journalists, including Kim Zetter and Adrian Chen, as well as Julie Brill, a former FTC commissioner, among others.
“The site was intended for academic purposes,” Microsoft told the Financial Times. “It was run by an employee that is no longer with Microsoft and has since been removed.”
Unfortunately, it’s not quite that simple. The MS Celeb set has been used by several companies, including IBM, Panasonic, Alibaba, Nvidia, and Hitachi. It’s also been used by Sensetime and Megvii. These two companies are suppliers to Chinese officials in Xinjiang, where facial recognition tech and artificial intelligence have been used to track and imprison minority groups like Uighurs and other Muslims. Sensetime was valued at more than $4.5 billion as of late 2018, and its SenseTotem and SenseFace systems are used by various Chinese police departments. Megvii recently raised $750 million in series D funding, and its Face++ tech was cited in a Human Rights Watch report as a provider to the Integrated Joint Operations Platform, a police app used in Xinjiang. However, the group later amended its report to say that the Face++ account in the IJOP code had never been actively used. In a New York Times report, both companies denied direct knowledge of their software being used to racially profile minorities in China.
It’s unclear whether the MS Celeb data set played a role in attempts to racially profile in the Xinjiang program, and if it did, how critical the data set was in developing that technology. However, researchers at MegaPixels contend that Microsoft clearly lost control over who actually used the data set. A chart shows that China topped the list of countries citing the MS Celeb data set in both 2018 and 2019.
Microsoft itself has been vocal about its opposition to using such tech as a form of government surveillance. In a December 2018 blog post, Microsoft called on companies to create safeguards and on governments to start regulating facial recognition tech. In the post, it also acknowledged the potential for governments to abuse facial recognition. Earlier, in April, Microsoft also reportedly turned down a California law enforcement agency’s request to install facial recognition tech in officers’ cars and body cameras, as doing so would disproportionately impact women and minorities.
However, Microsoft’s objections and good intentions only go so far. The FT noted the MS Celeb data set is still available to any academic institution or company that had previously downloaded it, and it’s still being shared on GitHub, Dropbox, and Baidu Cloud. Gizmodo reached out to Microsoft for comment but did not immediately receive a reply.