The Amazing Quest To Preserve and Restore 52,000 Holocaust Testimonials

From 1994 to 1999, some 52,000 testimonies from Holocaust survivors and eyewitnesses were recorded on Betacam SP tapes by the USC Shoah Foundation Institute for Visual History and Education. Last year, a massive preservation project was completed that digitized the entire inventory, but about five percent of the tapes were discovered to be almost completely unwatchable. Time and faulty equipment used at the outset had taken their toll, and many of these clips couldn't be visually parsed at all.

Now, Shoah's Information Technology Services team is working on an unprecedented restoration project to bring copies of these videos back to life (the originals will be kept as untouched historical documents).

Last week, Gizmodo spoke with Ryan Fenton-Strauss, ITS video archive and post-production manager, about the major developments they've made.

How did the preservation project make these current restorations possible?

We used robotic technology to migrate all of the original 235,000 analog Beta SP tapes to Motion JPEG 2000, which is the archival gold standard for digital files. It was a huge project. We had to play back 105,000 hours of material in real time, so our main goal was efficiency; during that process there wasn't really any opportunity for any human intervention, but we had a really rigorous quality assurance process on that end because, after working with the archive for years, we knew there were some unfortunate issues. We had a large team of students watching the testimonies to find the anomalies and using graphs to mark where the problems were.

Then, at the end of the Preservation Project, we said: "Well, what tricks are available to correct these videos and make an improved version of them?"

So you did some research with the motion picture industry?

Yes. But the focus in the industry has been film restoration; for years and years, there's been a lot of sophisticated work happening there. We went to some of the big names with our sampling of problems and let them take a crack at it, but no one was able to correct them. We brought in some of the most powerful and expensive film restoration tools to set up demos, and they just weren't able to do anything with the videos.

Plus there was the sheer number of clips you were working with.

Right. Video restoration has happened on a very small scale—you can spend hundreds and hundreds of hours fixing a two-hour film. We had a different problem, in that what we had was thousands of hours of material that we had to restore en masse. So we needed to figure out processes that we could do in as close to real time as possible. There are some widely available tools and techniques out there—like common professional audio editing stuff to reduce noise—but there was this subset of videos that tended to be our worst problems, where the video was just covered up with artifacts, when the camera failed to record the signal properly.

How did you start the restoration process?

Our first pass was using a better machine for playback, to recover whatever information was on there; the older playback decks actually have more sophisticated error correction, and they were able to recover a pretty murky image that was buried behind a lot of artifacts. The next pass was to say: "Okay, let's isolate everything that's good, and get rid of everything that's not, and render that out."

Tell me a little more about this idea of "good" and "bad" images in a video.

Well, a half-hour video will break down into roughly 50,000 individual frames and 100,000 sub-frame fields. Once you've got a series of pictures, you can identify the ones where there is no recognizable face. The first idea I had was to simply drop all the stuff that's bad and borrow the previous image; after filling it in, you would then sync back up with the audio and be able to watch the video. You're sacrificing some resolution, essentially just isolating the good parts of the signal. By filling in the rest, it would play as a video.
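The "drop the bad and borrow the previous image" idea can be sketched in a few lines. This is only an illustration, not the team's actual tooling: the function names, the `is_good` check, and the 29.97 fps NTSC frame rate used for the frame arithmetic are all assumptions (the interview does not say which video standard the tapes use). The key property is that the output has exactly as many frames as the input, which is what keeps the picture in sync with the audio.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")

# Assumption: NTSC frame rate. PAL material would use 25.0 instead.
NTSC_FPS = 30000 / 1001


def frame_counts(minutes: float, fps: float = NTSC_FPS) -> Tuple[int, int]:
    """Return (frames, fields) for a clip; interlaced video has 2 fields per frame."""
    frames = round(minutes * 60 * fps)
    return frames, 2 * frames


def fill_bad_frames(
    frames: Sequence[Frame],
    is_good: Callable[[Frame], bool],
) -> List[Optional[Frame]]:
    """Replace each bad frame with the most recent good frame.

    `is_good` is a hypothetical stand-in for whatever detector is used
    (a face check, a pattern search, or a manual review flag).
    Bad frames that occur before any good frame become None.
    """
    filled: List[Optional[Frame]] = []
    last_good: Optional[Frame] = None
    for frame in frames:
        if is_good(frame):
            last_good = frame
        # Always append, so the frame count (and audio sync) is preserved.
        filled.append(last_good)
    return filled


if __name__ == "__main__":
    print(frame_counts(30))
    demo = ["A", "x", "x", "B", "x"]  # "x" marks an unwatchable frame
    print(fill_bad_frames(demo, lambda f: f != "x"))
```

Under the NTSC assumption a half-hour tape works out to a little under 54,000 frames, in line with the rough figure quoted above. Holding the last good image trades motion for legibility, which matches the "sacrificing some resolution" trade-off described in the interview.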

But I was wrestling with how to systematically detect and programmatically remove those patterns. Then, one day, I was going through my Google Picasa photo album and I noticed that the facial recognition software identified my daughter—I think she was six at the time—going all the way back to when she was a baby. It mixed her up a little bit with my son but, for the most part, it was a pretty miraculous thing. The technology had come so far that I thought: "There's got to be an automated way of finding which are the good images and which are the bad in these testimonials." So we brought the images into Picasa and hit "Does this image have a face?" It got rid of a lot that didn't, so we could easily filter out and separate them. Then there was the manual intervention, but it was still pretty clunky.

What we really wanted to be able to do was automate the search for patterns in bad images, because sometimes it's just an artifact across somebody's chest. So I've been working with a student here named Ivan on an application from National Instruments called Vision Builder, which is typically used for quality assurance checks on mobile devices and quality testing in engineering and manufacturing.

The trick is that we're going to have to come up with a whole sampling of things that we're looking for, which is going to change with every tape. Our hope is that we'll be able to have 10 to 20 different templates that search for different things. Fortunately, there's a really nice user interface in the system, so we can easily train somebody without a programming background to use it.
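To make the idea of searching frames against a set of templates concrete, here is a minimal sketch using normalized cross-correlation, a common pattern-matching technique. It is not Vision Builder's API and not necessarily the method the team uses; the function names, the tiny grayscale grids, the template name, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence

Image = Sequence[Sequence[float]]  # grayscale pixels, row by row


def _ncc(window: List[float], template: List[float]) -> float:
    """Normalized cross-correlation of two equal-length pixel lists (-1 to 1)."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    var_w = sum((w - mean_w) ** 2 for w in window)
    var_t = sum((t - mean_t) ** 2 for t in template)
    denom = sqrt(var_w * var_t)
    # A flat window or template has no pattern to correlate with.
    return num / denom if denom > 0 else 0.0


def best_match(image: Image, template: Image) -> float:
    """Slide the template over the image and return the highest NCC score."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    best = -1.0
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            w_vals = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            best = max(best, _ncc(w_vals, t_vals))
    return best


def flag_frame(
    image: Image,
    templates: Dict[str, Image],
    threshold: float = 0.9,  # assumed cutoff; would need tuning per tape
) -> List[str]:
    """Return the names of all artifact templates found in the frame."""
    return [name for name, t in templates.items() if best_match(image, t) >= threshold]


if __name__ == "__main__":
    templates = {"horizontal_band": [[0, 0, 0], [255, 255, 255], [0, 0, 0]]}
    damaged = [
        [10, 10, 10, 10, 10],
        [0, 0, 0, 10, 10],
        [255, 255, 255, 10, 10],
        [0, 0, 0, 10, 10],
        [10, 10, 10, 10, 10],
    ]
    print(flag_frame(damaged, templates))
```

A frame that returns a non-empty list would be treated as bad. Keeping the templates in a named dictionary mirrors the plan of maintaining a small library of patterns that can be swapped from tape to tape.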

The restoration project is slated to finish up sometime next year. Where are you now?

So far, we've been trying to learn as much as we can about detection and to make the workflow manageable, because there are a lot of challenges in working with so many images. By early next year, we'd like to put this process into production. For me, that means that a great day would be running ten half-hour tapes simultaneously, processing them, and coming out with a more watchable video. We'll put multiple software licenses in place, and train students to go through these tens of millions of images.

This has obviously been a huge technical undertaking, but it seems there would be a pretty big emotional impact in watching these testimonials, too.

There were survivors who managed to come out of the Holocaust and tell their story and, for some reason, it didn't get captured; it really was an unfortunate byproduct of the race against time to collect as many testimonials as we could. It's my job to make sure that we do the best we can for them—in fact, it's an obligation, as we've been entrusted with these videos. So, for me, there's a satisfaction that comes with the feeling that, wow, we rescued their video. It feels like we're doing good.

You can register here to view and search 1,200 testimonies available in the public online database; the restored copies of the tapes will be added soon.