YouTube Gets Automatic Captioning For All Videos

Today, YouTube is rolling out automatic captioning for all videos uploaded to the service, using Google's speech recognition service. You can see a demo in the video above.

Automatic captioning with Google speech recognition was launched in November. Until now, only a few selected education partners were able to test it out.

There are many reasons to want captions on every video: ESL viewers, people in other countries, searchability, not wanting to disturb others, loud locations, and automatic translation into other languages.

The captioning won't be perfect, since Google's speech recognition isn't perfect, but it is really, really cool, and it's one step toward the goal Google is aiming for: speech-to-speech recognition in real time. By testing on pre-recorded videos, Google can refine the tech on something that isn't as vital or time-sensitive, so that it can eventually be used on something that is: phone conversations.

Also cool: if your video gets captioned weirdly by Google's system, you can download the captions in plain text and correct them yourself. This is much easier than captioning from scratch.

If you want to have YouTube go and caption something you uploaded a few years ago—because they caption newly uploaded videos first—you can manually request that as well.

Update: In response to some of the comments, yeah, it may use the same system as Google Voice's transcription (not sure yet), but having more people come in and upload their corrected captions helps Google learn and improve its system faster, which helps all of its speech-to-text services.

Update 2: I don't usually get emotional at press conferences, but watching the students from the California School for the Deaf talk about how the auto-captioning will improve their lives is kinda making me tear up. Right now, I think this is cooler than anything I've seen rolled out in the last few years.

Update 3: I asked if this was the same algorithm currently being used in Google Voice, and they said yes, more or less, if you're talking about the base technology. Goog411 and Voice Search have the same core algorithms too, but each of these four has its own conditions and issues that the algorithm needs tweaking for. So you can probably expect a similar level of performance to Google Voice, or maybe even worse if the videos have people who don't speak clearly, multiple voices, or a noisy background.

DISCUSSION

bobman1235
TheBobmanNH

Is this the same engine that powers Google Voice? Because if so, saying it isn't perfect is about as much of an understatement as you can get. I've yet to see a single Google Voice transcript be even CLOSE to what it should be, even in the case of robotic automated messages that are designed to be clear as day.