Today, YouTube is rolling out automatic captioning for all videos uploaded to the service, using Google's speech recognition service. You can see a demo in the video above.
Automatic captioning with Google speech recognition was launched in November. Until now, only a few selected education partners were able to test it out.
There are many reasons to have captions on every video: ESL viewers, people in other countries, searchability, not wanting to disturb others, loud locations and automatic translation into other languages.
The captioning won't be perfect, since Google's speech recognition isn't perfect, but it is really, really cool, and it's one step toward the goal of real-time speech-to-speech recognition that Google is aiming for. By testing on pre-recorded videos, they can refine the tech on something that isn't as vital or time-sensitive, so that it can eventually be used on something that is: phone conversations.
Also cool: if your video gets captioned weirdly by Google's system, you can download the captions in plain text and correct them yourself. This is much easier than captioning from scratch.
If you want to have YouTube go and caption something you uploaded a few years ago—because they caption newly uploaded videos first—you can manually request that as well.
Update: In response to some of the comments, yeah, it may use the same system as Google Voice's transcription (not sure yet), but having more people come in and upload corrected versions of their captions helps Google learn and improve their system faster, which helps all their speech-to-text services.
Update 2: I don't usually get emotional at press conferences, but watching the students from the California School For the Deaf talk about how the auto-captioning will improve their lives is kinda making me tear up. Right now, I think this is cooler than anything I've seen rolled out in the last few years.
Update 3: I asked if this was the same algorithm currently being used in Google Voice, and they said yes, more or less, if you're talking about the base technology. Goog411 and Voice Search have the same core algorithms too, but each of these four has its own conditions and issues that the algorithm needs tweaking for. So, you can probably expect a similar level of performance to Google Voice, or maybe even worse, if the videos have people who don't speak clearly, or multiple voices, or a noisy background.