Microsoft just launched a new online app that tries to understand the contents of your photographs and write captions for them. And it’s surprisingly impressive, most of the time.
You can simply upload an image to CaptionBot and have it return a description for you. (It’s worth noting that Microsoft will hang on to any images you upload so it can learn from them in the future, though it says it doesn’t capture any personal information.) The results are pretty good, as you can see in some of these images.
It’s not the first AI to write captions. A couple of years back, Google announced it had written an impressive series of algorithms that did the exact same thing. Microsoft’s offering, which is far more neatly packaged than those that came before it, works in much the same way.
It combines two neural networks: One deals with image recognition, the other with natural language processing. By studying enough labelled images, the software works out how to pair up image features with human descriptions of what they show, then replicates that process when presented with new images.
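To make the "pair up image features with human descriptions" idea a bit more concrete, here is a deliberately tiny sketch. It is not how CaptionBot works internally: plain word counting stands in for both neural networks, the "image features" are hand-written tags rather than anything extracted from pixels, and the data set and the names `LABELLED`, `train`, and `describe` are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labelled data: each "image" is reduced to a set of feature
# tags (standing in for the image-recognition network's output), paired
# with a human-written caption.
LABELLED = [
    ({"fur", "whiskers", "sofa"}, "a cat sitting on a sofa"),
    ({"fur", "whiskers", "grass"}, "a cat lying on grass"),
    ({"fur", "leash", "grass"}, "a dog standing on grass"),
    ({"fur", "leash", "sofa"}, "a dog sitting on a sofa"),
]


def train(examples):
    """Count how often each caption word shows up alongside each feature."""
    counts = defaultdict(Counter)
    for features, text in examples:
        for feature in features:
            counts[feature].update(text.split())
    return counts


def describe(features, counts, candidates):
    """Return the candidate caption whose words best match the features.

    This picks from known captions instead of generating new text, which
    is a big simplification of the language side.
    """
    def score(text):
        words = set(text.split())
        return sum(counts[f][w] for f in features if f in counts for w in words)

    return max(candidates, key=score)


counts = train(LABELLED)
candidates = [text for _, text in LABELLED]

# A "new image" described only by its features.
print(describe({"whiskers", "sofa"}, counts, candidates))
# prints: a cat sitting on a sofa
```

A real system learns these associations from a huge number of labelled images and writes its captions word by word, but the basic loop is the same: learn which descriptions go with which features, then apply that to images it hasn't seen before.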
It doesn’t always work perfectly, and it makes some missteps with more abstract images. But on the whole it’s pretty damn impressive, and when it does go wrong you can kind of see how it got confused.