Microsoft’s new ‘Seeing AI’ algorithm exceeds human accuracy
Microsoft has rolled out an evolution of the algorithm it currently uses to caption images. The new algorithm exceeds human accuracy in certain limited test cases.
The new benchmark
According to Microsoft, the new algorithm will be incorporated into the company’s assistant app for the visually impaired as well as its larger suite of Office products.
The image captioning technology fills the need for descriptions of images on the web and in documents that have no alt text. Saqib Shaikh, a software engineering manager with Microsoft’s AI team, said in a press statement:
“Ideally, everyone would include alt text for all images in documents, on the web, in social media — as this enables people who are blind to access the content and participate in the conversation. But, alas, people don’t. So, there are several apps that use image captioning as a way to fill in alt text when it’s missing”.
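To illustrate the kind of gap-filling Shaikh describes, here is a minimal sketch that generates alt text for images that lack it. It uses an open-source captioning model through the Hugging Face transformers pipeline as a stand-in for Microsoft’s algorithm, which is not publicly available in this form; the model name and the HTML handling are assumptions for illustration only.

```python
# Sketch: fill in missing alt text with machine-generated captions.
# The open-source model below is a stand-in, NOT Microsoft's algorithm.
from bs4 import BeautifulSoup          # pip install beautifulsoup4
from transformers import pipeline      # pip install transformers torch pillow

# General-purpose image-to-text (captioning) pipeline.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def add_missing_alt_text(html: str) -> str:
    """Return the HTML with generated captions inserted as alt text
    on any <img> tag that lacks it."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        if not img.get("alt"):                  # alt attribute missing or empty
            src = img.get("src")
            if not src:
                continue
            # The pipeline accepts a URL or local path and returns
            # a list of {"generated_text": ...} dicts.
            caption = captioner(src)[0]["generated_text"].strip()
            img["alt"] = caption
    return str(soup)

if __name__ == "__main__":
    page = '<p>Our trip</p><img src="https://example.com/beach.jpg">'
    print(add_missing_alt_text(page))
```

In a real accessibility app the generated caption would typically be read aloud by a screen reader rather than written back into the page, but the principle is the same: the model supplies a description only when a human-written one is absent.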
According to Microsoft, the new algorithm is twice as good as the current one. This should mean a significant improvement in user experience for people using apps like Microsoft’s Seeing AI.
What is Seeing AI?
In short, Seeing AI is an accessibility app that describes the world as seen through a smartphone camera.
What’s most impressive about Microsoft’s algorithm is that it can identify not only people and objects but also the relationships between them.
This is important because it can give context to a room and its occupants rather than just producing an inventory of objects it recognises. In practice, this means the algorithm can describe someone as sitting in a chair or reading a book.
This additional layer of information means that apps enabled with the algorithm can give meaningful information about images to assist users.
While we still need to see how the new algorithm performs in the wild, it has so far outperformed the previous best nocaps benchmark scores for image captioning, according to a pre-print paper published in September.
‘Surpassing human performance’
The nocaps benchmark rates algorithms on their ability to caption images by comparing their output against a dataset of more than 166,000 human-written captions. As Harsh Agrawal, one of the creators of the benchmark, told The Verge:
“Surpassing human performance on nocaps is not an indicator that image captioning is a solved problem”.
Agrawal noted that the metrics used to evaluate performance on nocaps “only roughly correlate with human preferences” and that the benchmark itself “only covers a small percentage of all the possible visual concepts”.
Despite this, Microsoft is hopeful that even if the new algorithm doesn’t deliver results twice as good in everyday use, it will still provide more value for visually impaired users.