Hotdogs, Emoji and Faces: How AI Is Learning to See


If you’ve noticed a huge surge in companies saying they’re using artificial intelligence lately, you aren’t alone. Even simple-sounding tech is suddenly “built on AI”, such as the hotdog identifier Not Hotdog or Google’s Allo chat app, which turns selfies into custom emoji.

If you’re the skeptical type, you’re probably distrustful of these claims. Buzzwords like “big data” and “cloud processing” were used by pretty much every startup in their heyday, too. But some of these whimsical apps are surprisingly legit.

Not Hotdog, for instance, is built on top of Google’s machine learning library, TensorFlow, plus the open source neural network library Keras. Over several months, its developers tested multiple frameworks, trained their AI on edge cases, and figured out how to apply the results using only mobile phone processors.

Small applications like Not Hotdog offer a window into the larger challenge of accurately tagging thousands of distinct objects, which we’ve spent years on at GumGum. Eye color, for example, is tricky for machines. A convolutional neural network learns to recognize eyes by breaking images of faces into pieces, examining each piece separately, then merging the results so the image can be considered as a whole. Do the pieces have characteristics of a face? If so, which is the eye, and what color is it? Once the network has the answer, we can overlay a glow of the correct color and size on a person’s iris in an image.
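To make the "pieces, then merge" idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the model behind Not Hotdog or anything used at GumGum. The names convolve2d and global_max_pool, the toy 4x4 image and the vertical-edge kernel are all assumptions made for this example; a real network learns its kernels from training data and stacks many such layers.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def convolve2d(image: Image, kernel: Image) -> Image:
    """Examine every kernel-sized piece of the image and score each one."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map: Image = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the pixels in this piece of the image.
            response = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            # ReLU: keep only positive evidence for the pattern.
            row.append(max(0.0, response))
        feature_map.append(row)
    return feature_map


def global_max_pool(feature_map: Image) -> float:
    """Merge the per-piece scores into one value for the whole image."""
    return max(max(row) for row in feature_map)


# Hypothetical toy image: dark left half, bright right half.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
# (Assumption for illustration; trained networks learn these weights.)
kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

feature_map = convolve2d(image, kernel)
score = global_max_pool(feature_map)
print(feature_map)
print(score)
```

Each entry in feature_map is the score for one piece of the image, and the pooled score answers a single question about the image as a whole: does this pattern appear anywhere? Deciding whether a region is an eye, and what color it is, would take many learned kernels across several layers rather than the single hand-written one used here.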

While each neural network is a distinct model, some details are consistent. Neural networks loosely mimic the way our visual cortex processes visual information, for instance, so some of the underlying mathematics is shared across applications. Several of the most important ideas can be learned from The Visionary’s micro-course on machine learning and computer vision.

The web is going visual: YouTube reports a billion hours of video watched every day. Instagram gets 95 million uploads per day. Mary Meeker’s Internet Trends 2017 report notes that images and video will soon be the core tech for search, augmented reality, social networking and advertising. Computer vision is needed to assist all of these uses, as well as important real-world applications like autonomous vehicles.

So the profusion of computer vision apps, and of developers working on them, couldn’t come at a better time. For more developments in computer vision, subscribe to our newsletter.

This article originally appeared on Engadget.

Illustrations by Sergio Membrillas