Keeping up with the alphabet soup of techie terms and jargon for all our gadgets today seems like a full-time job, especially as once-esoteric topics make their way into everyday kitchen-table conversation. Among the most hyped is surely artificial intelligence (AI), which has become a catch-all term for describing a range of technologies that allow machines to essentially “think” by processing data on their own.

One of the most powerful and compelling types of AI is computer vision—which you’ve almost surely experienced in any number of ways without even knowing it. Here’s a look at what it is, how it works, and why it’s so awesome (and is only going to get better).

What do you mean by computer vision?

Computer vision, or CV for short, is an academic term that describes the ability of a machine to receive and analyze visual data on its own, and then make decisions about it. That data can include photos and videos, but more broadly might include “images” from thermal or infrared sensors and other sources. CV is already in use for a number of purposes. On the consumer level, remote-control drones rely on it to avoid obstacles, as do cars from Tesla and Volvo, among others.

Why do we need computer vision anyway?

CV allows computers, and thus robots, other computer-controlled vehicles, and everything from factories and farm equipment to semi-autonomous cars and drones, to run more efficiently, more intelligently, and even more safely. But CV’s importance has become even more obvious in a world deluged with digital images. Since the advent of camera-equipped phones, we’ve been amassing astonishing amounts of visual imagery that, without someone or something to process it all, is far less useful and usable than it should be. We’re already seeing computer vision help consumers organize and access their photo collections without needing to add tags in, say, Google Photos. The bigger question is how to stay on top of the billions of images shared online every day (approximately 3 billion, according to Mary Meeker).

To get an idea of how much we’re talking about here, last year photo-printing service Photoworld crunched the numbers and found it would take a person 10 entire years to even look at all the photos shared on Snapchat in just the last hour. And of course, in those 10 years, another 880,000 years' worth of photos would already have been spawned if things continue at the same rate. Simply put, our world has become increasingly filled with digital images and we need computers to make sense of it all; keeping up is already well past human capabilities.
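
The 880,000-year figure follows from simple arithmetic on the Photoworld estimate. Here is a rough sketch of that calculation in Python; the function name and the 365.25-day year are assumptions made for illustration, not part of Photoworld's analysis.

```python
# Assumption: one calendar year averages 365.25 days.
HOURS_PER_YEAR = 24 * 365.25

# From the Photoworld estimate quoted above: each hour of sharing
# produces about 10 years' worth of viewing for one person.
VIEWING_YEARS_PER_HOUR_OF_SHARING = 10


def backlog_years(elapsed_years: float) -> float:
    """Years of viewing that pile up while `elapsed_years` go by."""
    hours_elapsed = elapsed_years * HOURS_PER_YEAR
    return hours_elapsed * VIEWING_YEARS_PER_HOUR_OF_SHARING


print(backlog_years(10))  # 876600.0, which rounds to roughly 880,000
```

In other words, every year spent looking creates nearly 88,000 more years of photos to look at.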

How does computer vision work?

On a certain level, CV is all about pattern recognition. So one way to train a computer to understand visual data is to feed it images, lots of images (thousands, millions if possible) that have been labeled, and then subject those to various software techniques, or algorithms, that allow the computer to hunt down patterns in all the elements that relate to those labels. So, for example, if you feed a computer a million images of penguins, it will subject them all to algorithms that let it analyze the colors in the photo, the shapes, the distances between the shapes, where objects border each other, and so on, so that it builds a profile of what “penguin” means. When it’s finished, the computer will (in theory) be able to use its experience, when fed other unlabeled images, to find the ones that are of penguins.
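
To make the idea of a learned “profile” concrete, here is a toy sketch in Python. It is a deliberately tiny stand-in (a nearest-centroid classifier), not the method any real CV system uses. The three-number features (fractions of dark, white, and green pixels), the sample values, and the function names `train` and `predict` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import dist


def train(labeled_examples):
    """Average the feature vectors for each label into one profile."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(profiles, features):
    """Return the label whose profile is closest to the new image."""
    return min(profiles, key=lambda label: dist(profiles[label], features))


# Hypothetical features per image:
# (fraction of dark pixels, fraction of white pixels, fraction of green pixels)
training_data = [
    ((0.50, 0.40, 0.02), "penguin"),
    ((0.45, 0.45, 0.05), "penguin"),
    ((0.55, 0.35, 0.03), "penguin"),
    ((0.10, 0.05, 0.70), "parrot"),
    ((0.15, 0.10, 0.60), "parrot"),
    ((0.05, 0.08, 0.75), "parrot"),
]

profiles = train(training_data)
print(predict(profiles, (0.48, 0.42, 0.04)))  # penguin
```

Real systems learn from millions of images and far richer features than three color fractions, but the shape of the process is the same: labeled examples go in, a profile comes out, and new images are matched against it.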

Microsoft recently created an algorithm that incorrectly identified what was in pictures just 3.5 percent of the time. That means it was correct 96.5 percent of the time.

Labeling millions of images by hand is slow work, though. Fortunately, some of the geniuses at Google thought up another option: Back in 2012, they fed a computer loads and loads of unlabeled images and let it figure out patterns on its own to see what happened, using an approach known as deep learning. It turns out that, with good enough algorithms, computers are able to find patterns on their own and begin to sort through images without requiring humans to hold their hands along the way. Today, some deep learning algorithms are surprisingly accurate.
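
Deep learning itself is far too large to show here, but the core idea of finding groups without any labels can be sketched with a much simpler, swapped-in technique called k-means clustering. To be clear, this is not what Google's system did; it is only a small illustration of a computer sorting data on its own. The two-number features, the sample points, and the choice of starting centers are assumptions made for the example.

```python
from math import dist


def k_means(points, centroids, rounds=10):
    """Group points around the nearest center, then move each center
    to the middle of its group, repeating a fixed number of times."""
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # An empty group keeps its previous center.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled images, each reduced to
# (fraction of dark pixels, fraction of white pixels).
unlabeled = [
    (0.50, 0.40), (0.10, 0.05), (0.45, 0.45),
    (0.15, 0.10), (0.55, 0.35), (0.05, 0.08),
]

# Assumption: start from the first two points as initial centers.
centroids, clusters = k_means(unlabeled, centroids=unlabeled[:2])
for group in clusters:
    print(group)
```

Nobody told the program which points belong together, yet it ends up with two groups of three similar images each. A human still has to look at a group and decide that it means “penguin.”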

What exactly can a computer decipher in an image?

At this point, almost anything, once a computer has been trained. On a simple level, a lightweight version of this AI is already used by digital cameras, even the one on your smartphone, to determine that, say, the batch of colors in the center of the image is actually a face and therefore the point it should rely on to adjust focus and exposure. If you’ve ever used the Google Translate app, you may have discovered the ability to point your smartphone’s camera at text in any number of languages and have it translated into another language onscreen almost instantly. That’s a form of “augmented reality” (AR), in which computer vision (specifically, optical character recognition) enables an accurate translation that’s then overlaid onto the real world, essentially putting the translated text in place of the original text. It’s so simple and instant that it’s easy to forget this is a mind-blowing capability in nearly everyone’s pocket, and a glimpse of the power of computer vision in action.

More advanced CV-enabled computers today not only know that there are different objects in a given image, but can actually understand what these objects are: not only faces, but also vehicles, trees, buildings, birds, money, and on and on. They can do this even when the objects are displayed at an angle or partially obscured (a situation known as occlusion). It’s still early days, but we’ve begun moving toward a time when computers will have a functional understanding of the world around us.

Does this process take a long time?

Increasingly, no. That’s the key to why computer vision is so thrilling: Whereas in the past even supercomputers might take days or weeks or even months to chug through all the calculations required, today’s ultra-fast chips and related hardware, along with speedy, reliable internet and cloud networks, make the process lightning fast. One crucial factor has been the willingness of many of the big companies doing AI research (Facebook, Google, IBM, and Microsoft, notably) to share their work by open sourcing some of their machine learning software. This allows others to build on their work rather than starting from scratch. As a result, the AI industry is cooking along, and experiments that not long ago took weeks to run might take 15 minutes today. And for many real-world applications of computer vision, this process all happens continuously in microseconds, so that a computer today is able to be what scientists call “situationally aware.”

What are the practical uses of computer vision?

The computer vision and hardware market is expected to reach $48.6 billion by 2022, so the sector is growing. And this is where we get to the really good stuff: There is almost no end of uses for computer vision. Think of any futuristic situation, and there’s likely a computer vision-related solution that can or will someday be applied. Take those fancy Tesla cars you’ve heard so much about: They rely on a host of cameras, as well as sonar, that not only prevent your car from drifting out of a lane, but are also able to see what other objects and vehicles are around you and read signs and traffic signals. In fact, Tesla’s cars actually look under the car in front of you to the car ahead of it to take traffic patterns into account. Similarly, as reliant on technology as today’s healthcare already is, computer vision will enable new, Star Trek-like ways of doing diagnostics by analyzing X-rays, MRI, CAT, mammography, and other scans. (After all, some 90 percent of all medical data is image based.)

And computer vision will also help make robots and drones an ordinary part of everyday life. Imagine fleets of firefighting drones and robots sent into wildfires to cut down trees and guide water delivery. Or fleets of drones sent to search for lost hikers, or earthquake survivors, or shipwrecked sailors. Drones are already being used to help farmers keep tabs on crops, and satellites can likewise help farmers manage their fields, look for signs of drought or infestation, and perhaps even analyze soil types and weather conditions to optimize fertilization and planting schedules.

In sports, computer vision is being applied to such tasks as play and strategy analysis and on-field movement in games, ball and puck tracking for automated camera work, and comprehensive evaluation of brand sponsorship visibility in sports broadcasts, online streaming, and social media. No surprise here, considering the sports and entertainment market is expected to grow to $1.37 billion by 2019.

And finally, new forms of personal technology will appear, and even new types of media, similar to the way movies and TV were inventions of the last century. Immersive technologies that make the viewer feel physically transported are arriving already in the form of virtual and augmented reality, which are familiar to anyone who has witnessed frenzied Pokémon Go players searching for imaginary monsters in the real world using their phones. That’s rudimentary tech, but it shows how convincing and satisfying the experience can be. Just wait until we all own VR goggles.
