6 Cool Uses of Computer Vision in Video

Computer vision has evolved at a rapid pace in the past five years, making a lot of basic image recognition tasks more accurate and prevalent than ever, so much so that it’s being offered as an off-the-shelf commodity by tech giants such as Amazon, Google, IBM and Microsoft. Getting machines to identify what’s going on in videos, however, is far more challenging, since you’re dealing not just with objects, faces and landscapes in a single image, but with chronology and time as well: actions, narrative, point of view. The good news is that the same developments that have supercharged the current image recognition boom (namely, more and better training data and faster, cheaper computing power) are also helping advance the application of computer vision to video. It’s early days still, and most of the current state of the art remains image recognition applied to individual frames in videos, but there’s no doubt that this is an exciting next frontier for computer vision, and the number of startups and established players innovating in the space is growing. Here are six captivating examples:

Customer and visitor analytics from existing camera footage

There are more than 245 million video surveillance cameras around the world capturing everything from traffic on roads to shoppers in malls, but a mere one percent is actually being looked at and analyzed. This is according to Aura Vision Labs, which recently debuted a computer vision technology that is capable of looking at CCTV or other public camera footage and recognizing the gender, age and clothing styles of people, even in crowds. That might sound creepy, but the kicker is that it’s completely anonymous (no facial recognition here), making it compliant with new privacy rules such as GDPR, among others. Aura plans to apply the technology to retail situations, where computer vision can give a more accurate analysis of customer behavior than current methods such as beacons (which require participating cell phones) and loyalty cards (which require sign-up and actual usage at the cash register).

Real-time metrics for real-world ads

Print ads and those found on billboards, cars, walls or other real-world public spaces have been effective ever since advertising’s earliest days, but they’ve struggled to produce deep performance metrics, not to mention efficient real-time ways of purchasing, when pitted against their digital counterparts. Launching in June, Blimp combines existing and publicly available data from CCTV, satellite and traffic sensors (electromagnetic coils and webcams) with data from proprietary devices such as the pocket-sized Blimp radar (which tracks mobile phone WiFi signals) and the Blimp headcounter (a computer vision-enabled video camera attached to, say, a billboard, to literally count heads and how long they are looking at real-world ads up to 165 feet away). Blimp also serves as a marketplace for ads on these spaces, enabling literally anyone to turn their car, house, wall, or even t-shirt into a buyable and trackable real-world ad. Thanks to computer vision’s monitoring of video, among other sensor and analytical systems, offline ads are suddenly on again.
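To make the head-counting idea concrete, here is a minimal sketch of how per-frame detections could be turned into a head count and dwell times. It is not Blimp's method: the input format (a set of tracked viewer IDs per frame), the function name summarize_attention and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_attention(frames, fps):
    """Count distinct viewers and how long each one looked at the ad.

    frames: one set per video frame, holding the (hypothetical) track IDs
            of people judged to be facing the ad in that frame.
    fps:    camera frame rate, used to convert frame counts to seconds.
    """
    frames_seen = defaultdict(int)
    for ids_in_frame in frames:
        for track_id in ids_in_frame:
            frames_seen[track_id] += 1
    dwell_seconds = {tid: n / fps for tid, n in frames_seen.items()}
    return len(dwell_seconds), dwell_seconds


# Toy input (assumed): 4 frames at 2 fps, two tracked viewers.
frames = [{"a"}, {"a", "b"}, {"a"}, set()]
head_count, dwell = summarize_attention(frames, fps=2)
print(head_count, dwell)
```

In practice, the hard part is upstream of this: detecting faces, estimating where they are looking, and keeping track IDs stable from frame to frame.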

Real-time logging, editing and analysis of TV and sports video footage

You’ve probably seen the great job that apps such as Google Photos, Apple Photos’ Memories and Magisto do with automatically sorting through your smartphone pictures and video, capturing the best moments and automatically editing them into short movies for you. That same kind of computer vision process, albeit trained to look for very different things, is being applied to millions of hours of video created at sports events and on shows such as Big Brother, which has cameras on its captive contestants 24/7.

Previously, human loggers would have to do their best to find winning plays or moments of reality show gold in all these hours of video. Now, computer vision is tackling the job. At the most recent U.S. Open, IBM Watson applied computer vision to more than 320 hours of tennis match footage in order to find clips of winning plays and compelling moments, using models that looked for such characteristics as fist pumps by players and cheering crowds in the video. Clips were then automatically edited into highlight reels by Watson, again using computer vision, in less than five minutes, on average, and immediately made available to broadcasters, who were able to share the videos with viewers up to 10 hours faster than before. AI and computer vision are also coming into play at the upcoming Ferrari Challenge North America Series, where object recognition will be applied to live drone footage of the races to detect racecar positioning, and that positioning will then be used to provide insight to drivers in real time as they race, not to mention stats and distinct viewpoints (including automatically edited highlights) for viewers.
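One simple way to picture cue-based highlight selection is as a ranking problem. The sketch below assumes that some upstream models have already scored each clip between 0 and 1 for cues such as fist pumps and crowd cheering; the clip IDs, cue names and weights are invented for illustration, and this is not a description of how Watson actually works.

```python
def rank_highlights(clips, weights, top_n=3):
    """Return the top_n clips by weighted sum of their cue scores.

    clips:   list of dicts with an "id" and a "cues" dict of 0..1 scores
             (assumed to come from hypothetical upstream detectors).
    weights: how much each cue counts toward the overall score.
    """
    def score(clip):
        # A cue that was not detected in a clip counts as 0.
        return sum(w * clip["cues"].get(cue, 0.0) for cue, w in weights.items())

    return sorted(clips, key=score, reverse=True)[:top_n]


# Toy data (assumed): three points from a tennis match.
clips = [
    {"id": "point_12", "cues": {"fist_pump": 0.9, "crowd_cheer": 0.8}},
    {"id": "point_13", "cues": {"fist_pump": 0.1, "crowd_cheer": 0.3}},
    {"id": "point_14", "cues": {"crowd_cheer": 0.95}},
]
weights = {"fist_pump": 0.6, "crowd_cheer": 0.4}
best = rank_highlights(clips, weights, top_n=2)
print([clip["id"] for clip in best])
```

A fixed weighted sum is the simplest possible choice here; a production system would more likely learn how to combine cues from examples of clips that editors actually picked.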

Endemol Productions, which produces Big Brother in dozens of markets across the globe, is leading the way for AI and computer vision techniques in reality TV programming. The company gathers hours and hours of footage from an array of 4K HD cameras on the show’s set. Its Microsoft Cognitive Services-powered technology then applies computer vision, facial recognition, and natural-language processing (NLP) to the footage, cross-references that with data from biometric sensors (to flag the heartbeats of contestants engaged in “drama”), and logs anything that might be of interest in order to create highlight reels that show producers can use for the main storyline or to distribute to social media.

The volume of video generated by sports and reality TV is only set to grow in the future. Case in point: Nikon’s MRMC subsidiary has developed a robotic camera system called the Polycam Player, which uses image recognition and AI to follow specific players’ faces on the field in order to capture specific gameplay in tight angles that are more challenging (and expensive) for human operators to shoot. In a world of robotic cameras and computer vision video scanning, machines become not only part of the solution, but also part of the production.

Sports sponsorship valuation

Professional sports is another area where an overload of previously ignored data (in this case, non-team-owned fan and highlight stills and videos shared on social media) has demanded a computer-vision approach. GumGum Sports focuses on sports media valuation, using advanced computer vision to look for visible signage across television, streaming and social channels to find the full media value of a brand’s sponsorship. “Advanced” in this case means image recognition that can recognize logos in stills and videos even if they’re dimly lit, partially blocked by something else, or at an angle. In sports clips, the technology also looks for specific plays (3-point shots, dunks, pick-and-rolls) that are more likely to elicit views and engagement, as well as for other criteria such as how visible a logo is, how much of the frame it takes up, how often it shows up, and where in the clip it’s located. These findings are then used to tell a brand, team or league how much their product placement is really worth to better inform their next sponsorship contract.
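The exposure criteria listed above (visibility, share of the frame, frequency and position) can be illustrated with a small sketch. Everything here is an assumption for illustration rather than GumGum Sports’s actual model: the LogoDetection record, the clarity score, and the choice to measure position as closeness to the center of the frame.

```python
from dataclasses import dataclass
from math import hypot, sqrt


@dataclass
class LogoDetection:
    """One logo sighting from a hypothetical upstream logo detector."""
    frame_index: int
    box: tuple       # (x, y, width, height) in pixels
    clarity: float   # 0..1, assumed visibility/confidence score


def exposure_summary(detections, frame_w, frame_h, total_frames):
    """Summarize how often, how large, how clear and how central a logo is."""
    keys = ("frequency", "avg_frame_share", "avg_clarity", "avg_centrality")
    if not detections or total_frames <= 0:
        return dict.fromkeys(keys, 0.0)
    shares, centralities = [], []
    for d in detections:
        x, y, w, h = d.box
        shares.append((w * h) / (frame_w * frame_h))
        # Offset of the box center from the frame center, scaled to -1..1.
        dx = (x + w / 2 - frame_w / 2) / (frame_w / 2)
        dy = (y + h / 2 - frame_h / 2) / (frame_h / 2)
        # 1.0 at the exact center of the frame, 0.0 at a corner.
        centralities.append(1.0 - hypot(dx, dy) / sqrt(2))
    n = len(detections)
    return {
        "frequency": len({d.frame_index for d in detections}) / total_frames,
        "avg_frame_share": sum(shares) / n,
        "avg_clarity": sum(d.clarity for d in detections) / n,
        "avg_centrality": sum(centralities) / n,
    }


# Toy input (assumed): one centered logo in a 4-frame, 100x50-pixel clip.
detections = [LogoDetection(frame_index=0, box=(40, 20, 20, 10), clarity=0.5)]
summary = exposure_summary(detections, frame_w=100, frame_h=50, total_frames=4)
print(summary)
```

Turning a summary like this into a dollar value would require audience and pricing data that the sketch does not attempt to model.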

In particular, the technology has found a lot of untapped value in non-owned social media accounts. GumGum Sports’s analysis of the NBA’s recent foray into brand sponsorships of team jerseys found that 80 percent of teams’ sponsor value comes from Instagram, and that an additional $350M in value will go to sponsors who take this computer vision analysis of non-owned social media photo and video shares into account.

Brand safety

The one-year-old startup Uru uses computer vision technology to place logos in the white spaces and other superimposable parts of videos (say, a Vans logo on a skateboard or t-shirt in user-generated X Games videos on YouTube), which is an extremely cool use of this technology, but it’s diving even deeper into much-needed brand safety endeavors. Its technology scans the video content of clips for brand safety, ensuring that sponsor ads aren’t positioned as pre-roll for, or next to, something controversial or, increasingly, fake. “Brands are very hip to the fact now that there's all kinds of garbage out there in the video world, and they want to make sure that they're serving ads against stuff that lifts their brand or at least doesn't harm it,” says Uru cofounder and CEO Bill Marino. “We produce tags of what’s inside the video, the objects, the themes, and the brand, and then produce a brand safety analysis that identifies videos that are safe, free of profanities, hate speech, cyberbullying, weapons, and other things that brands don’t want to be advertising next to.” This approach also relies upon NLP and existing metadata, but computer vision can catch things that text-based checks might miss, such as a video whose keywords bad actors have slightly altered to slip past Google’s and Facebook’s content filters.

Relevant real-time video search

Finding pictures you took in front of the Eiffel Tower five years ago is as simple as typing “Eiffel Tower” into Google Photos and instantly getting a filtered list of those images. This is thanks to the service’s application of image recognition to your photo collection to automatically identify what’s in images and then categorize and tag them based on such factors as people, objects, landscapes and famous landmarks. Best of all, you didn’t have to take the time to search for and organize those photos yourself. Now the same process is being applied to video, which until now has relied on existing tags and titles provided by people and databases such as IMDb. Television maker Hisense recently announced a partnership with AI hardware manufacturer Yi+AI to create a next-generation, computer vision-based video search on smart TVs, which will enable users to get instant listings of, say, every movie or scene with Meryl Streep available on live, streaming, cable or pay-per-view channels, and then jump right to them.

On the enterprise side, video search platform Vidrovr uses a mix of image-, audio-, and other recognition methods to automatically identify and classify what’s going on in videos—everything from faces to objects to actions. It then provides this technology to publishers and other content creators who want to find and add relevant video next to articles or other content they are publishing, video that might not have been found previously because it simply wasn’t pre-tagged according to whatever specific topic is in an article.
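Once recognition models have tagged what appears in each stretch of video, search itself can be as simple as a lookup table. The following is a minimal sketch of that idea, assuming the tags already exist; the segment format, video IDs and function names are hypothetical and do not reflect how Hisense, Yi+AI or Vidrovr implement their products.

```python
from collections import defaultdict


def build_index(tagged_segments):
    """Map each tag to the (video_id, start_seconds) segments it appears in.

    tagged_segments: iterable of (video_id, start_seconds, tags) tuples,
                     assumed to come from upstream recognition models.
    """
    index = defaultdict(list)
    for video_id, start_seconds, tags in tagged_segments:
        for tag in tags:
            index[tag.lower()].append((video_id, start_seconds))
    return index


def search(index, query):
    """Return matching segments, ordered by video and then by start time."""
    return sorted(index.get(query.lower(), []))


# Toy data (assumed): auto-generated tags for segments of two videos.
segments = [
    ("movie_a", 120, ["Meryl Streep", "kitchen"]),
    ("movie_a", 45, ["Meryl Streep"]),
    ("travel_vlog", 10, ["Eiffel Tower", "crowd"]),
]
index = build_index(segments)
print(search(index, "meryl streep"))
```

Because each hit carries a start time, a player can jump straight to the matching scene rather than just to the video.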

These are just a few examples of how computer vision is being applied to video; new breakthroughs and innovative uses crop up weekly, especially as the growth of video continues apace and the reliability of increasingly alterable video is called into question. Ironically, so much of today’s video still goes unwatched, unused and unmonetized. But thanks to computer vision and the work of a growing list of enterprising technologists and entrepreneurs in the space, a lot of that video won’t end up on the proverbial cutting room floor.