How Computer Vision Is Transforming Media

Consumers are starting to realize that computer vision is changing everyday life, even if they don't know the term "computer vision" (or have a clue about how it works). Anyone who has seen an ad or a demo for the iPhone X, which deploys computer vision as part of its futuristic Face ID sensor system, understands that machines and algorithms are getting really good at "seeing" and at making sense of what they see. What's less widely known is how computer vision is transforming the media that streams onto our various screens, including smartphones, laptops, TVs and more. Here's a quick look at where media experiences powered and mediated by computer vision are now and where they're heading:



For the latest US Open Tennis Championships, the United States Tennis Association (USTA) teamed up with IBM Watson to automatically generate match highlights to be shared through social media and on the web. An AI-powered suite of solutions called Watson Media was deployed to look for and analyze images and video, as well as language, sentiment and tone: for example, sounds from the crowd that signaled awesomeness on the court, footage of winning plays, victory-like actions such as fist pumps by players, and facial expressions. The system was then used to rapidly generate shareable content.

Why it's revolutionary: At massive events like the US Open, immediately parsing all the raw video content being produced is beyond daunting (edited broadcast coverage from the Billie Jean King National Tennis Center on ESPN and ESPN 2 alone totaled 150 hours). Computer vision can make sense of it all, helping to identify and package highlights in near real time.

Surfacing Hidden Brand Exposure

What value does a brand get out of visually associating its name or logo with a sports team or sports venue? Traditionally, brands were looking for broadcast exposure: for instance, having a brand's logo appear a set number of times on screen during a game. But social and streaming media have completely changed the way images from sporting events make their way out into the world and into consumer consciousness. Using proprietary computer vision tech, GumGum Sports tracks logo impressions across traditional TV and online streams as well as social platforms including YouTube, Facebook, Instagram and Twitter, and then calculates value based on factors including prominence and clarity.

Why it's revolutionary: Social media exposure of sports team and venue signage now surpasses traditional broadcast exposure. With the "all-seeing eye" (and brain) of computer vision, marketers and rights holders can now fully gauge what consumers are actually seeing across all platforms, not just TV, without relying on team- or brand-identifying hashtags or other text mentions. In a recent analysis of the US Open, GumGum Sports found that sponsor value increased 82 percent when non-owned social media accounts were included in the scan. That’s a lot of money that doesn’t have to stay on the table.
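For readers curious how logo sightings might be turned into a number, here is a minimal Python sketch. It is not GumGum Sports' proprietary method. The `LogoSighting` fields, the area-based definition of prominence, and the multiplicative score are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogoSighting:
    """One detected logo appearance (hypothetical record format)."""
    platform: str           # e.g. "tv" or "instagram"
    seconds_visible: float  # how long the logo stayed on screen
    box_area: float         # logo bounding-box area, in pixels
    frame_area: float       # full frame area, in pixels
    clarity: float          # assumed detector output: 0.0 (blurred) to 1.0 (sharp)


def sighting_score(s: LogoSighting) -> float:
    """Weight time on screen by prominence and clarity (assumed formula)."""
    prominence = min(s.box_area / s.frame_area, 1.0)  # share of the frame
    return s.seconds_visible * prominence * s.clarity


def exposure_by_platform(sightings):
    """Sum sighting scores per platform so TV and social can be compared."""
    totals = {}
    for s in sightings:
        totals[s.platform] = totals.get(s.platform, 0.0) + sighting_score(s)
    return totals
```

Multiplying a per-platform total by a media rate would give a dollar figure, but that rate is a business input the sketch deliberately leaves out.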



TheTake, a company that helps consumers find out where to buy stuff they see in movies and on TV during shows (as opposed to during commercials), has launched a platform called TheTake.AI that helps studios automate the process of identifying merchandising opportunities in their content. TheTake deploys computer vision to detect when products, including fashions that characters and stars are wearing, appear on screen. Using an AI neural network, the system then matches those products against a database of more than 10 million products. So if a character on a Bravo show is wearing a Victoria Beckham strap dress, TheTake.AI can "see" it and automatically determine which retailers stock it.

Why it's revolutionary: Product placement has been a part of TV since the dawn of the medium, and more and more movies have been getting in on the act, but doing it right is labor-intensive and involves a lot of relationship management and negotiation done far in advance. Using computer vision to surface merchandising opportunities in existing video is a way for studios to build a new revenue stream after the fact.
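One common way to match a detected item against a product catalog is to compare feature vectors ("embeddings") produced by a neural network. The Python sketch below assumes that approach. TheTake has not said this is how TheTake.AI works, and the function names, the toy three-number vectors, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, catalog, min_similarity=0.8):
    """Return the catalog product most similar to the on-screen item.

    `catalog` maps product names to embedding vectors. Returns None when
    nothing reaches `min_similarity` (an assumed cutoff).
    """
    best_name, best_score = None, min_similarity
    for name, vector in catalog.items():
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A real catalog of 10 million products would need an approximate nearest-neighbor index instead of this linear scan. The loop is kept only to make the idea readable.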



How do viewers feel about, say, a commercial? You could ask them—which is what agencies and marketers have been doing for decades through focus groups and other such low-key discovery methods. Or you could just watch them—actually, let a computer watch them. A number of tech startups are deploying computer vision to do just that. Affectiva, an outgrowth of the MIT Media Lab, helps brands (including Mars, Kellogg’s and CBS) use computer vision and deep learning algorithms to make sense of viewer facial reactions as they watch content; the company's Emotion SDK (software development kit) works across mobile devices (with front-facing cameras) and standard desktop webcams. And TVision Insights measures what it calls "actual eyes-on-screen attention" as viewers watch TV shows and commercials (for clients including ESPN, Turner and McDonald's) by using computer vision (a mix of motion capture, thermal sensing, and lasers) to track and make sense of viewer face and eye movements, to see who is watching TV and for how long. The company uses Microsoft Kinect devices installed atop participants’ TVs, along with custom software, to do that.

Why it's revolutionary: You may be tempted to dissemble in a focus group (due to, for instance, peer pressure or other group dynamics), but your eyes—and your face—don't lie. Flinch or frown while watching a commercial? Or momentarily lose interest in a TV show (and switch focus to texting on your phone)? Computer vision can translate that behavior into quantifiable data.
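To make "quantifiable data" concrete, here is a small Python sketch that turns per-frame observations into attention numbers. It assumes some upstream vision system has already produced one True/False flag per frame saying whether the viewer's eyes were on the screen. That input format and the `attention_summary` name are hypothetical and are not drawn from Affectiva's or TVision Insights' products.

```python
def attention_summary(eyes_on_screen, fps=1.0):
    """Summarize per-frame eyes-on-screen flags.

    `eyes_on_screen` is a list of booleans, one per sampled frame;
    `fps` is how many frames were sampled per second.
    """
    total = len(eyes_on_screen)
    watched = sum(1 for flag in eyes_on_screen if flag)

    # Track the longest unbroken run of attentive frames.
    longest = current = 0
    for flag in eyes_on_screen:
        current = current + 1 if flag else 0
        longest = max(longest, current)

    return {
        "attention_ratio": watched / total if total else 0.0,
        "seconds_watched": watched / fps,
        "longest_stretch_seconds": longest / fps,
    }
```

A viewer who glances at a phone mid-commercial shows up here as a lower attention ratio and a shorter longest stretch.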



Speaking of the iPhone X, Face ID is just the beginning of how computer vision is deployed in Apple's latest mobile wonder. Within a day of the phone's release in November, University of Sydney doctoral student Mia Harrison and her colleagues released a short video set to the Queen classic "Bohemian Rhapsody" and starring animated iPhone X emoji. The mini production took advantage of the iPhone X's “Animoji” feature, wherein the phone's camera captures your facial muscle movements using computer vision and then transposes them into the facial expressions of animated emoji. Versions of this sort of technology have been used before at great expense (for instance, to painstakingly animate the faces of apes in the recent "Planet of the Apes" remake), but now the tech is literally at your fingertips and fits in your pocket. Animoji leverages the power of the iPhone X’s A11 Bionic processor and Apple’s ARKit (Augmented Reality Kit) software framework. (Google has a competing framework for Android called ARCore.) Mia Harrison, by the way, is not, as you might suspect, a computer scientist; the degree she's working toward is in gender and cultural studies.

Why it's revolutionary: As famed tech writer and former Wall Street Journal and Recode columnist Walt Mossberg tweeted, “Genius iPhone X creation! And it’s only been out a couple of weeks. Watch this (or any of the thousands of other animoji karaoke videos that have already cropped up online) and imagine what’s coming with this technology.” To put this all in perspective, in 2015 a critically acclaimed indie movie called “Tangerine” debuted at the Sundance Film Festival; the narrative feature was shot almost entirely on an iPhone 5S in a triumph of DIY filmmaking that’s inspiring a new generation of indie auteurs. Computer-vision-powered Animoji-type software applications promise a similar DIY revolution in the digital-animation movie market currently dominated by Disney Pixar and its peers.

Illustrations by Neil Stevens