M&E Journal: Applying Machine Learning and Analytics to Maximize the Value of Your Media Assets

By Hector Leano, M&E Marketing Global Industry Lead, Amazon Web Services

Digital has disintermediated content creators, distributors, and consumers, overturning traditional media business models. Strategy Analytics estimates that the total market revenue for global TV, video subscriptions and advertising will grow by nearly $70 billion from 2017 to 2022, with 90 percent of that growth coming from OTT alone.

Audiences, meanwhile, will continue to expect a steady stream of high-quality content for a variety of screens and form factors. In this context, M&E firms are looking not just for cost-saving efficiencies, but for new revenue streams for their content in these new direct-to-consumer mediums.

Research from IDC shows that unstructured content accounts for 90 percent of all digital information, locked away in a variety of formats, locations, and applications spread across separate repositories. When connected and used properly, such information typically helps increase revenue, reduce costs, respond to customer needs more quickly and accurately, or bring products to market faster. This is driving a growing demand among media companies for unified search and metadata across all forms of their assets.

As most media content (video, images, audio, caption and metadata files) is unstructured, M&E firms stand to be among the largest beneficiaries of cloud-based data and content lakes combined with machine learning (ML)-aided metadata extraction and analytics. As background: unlike early AI, which sought to imitate human intelligence by following human-programmed rules, ML performs a variety of applications (e.g., speech-to-text, or image recognition to extract regions of interest in content) without having to be explicitly programmed by a human being.

ML does this by creating statistical models derived from both structured and unstructured data.

Based on successful client work, we have identified three main areas for leveraging ML to optimize the value of media content:

Know your content: Understand what’s in your content archives and make it easily searchable and accessible to immediately find the needle in a haystack (really a specific needle in a stack of needles) via a unified metadata repository.

Know its value: Better price your content and ad inventory through deeper, machine learning-driven insights into usage patterns and audience behavior.

Know your audience: In a sea of choice, give your audiences the most relevant content and advertising for them based on smarter customer segmentation and content insights.

Know your content

For M&E companies with large content archives, the principal challenge is surfacing the most relevant content for use by producers or licensors. Taking it further, some customers such as sports leagues (which have access to decades’ worth of content) want to make their content archives available to their fanbase, so that the entire archive is searchable and licensable by the end user via an API. Unfortunately, until recently most content metadata was entered manually across different systems, making it too resource-intensive for owners of large media libraries to get a thorough understanding of what is in their archives.

Machine learning applications such as facial recognition for celebrity detection can automate the generation of searchable metadata tags by timecode, enabling new levels of search and discovery in content libraries. Modern cloud-based asset management systems leverage this additional metadata to provide a unified search interface.
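As a concrete sketch of this kind of pipeline, the snippet below runs Amazon Rekognition's asynchronous celebrity-recognition job on a video in S3 and folds the detections into per-person timecoded tags. The Rekognition calls require AWS credentials and a real bucket; the transform function itself is pure Python, and the bucket/key names here are hypothetical.

```python
from collections import defaultdict

def to_timecoded_tags(detections):
    """Fold Rekognition celebrity detections into a per-person list of
    timestamps (milliseconds into the video) for a searchable metadata index."""
    tags = defaultdict(list)
    for d in detections:
        tags[d["Celebrity"]["Name"]].append(d["Timestamp"])
    return {name: sorted(ts) for name, ts in tags.items()}

def tag_video(bucket, key):
    """Run Rekognition's asynchronous celebrity-recognition job on a video
    stored in S3 and return its timecoded tags. Requires AWS credentials."""
    import time
    import boto3  # imported lazily so the pure function above has no AWS dependency
    rek = boto3.client("rekognition")
    job_id = rek.start_celebrity_recognition(
        Video={"S3Object": {"Bucket": bucket, "Name": key}})["JobId"]
    # Poll until the job finishes (production code would subscribe to SNS instead).
    while True:
        resp = rek.get_celebrity_recognition(JobId=job_id)
        if resp["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(5)
    detections = list(resp["Celebrities"])
    while resp.get("NextToken"):  # page through results for long videos
        resp = rek.get_celebrity_recognition(JobId=job_id,
                                             NextToken=resp["NextToken"])
        detections.extend(resp["Celebrities"])
    return to_timecoded_tags(detections)
```

The timestamp lists map directly onto timecodes, which is what makes clip-level search (rather than whole-asset search) possible.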

For example, Videofashion is the world’s largest fashion video licensor. Working with AWS partner GrayMeta, it was able to automate generating time-coded metadata for thousands of hours of fashion show footage going back four decades.

Videofashion is leveraging the GrayMeta API for a web portal that allows licensees to easily search for individuals by name (either separately or when they appear together), unifying metadata that was previously spread across multiple systems.

With this new system, licensees can pull up only the relevant clips, then download and license them for use directly from Videofashion’s web portal.
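The article does not describe GrayMeta's actual query API, but an "appear together" search over per-person timecoded tags can be sketched simply: treat two people as co-appearing when their detections fall within a small time window of each other. The window size and all names below are hypothetical.

```python
def co_appearances(tags, name_a, name_b, window_ms=5000):
    """Given per-person detection timestamps (milliseconds), return the
    moments where both people are detected within `window_ms` of each
    other -- a rough proxy for 'appearing together' in a clip."""
    hits = set()
    for ta in tags.get(name_a, []):
        for tb in tags.get(name_b, []):
            if abs(ta - tb) <= window_ms:
                hits.add(min(ta, tb))
    return sorted(hits)

# Example: Anna at 1s and 60s, Bea at 2s -> only the 1s moment overlaps.
# co_appearances({"Anna": [1000, 60000], "Bea": [2000]}, "Anna", "Bea")
# returns [1000]
```

A production index would precompute these joins rather than scan pairwise, but the matching logic is the same.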

Know your content’s value

For ad-supported broadcasters, setting ad rates can be as much art as it is science. This year, an AWS broadcasting customer decided to take a closer look at its CPM ad rates across a variety of its networks, aggregating its disparate first- and third-party data, including digital and analog viewership data, customer segmentation, and historic CPMs for a variety of dayparts across several networks.

Conventional analytics approaches had a difficult time working through such different sets of structured and unstructured data. However, the customer saw an opportunity with machine learning to spend less time cleaning data sets and more time deriving insights.

Using Amazon SageMaker, its data science team was able to build, train, and deploy a self-tuning pricing model in weeks instead of months. Through this model, it discovered an opportunity to increase CPM rates for its tier-2 and tier-3 ad inventory, which until then had been underpriced relative to its value to advertisers. In aggregate, this opportunity was greater than the total ad revenue from tier-1 inventory.
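The customer's SageMaker model is not public, so as a minimal, hypothetical illustration of the kind of self-tuning regression involved, the snippet below fits CPM against normalized audience and daypart features by batch gradient descent in plain Python (feature names and data are invented for the example):

```python
def train_cpm_model(rows, lr=0.1, epochs=2000):
    """Fit cpm ~ w.x + b by batch gradient descent on (features, cpm)
    pairs; features are normalized signals such as daypart or audience
    segment share, scaled to [0, 1]."""
    n = len(rows[0][0])
    w, b = [0.0] * n, 0.0
    m = float(len(rows))
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in rows:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i, xi in enumerate(x):
                gw[i] += err * xi   # accumulate gradient for each weight
            gb += err
        w = [wi - lr * gi / m for wi, gi in zip(w, gw)]
        b -= lr * gb / m
    return w, b

def predict_cpm(w, b, x):
    """Predicted CPM for one feature vector under the fitted model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

The point of the sketch is the workflow, not the model class: once a fitted model exists, comparing predicted value against current rate card prices is what surfaces underpriced inventory tiers.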

Know your audience

Subscription revenue models require building a long-term relationship with consumers so that they see a steady stream of new, high-quality content that interests them enough to find value in renewing. Even for advertising-supported businesses, increasing engagement not only leads to more monetizable impressions; as advertisers move beyond quantity of engagement to quality, increased time on site and completed page and video views assure advertisers they are reaching highly engaged, valuable audiences.

Digital publisher Focus Online, the leading digital news magazine in Germany with over 24 million unique monthly readers, needed to enable large scale personalization and content recommendation.

First, it moved its content, archive, and data storage to cloud-based content and data lakes. This allowed audience data in its CRM to be used immediately to serve relevant content for large-scale personalization.

AWS Lambda functions and Step Functions tag content by topic, which allows the publisher to quantify each piece’s immediacy (and expiry) and its redundancy with other content on the site. These tags are then matched against audience segments to surface the next most relevant article for each unique user based on segmentation (e.g., news junkie, sports lover, etc.). In its first test, this algorithm increased click-through rates by over 50 percent. Based on the success of the initial trial, Focus Online is expanding its use of analytics-derived recommendation algorithms, including algorithm enhancements using Amazon SageMaker.
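The article does not spell out Focus Online's scoring, but the matching step it describes can be sketched as follows: score each candidate article's topic tags against a user segment's interest weights, penalize topics the user has already seen (redundancy), and drop anything past its expiry (immediacy). The field names and weights are hypothetical.

```python
def next_article(candidates, interest_weights, seen_topics, now):
    """Pick the next article for one user: reward overlap between the
    article's topic tags and the segment's interest weights, subtract a
    penalty per already-seen topic, and skip expired pieces entirely."""
    best, best_score = None, float("-inf")
    for art in candidates:
        if art["expires_at"] <= now:
            continue  # stale news is never surfaced
        score = sum(interest_weights.get(t, 0.0) for t in art["topics"])
        score -= 1.0 * len(set(art["topics"]) & seen_topics)  # redundancy penalty
        if score > best_score:
            best, best_score = art, score
    return best
```

In practice the Lambda functions would compute the tags and expiry offline per article, leaving only this lightweight ranking to run per request.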


API services have made machine learning easy for data scientists and non-data scientists alike to take advantage of applications like computer vision and natural language processing, but the greatest dependency is an architecture that can support modular services. Does your architecture support quick testing and deployment of different services to decrease the lag from idea to proof of concept to full rollout?

Can your content and data lakes easily interact with each other so that data insights feed into the viewer experience (i.e., personalized merchandising and advertising)? Taking advantage of the automation and augmentation possibilities of machine learning is less about designing for specific use cases and more about designing for fast change and development in this space. As more and more ML-enabled analytics uses can be applied to your content and user data, will you be able to take advantage of them?


From the Spring/Summer 2018 issue of the M&E Journal.