HBO Looks to Demystify Language Metadata

Yonah Levenson, manager of taxonomy for HBO, and Laura Dawson, metadata analyst for the company, have a unique — and absolutely vital — job: make sense of multiple language standards, and help HBO meet global distribution requirements around languages for audio, text, packaging and more.

“We’re the crossroads for the company, where metadata request terminology vetting comes in. We have a very hard line in the sand … we don’t deal with the data, we’re all about the terminology,” Levenson said July 24, during a presentation at the Smart Content Summit East event, part of the Media & Entertainment (M&E) Day at the Microsoft Conference Center in New York.

HBO began confronting potential language metadata problems not too long ago, after it saw inconsistencies across systems and departments, with requests for solutions coming from a wide range of departments, from production to archive. When it became clear HBO needed a single, unified standard of language terminology, Levenson, Dawson and their team adopted a new standard: IETF BCP 47 (Internet Engineering Task Force Best Current Practice 47), which allows the company to organize language metadata from a preliminary list of nearly 140 languages and dialects.

Now, terminology used across the company is consistent, which helps with everything from analytics to programming, reducing both time and resources spent, and calling languages what they should be called. That terminology has become the source of truth across the company, including the language metadata terminology for audio, subtitles, closed captions, rights and licensing.

“It’s a standard of standards, but taking it to the next level, so we can get the level of granularity we need within the company,” Levenson said. “Now that we have this, we don’t need to spend time within the company meetings saying ‘What are we going to call Latin American Spanish?’”

The solution includes a taxonomy tool which lets people enter the information on a term-by-term basis, where terms can be related to each other. “We have documentation, we have provenance, we have a record showing this is what it’s called, this is how we map it back to the standard, and then we can share that information back out with the developers,” Levenson added.
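As a rough illustration of what a term-by-term record mapped back to the standard might look like, here is a minimal Python sketch. It is not HBO's tool: the `LanguageTerm` class, its field names, the labels and the helper functions are all hypothetical. Only the tags themselves (`es`, `es-419`, `es-ES`) are real BCP 47 values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LanguageTerm:
    """One hypothetical taxonomy entry: an agreed label mapped to a BCP 47 tag."""
    label: str                     # the name the company agrees to use
    tag: str                       # the BCP 47 tag the label maps back to
    broader: Optional[str] = None  # tag of the parent term, if any


# Illustrative entries only; a real table would hold far more terms.
TERMS = {
    term.tag: term
    for term in [
        LanguageTerm("Spanish", "es"),
        LanguageTerm("Latin American Spanish", "es-419", broader="es"),
        LanguageTerm("Spanish (Spain)", "es-ES", broader="es"),
    ]
}


def label_for(tag: str) -> str:
    """Return the agreed label for a tag."""
    return TERMS[tag].label


def broader_labels(tag: str) -> List[str]:
    """Walk up the hierarchy and collect the labels of all parent terms."""
    labels = []
    current = TERMS[tag].broader
    while current is not None:
        labels.append(TERMS[current].label)
        current = TERMS[current].broader
    return labels


print(label_for("es-419"))       # Latin American Spanish
print(broader_labels("es-419"))  # ['Spanish']
```

Keeping the label, the standard tag and the parent relationship in one record is one way to preserve the provenance Levenson describes: anyone can see what a term is called and how it maps back to the standard.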

Dawson said the standard helps HBO confront potential language metadata problems from the outset: “At the stage of acquisition we don’t specify, ‘OK we’re going to broadcast Spanish language programming in these countries,’” she said. “We just say, ‘OK, we’re acquiring Spanish language rights for the content.’ But when it comes down to distribution, we need to know what kind of Spanish are the end users consuming.”
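The gap Dawson describes, broad rights at acquisition and specific variants at distribution, can be sketched with prefix matching on language tags. The example below follows the basic filtering rule from RFC 4647 (the matching half of BCP 47), where a range such as `es` matches any tag that equals it or extends it with further subtags. The function name and the sample tag list are assumptions made for illustration, not a description of HBO's systems.

```python
def matches_range(language_range: str, tag: str) -> bool:
    """Basic filtering per RFC 4647: exact match or prefix followed by '-'."""
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        return True
    return candidate == rng or candidate.startswith(rng + "-")


# Hypothetical scenario: rights were acquired for "Spanish" in general,
# and distribution needs to know which specific variants fall under that.
acquired_rights = "es"
distribution_variants = ["es-419", "es-ES", "es-MX", "pt-BR"]

covered = [t for t in distribution_variants if matches_range(acquired_rights, t)]
print(covered)  # ['es-419', 'es-ES', 'es-MX']
```

The match only works in one direction: a range of `es-419` does not match the plain tag `es`, which mirrors the point that a general "Spanish" label is not specific enough once content reaches distribution.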

Levenson said she hopes other content owners and distributors look at HBO’s solution as a model. “I would encourage all of you, if you don’t have a group like ours, to really think about getting someone like us inside, who understands hierarchies — even if you don’t always want to use them — to establishing the different types of relationships through an ontology, so this way the data then has more value out,” she said.

On Aug. 7, the Media & Entertainment Services Alliance (MESA) published an HBO-developed Language Metadata Table (LMT), built around the standard, with the goal of uniting data specialists with a single, open-source table of language metadata values for the media and entertainment industry.

The 2018 M&E Day also included Content Protection Summit East and Entertainment Production in the Cloud (EPIC) conference tracks, providing M&E technology teams insights into the creation, production, distribution, security and analysis of content.

The conference was presented by Microsoft, with sponsorship from IBM Watson Media, Amazon Web Services, IBM, LiveTiles, Microsoft Azure, NAGRA, NeuLion, Ooyala, EIDR, GrayMeta, MarkLogic, Qumulo, Avid, Cloudian, SoftServe and TiVo. The event was produced by the Media & Entertainment Services Alliance (MESA), the Content Delivery & Security Association (CDSA), the Hollywood IT Society (HITS) and the Smart Content Council.