HBO, DPL Execs: Preserving Metadata Still Challenging, But Hollywood’s on the Right Track

Content companies still face challenges when cataloguing unstructured metadata for preservation in distributed databases, but Hollywood is on the right track, according to executives at Digital Preservation Laboratories (DPL) and Time Warner’s HBO division.

Among the takeaways from their presentation at the recent Reel Thing symposium in Los Angeles was that migrating physical elements to homogeneous, file-based structures can often leave out pieces of legacy physical metadata that may prove important in the future. New metadata about the decisions preservationists made at some point in time is often generated but usually not documented, they noted during the session “The Digital Post-It: Cataloguing Unstructured Metadata for Preservation in Distributed Databases Using Open Standards.” Associating those new categories of metadata with digital image and audio can be problematic.

At the same time, as preservationists increase the number of metadata fields, databases are quickly becoming burdened with expansive and redundant metadata that was never planned, which slows searches and creates new challenges for database backups and archiving. Systems and workflows often become tightly coupled, so one aspect of the system is heavily dependent on another to function.

“We’ve recognized that there is a problem in the preservation industry in that” the media asset management (MAM) and digital asset management (DAM) system that usually “contains the audio and video essence is often disjointed from the database that contains all the metadata,” Steve Kochak, president of DPL, told the Media & Entertainment Services Alliance (MESA) after The Reel Thing presentation.

In response to that challenge, Kochak said: “We’ve sort of set out to try to figure out [if there is a way] to embed the metadata with the audio and video essence, so that instead of worrying about two disparate systems … we could have one singular thing that was most important, which would be the audio and video essence.”

In traditional MAM and DAM, “there’s usually a proprietary database that contains all of the metadata associated with the images and audio in addition to other things like proxies,” Kochak explained. But he added: “The problem is if that ever gets separated from your audio and video, you’re sort of in the dark because one system needs the other in order to have a successful preservation system.”

As he did during The Reel Thing presentation, Kochak pointed to the importance of the Archive eXchange Format (AXF), standardized by the Society of Motion Picture and Television Engineers (SMPTE) a couple of years ago. SMPTE introduced a new revision of that format this year, he told us, noting: “There’s a way to embed all of the metadata inside of these preservation objects, which would allow someone who had an archive system to switch from one MAM or DAM to another relatively easily because you have no complexity with migrating the database in addition to your audio and video. You only have to worry about your audio and video.”
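
The idea Kochak describes, metadata traveling inside the same object as the essence, can be illustrated with a minimal Python sketch. This is not AXF and does not follow the SMPTE specification: the tar container, the `manifest.json` name, the `essence/` folder and both function names are assumptions chosen only for illustration.

```python
import io
import json
import tarfile


def build_self_describing_package(essence: dict[str, bytes], metadata: dict) -> bytes:
    """Bundle essence files and their metadata into one in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # Hypothetical layout: metadata is stored as manifest.json beside the essence.
        members = {"manifest.json": json.dumps(metadata, indent=2).encode("utf-8")}
        members.update({f"essence/{name}": data for name, data in essence.items()})
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_manifest(package: bytes) -> dict:
    """Recover the metadata from the package alone, with no external database."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r") as archive:
        return json.load(archive.extractfile("manifest.json"))


# Placeholder bytes stand in for real audio and video essence.
package = build_self_describing_package(
    {"video.bin": b"\x00\x01", "audio.bin": b"\x02\x03"},
    {"title": "Example Title", "source_format": "videotape"},
)
manifest = read_manifest(package)
```

Because the description sits inside the package, moving it to a different MAM or DAM in this sketch means copying one object rather than migrating a separate database alongside it.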

Of course, there are standards all over the place today in terms of how we’re dealing with metadata. But “what’s right for one content owner may not be right for another,” Kochak noted. That said, he told us: “We like SMPTE just because it is so broadcast television- [and] feature film-centric. And we want to use those open standards. I should also mention that AXF is not just a SMPTE standard, it’s also an IEEE standard …. And it seems to be growing in momentum tremendously, especially in media and entertainment. Not all studios necessarily know they’re using it though because it’s the backend of a lot of other systems. So, I would say [that] we are headed down the right track, just because of the growing momentum behind it.”

Companies like HBO, meanwhile, face very specific challenges given the huge number of catalog titles in their libraries and the accompanying metadata for all that content.

The industry has been “managing the description of the assets separately from the assets themselves for a number of years in the analog world,” Randal Luckow, director of HBO Archives and Asset Management, told us. “It was easy enough to have a database that spoke to a single object: a one-to-one, a catalog record in a database that speaks to something on a tape or a piece of film or a reel. It becomes problematic when you have more than a whole-part relationship with your assets or if … the databases become disconnected.”

Luckow added that the model “doesn’t work in a digital realm at all.” That’s because, he said: “There’s all kinds of other relationships between whole-part or parent-child: There’s part-to-part. There’s synchronous and non-synchronous. There’s a number of other richer relationships that we aren’t taking advantage of in the way that databases are constructed. So, there’s a disconnect there already. The long-term preservation of these assets require that we look at archival principles, provenance, chain of ownership, original order, and we try to reflect these principles in the way that we manage our assets, whether they’re physical or digital, in the archive space.” Luckow also noted that archival principles, “which have a long history going back to the 1700s if not earlier, are actually still applicable in the digital space.”

Luckow added: “What we’re discovering in this history in the last 20 years of digital asset management is that if databases tend to become disconnected, maybe it makes more sense for us to just to, as an archival principle, to try to make our objects self-describing. And when we say that we want to make sure that the original order is captured because when we migrate a digital object from an analog object, we are changing the structure of that object.”

On a videotape, the video track and the audio track are “tightly coupled,” Luckow noted. But when we migrate those tracks, two separate tracks are created: the video essence and the audio essence, he said. Therefore, he told us: “We need to contextualize that in some way that allows us to – and the people who follow us – to understand how that original thing was constructed and what those relationships were.”
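
One way to picture the contextualization Luckow describes is a small record that travels with the migrated files and states how the original was constructed. The Python sketch below is purely illustrative: the class names, field names, identifiers and relationship labels are hypothetical and do not reflect any HBO or DPL schema.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    kind: str    # e.g. "whole-part", "part-to-part", "synchronous"
    source: str
    target: str


@dataclass
class MigrationRecord:
    original: str                       # identifier of the analog source
    parts: list[str] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)


def describe_tape_migration(tape_id: str, video_file: str, audio_file: str) -> MigrationRecord:
    """Record how one videotape became separate video and audio essence files."""
    record = MigrationRecord(original=tape_id, parts=[video_file, audio_file])
    # Each new file is a part of the original tape (whole-part).
    for part in record.parts:
        record.relationships.append(Relationship("whole-part", tape_id, part))
    # The two tracks were coupled on the tape, so note that they play in sync.
    record.relationships.append(Relationship("synchronous", video_file, audio_file))
    return record


record = describe_tape_migration("TAPE-0001", "TAPE-0001_video.mov", "TAPE-0001_audio.wav")
kinds = [rel.kind for rel in record.relationships]
```

A record like this could be stored inside the preservation object itself, so that the original order remains readable even if the catalog database is later lost or replaced.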