
M&E Journal: Contextual Awareness and Creating the Eternal Archive

By Erik Weaver, Director Product Marketing for M&E Solutions, Western Digital Corp.

The M&E industry is undergoing a paradigm shift in data storage, driven by massive volumes of digital content generated from much larger assets that are continually growing in both size and density. Petabyte-scale storage architectures and new data-intensive formats are required to address the creation, manipulation and transmission of this digital data in support of initial production efforts, collaborative editing and post-production activities. The content can also be mined in the future in the hope of finding something important. In an industry where content is paramount, it’s never been more important to develop new ways to archive and protect digital assets (and that’s any piece of digital content: a clip, a movie, a video, a short, a segment, a photo, or even an article or a series of tweets).

The longer data sits in a storage repository waiting to be accessed, the higher the probability that it will experience gradual deterioration. This slow data decay is known as bit rot. It can affect data access and transfer performance, and it can compromise data integrity, which in turn can lead to system crashes and storage device failures.
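One common way to detect this kind of silent decay is a fixity check: record a checksum when the asset is ingested, then recompute and compare it later. The following is a minimal sketch under that assumption. The helper names `checksum` and `is_intact` and the sample bytes are hypothetical and are not taken from any particular archive product.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_checksum: str) -> bool:
    """Compare a freshly computed checksum with the one recorded at ingest."""
    return checksum(data) == recorded_checksum


# At ingest: store the checksum alongside the asset.
original = b"frame data for an archived clip"
recorded = checksum(original)

# Later: simulate a single flipped bit, as bit rot might cause.
damaged = bytearray(original)
damaged[0] ^= 0x01
damaged = bytes(damaged)

print(is_intact(original, recorded))  # the untouched copy still matches
print(is_intact(damaged, recorded))   # the damaged copy no longer matches
```

A check like this only detects damage. Repairing it still depends on having a second, intact copy elsewhere.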

Technology obsolescence is another challenge for data storage: the software versions used to create digital assets become out-of-date and incompatible with current formats. Together, bit rot and technology obsolescence are known as the digital dilemma, and both are challenges to archiving and protecting digital assets over the long term.

Like church and state, content and the medium on which it is stored are significantly different and must be treated separately. Keeping content independent of its medium requires a future-proof strategy that archives data to last forever. To address the digital dilemma and achieve long-term data archival, industry standards have been developed to help meet these objectives.

Future-proofing digital assets

The most important of these standards is the Archive eXchange Format (AXF), or SMPTE standard ST 2034-1:2014. This open file format standard supports interoperability amongst disparate storage systems and ensures the long-term availability of digital content no matter how the storage or file systems evolve. AXF is used in back-end server applications, is transparent to users, and is a final step in the data storage process.

There are also front-end archiving formats that users can interact with directly, such as the Material Exchange Format (MXF), or SMPTE standard ST 377-1:2011. This file format enables metadata to be exported as an MXF file for import on any platform that supports the standard. It can be used as an interoperability format between different platforms, or can help categorize metadata to create relationships.

Another issue to address when protecting digital assets is the accurate identification of data across a wide enterprise domain. There are many identification systems in enterprises today that use a series of file names, URLs, UUIDs, UMIDs, and others, to identify data. Unfortunately, many modern-day servers and storage systems within the enterprise cannot easily identify, or discern, these worldwide assets.

It is for this reason that the Cinema Content Creation Cloud standard (SMPTE standard ST 2114:2017), or C4, was instituted. It provides an unambiguous, universally unique identifier that can be assigned to any file or block of data, resulting in universal consistency. Because the identifier is derived from the data itself, two different groups holding identical files will each arrive at the same C4 ID without the need for a central registry.
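The idea of identifying identical files without a central registry can be illustrated with a content-derived identifier. The sketch below is an assumption-laden illustration, not the C4 ID algorithm: it uses a plain SHA-512 hex digest with a made-up `id-` prefix, whereas the actual C4 encoding is defined by the standard and is not reproduced here. The function name `content_id` and the sample data are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Hypothetical helper: derive an identifier purely from the bytes.

    A SHA-512 hex digest is used as a stand-in; this is NOT the C4 ID encoding.
    """
    return "id-" + hashlib.sha512(data).hexdigest()


# Two groups hold the same file and compute the ID independently.
studio_a_copy = b"final cut, reel 1"
studio_b_copy = b"final cut, reel 1"
different_reel = b"final cut, reel 2"

print(content_id(studio_a_copy) == content_id(studio_b_copy))   # same content, same ID
print(content_id(studio_a_copy) == content_id(different_reel))  # different content, different ID
```

Because the identifier depends only on the bytes, neither group has to ask a registry for an ID or agree on file names in advance.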

In quick review, the AXF standard is used for long-term archiving, the MXF standard to import or export metadata on any platform, and the C4 ID standard to discern digital files with a unique identifier.

Metadata tagging and the fundamentals of taxonomies

The next step in archiving and protecting digital assets is to break down the data into categories that are searchable, discoverable, and understandable. It’s no longer enough to simply say “I saved it!” You need to be able to prove that you’ve saved it, and to understand what you’ve saved, how to get to it, and how to extract value from it. Assigning different types of metadata to digital content through taxonomy and ontology supports these scenarios.

In simplest terms, taxonomy defines data, whereas ontology describes the relationships of data within a group. For example, the taxonomic definition of a bald eagle is a bird. Its relationship amongst other birds in a group is as a predator, carnivore, egg-layer, tree-dweller, etc. These types of descriptions and classifications are created through metadata and identify digital M&E content. They matter because the tribal-knowledge days of calling up the person who took the photo, or shot the film, to ask for the content are over.
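The bald eagle example can be sketched as two small data structures: a taxonomy that maps each term to its broader category, and an ontology that stores labelled relationships. This is only an illustration. The relation names (`role`, `diet`, `reproduction`, `habitat`) and the extra "bird is an animal" entry are assumptions added for the example.

```python
# Taxonomy: each term points to its broader parent category ("is a").
taxonomy = {
    "bald eagle": "bird",
    "bird": "animal",  # assumption: one extra level added for illustration
}

# Ontology: labelled relationships, stored as (subject, relation, value) triples.
ontology = [
    ("bald eagle", "role", "predator"),
    ("bald eagle", "diet", "carnivore"),
    ("bald eagle", "reproduction", "egg-layer"),
    ("bald eagle", "habitat", "tree-dweller"),
]


def ancestors(term: str) -> list:
    """Walk up the taxonomy and return every broader category in order."""
    chain = []
    while term in taxonomy:
        term = taxonomy[term]
        chain.append(term)
    return chain


def traits(term: str) -> dict:
    """Collect every ontology relationship recorded for the term."""
    return {relation: value for subject, relation, value in ontology if subject == term}


print(ancestors("bald eagle"))
print(traits("bald eagle"))
```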

Metadata can include information about almost anything (the content creator, equipment used, date and time that data was captured, the location of the capture, etc.), and several types of metadata can be assigned to an asset:

• Descriptive identifies digital data using titles, abstracts, authors, and keywords;

• Technical describes information to access digital data, such as where the data resides or the native structure of the data itself;

• Transactional describes information regarding an online transaction when processed.
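The three metadata types listed above could be modelled as one record per asset. The sketch below is a hypothetical schema, not a standard one: every class name, field name, and sample value (including the storage path and transaction ID) is invented for illustration.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DescriptiveMetadata:
    title: str
    authors: list
    keywords: list = field(default_factory=list)
    abstract: str = ""


@dataclass
class TechnicalMetadata:
    location: str   # where the data resides
    structure: str  # native structure of the data, e.g. its container format


@dataclass
class TransactionalMetadata:
    transaction_id: str
    processed_at: str  # timestamp recorded when the transaction was processed


@dataclass
class AssetRecord:
    descriptive: DescriptiveMetadata
    technical: TechnicalMetadata
    transactions: list = field(default_factory=list)


def has_keyword(record: AssetRecord, keyword: str) -> bool:
    """Case-insensitive keyword lookup on the descriptive metadata."""
    return keyword.lower() in (k.lower() for k in record.descriptive.keywords)


record = AssetRecord(
    descriptive=DescriptiveMetadata(
        title="Eagle at dawn",
        authors=["Example Photographer"],
        keywords=["eagle", "wildlife"],
    ),
    technical=TechnicalMetadata(location="archive/clips/clip-0001", structure="MXF"),
    transactions=[TransactionalMetadata("txn-001", "2018-01-15T09:30:00Z")],
)

print(has_keyword(record, "Eagle"))
print(asdict(record)["technical"])
```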

Once a storage device or repository is populated with digital data, appropriate templates, fields, tags and link groups are applied to the content. The content also requires added security to protect it. Security tags, when applied to content, allow only the users assigned to the tag to access that content.

Version control is another form of security that enables content changes to be retained in the version history, protecting the asset from one version to another. Digital signatures provide a mechanism to authenticate a digital asset to its owner, so that owner approval is required when modifications are applied to the data.
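A rough sketch of how a retained version history and owner authentication might fit together follows. It makes one simplifying assumption that should be stated loudly: it uses an HMAC with a shared secret from the Python standard library as a stand-in for a true public-key digital signature. The class `VersionedAsset`, the helper `sign`, and the key values are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a shared secret stands in for the owner's private signing key.
OWNER_KEY = b"demo-owner-secret"


def sign(data: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over the data (a stand-in for a signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


class VersionedAsset:
    """Keeps every version of an asset together with the tag it was saved with."""

    def __init__(self, data: bytes, key: bytes):
        self.history = []
        self.commit(data, key)

    def commit(self, data: bytes, key: bytes) -> None:
        """Append a new version; earlier versions are never overwritten."""
        self.history.append((data, sign(data, key)))

    def verify(self, version: int, owner_key: bytes) -> bool:
        """Check that the given version was signed with the owner's key."""
        data, tag = self.history[version]
        return hmac.compare_digest(tag, sign(data, owner_key))


asset = VersionedAsset(b"rough cut", OWNER_KEY)
asset.commit(b"final cut", OWNER_KEY)          # change approved by the owner
asset.commit(b"unapproved edit", b"other-key")  # change made without the owner's key

print([asset.verify(i, OWNER_KEY) for i in range(len(asset.history))])
```

All three versions stay in the history, but only the first two verify against the owner's key, which is how an unapproved modification would be flagged.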

Creating enhanced storage capabilities

With an archival and protection strategy in place, the next step is to create storage systems that deliver real-time access to the digital data and store it without losing data or exhausting storage capacity.

Devices based on flash storage or object-based storage (OBS) are where the M&E market is heading, as these storage platforms are well-suited for digital data archival and protection. An active archive strategy should also be considered for infrequently accessed data.

Flash-based storage

Flash-based storage has made significant inroads within the enterprise, delivering extremely fast performance and terabyte storage capacities that support such data-intensive M&E applications as virtual reality (VR), artificial intelligence (AI), big data analytics, and 4K/8K ultra-high-definition video. Since data is stored electronically (versus magnetically as with hard drives), and since the memory itself has no moving parts to slow operations, data transfer to and from the storage media is much faster than with spinning disks.

Flash-based solid-state drives (SSDs) have no mechanical seek time, so latency is considerably lower than with hard disk drives (HDDs). SSDs can also be assembled into a system called an all-flash array. Whether they attach locally to a server host, or are network-attached through all-flash arrays and shared by many servers, flash-based storage is an ideal choice for archiving and protecting digital assets.

Object-based storage

Object-based storage is an architecture that manages data as objects, versus traditional block-based or file-based approaches, and can store unstructured data at petabyte scale.

Unlike file-based storage that manages data in a folder hierarchy, or block-based storage that manages disk sectors as blocks, OBS manages data as objects, and includes:

• Metadata associated with each object;

• Simplified data access by placing data and metadata in a flat address space (or single namespace);

• A unique object identifier that simplifies data indexing and the retrieval of a specific object;

• Local data analytics or data discovery for large data volumes at scale.
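The characteristics listed above (per-object metadata, a flat address space, and a unique object identifier) can be sketched with a small in-memory model. This is a toy illustration and not the API of any real object storage system. The class `ObjectStore`, its methods, and the metadata keys are hypothetical.

```python
import uuid


class ObjectStore:
    """Toy object store: one flat namespace mapping object IDs to objects."""

    def __init__(self):
        # Flat address space: no folders, just object ID -> (data, metadata).
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store the data with its metadata and return a unique object ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str):
        """Retrieve an object and its metadata directly by ID."""
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
clip_id = store.put(b"clip bytes", kind="clip", project="documentary")
photo_id = store.put(b"photo bytes", kind="photo", project="documentary")

data, meta = store.get(clip_id)
print(meta)
print(store.find(kind="photo") == [photo_id])
```

Because metadata travels with each object, a search such as `find(kind="photo")` works without knowing where the object sits in any directory tree.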

Active archive strategy

An active archive holds data that is too valuable to discard but is only accessed occasionally. The data stored is not read or write intensive, and does not need to be modified – only read from time to time. This colder data is usually stored on less expensive, slower performing storage media, such as tape or capacity disk.

The active archive strategy may also include software to move and manage the data between storage tiers so it can be directly and seamlessly accessed by applications or users. Data protection is usually through replication to an alternate active archive system, versus performing traditional backups.
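Software that moves data between storage tiers typically applies some policy to decide what counts as cold. The sketch below assumes a very simple one: anything not accessed for a set number of days belongs on the archive tier. The 90-day threshold, the tier names, and the function names are illustrative assumptions, not recommendations.

```python
from datetime import datetime, timedelta

# Assumption: data untouched for 90 days is treated as cold.
COLD_AFTER = timedelta(days=90)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick a tier from how long ago the asset was last accessed."""
    return "archive" if now - last_accessed >= COLD_AFTER else "primary"


def plan_moves(catalog: dict, now: datetime) -> dict:
    """Return {asset name: target tier} for assets sitting on the wrong tier.

    The catalog maps asset name -> (current tier, last accessed time).
    """
    moves = {}
    for name, (current_tier, last_accessed) in catalog.items():
        target = choose_tier(last_accessed, now)
        if target != current_tier:
            moves[name] = target
    return moves


now = datetime(2018, 6, 1)
catalog = {
    "trailer": ("primary", datetime(2018, 5, 20)),        # recently used, stays put
    "2016-dailies": ("primary", datetime(2017, 1, 10)),   # cold, should be archived
    "classic-film": ("archive", datetime(2018, 5, 30)),   # in demand again, recall it
}

print(plan_moves(catalog, now))
```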

Summary

M&E digital content is continually growing in number and size, which means that data storage capabilities must expand at similar rates. As M&E use cases require storage with large capacities, high scalability, and fast I/O access, meeting these requirements is not only about reliably archiving digital assets, but also about protecting them so they last forever.

This article appeared in the Winter 2018 M&E Journal.