Unstructured data management remains one of the largest challenges facing the enterprise technology sector, according to the second annual Western Digital (WD)/451 Research Group report on the trend.
One of the key findings of the new report, “Addressing the Changing Role of Unstructured Data with Object Storage,” was that “unstructured data continues to grow faster than traditional database data for customers in most vertical markets, and is rapidly exceeding the ability to manage it,” according to Steven Hill, 451 Research Group senior analyst, storage.
File systems, meanwhile, “lack the rich metadata capabilities needed to identify, classify and contextualize many forms of unstructured data,” he said in the report, which was commissioned by WD.
Also key is that artificial intelligence (AI) and machine learning (ML) platforms are “evolving and becoming mainstream,” offering “new opportunities for extracting better and ongoing value from business data,” while new AI/ML platforms may also provide tools to generate “reliable, rich metadata” about media-based objects that can then serve as criteria for policy-based, long-term data management, he said.
Privacy initiatives like the European Union (EU) General Data Protection Regulation (GDPR) that was implemented in May and the California Consumer Privacy Act of 2018 “will have a substantial impact on business data, requiring better identification and granular management of both structured and unstructured data,” he projected.
Other key findings of the report, he said: The metadata capabilities of object storage “provide a framework for identifying and contextualizing data that can be used to automate long-term data management”; an increasing number of applications create or use very large and/or dense data sets that could exceed the capabilities of traditional file systems; and there is an increasing need for personnel with the data science and business intelligence skills required to gain useful and actionable insight from stored information.
Over the past decade, the “nature and makeup of business data has been shifting from structured database information toward a vast array of unstructured data in the form of documents, media and dense data sets used by medical imaging, scientific modeling, engineering and other technical applications, which can generate massive quantities of information every time they’re used,” he said.
In addition to that “explosion of unstructured data is the growing need to maintain and manage all that data for an extended period of time, either for use in future research or to categorize and protect it to meet legal or industry compliance requirements,” he said.
The analyst also examined the major expectations and concerns of object storage customers and how they differed between 2017 and 2018. While the difference in the number of respondents between the two surveys could have had an impact on the findings, there were still “some clearly defined trends,” he said.
For one thing, the need for industry compliance moved to the top of the list of concerns in 2018, which he said was “no surprise given the demanding requirements” of GDPR and other new initiatives like it.
These regulations have become “major concerns for businesses worldwide” because they apply not only to businesses located in those jurisdictions but also protect individuals living in those areas, he said.
That means a privacy violation of an EU resident’s information by a U.S. company carries the same potential penalty as one by an EU company, which for GDPR is a maximum fine of the greater of 4% of a company’s annual global revenue or 20 million euros, he noted. With the California initiative, the penalty can be as much as $7,500 per violation – a fine that might not seem so horrible “until you realize that a breach affecting half a million California-based users could theoretically result in a fine” of $3.75 billion, he pointed out.
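The arithmetic behind those worst-case figures can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above, not a legal calculation: the function names are hypothetical, and the California figure assumes one violation per affected user, which is what the analyst's $3.75 billion example implies.

```python
# Illustrative only: reproduces the worst-case figures cited in the report.
# Function names are hypothetical; actual fines depend on regulators and courts.

GDPR_FLOOR_EUR = 20_000_000        # fixed component of the GDPR maximum fine
GDPR_REVENUE_PERCENT = 4           # percent of annual global revenue
CCPA_MAX_PER_VIOLATION_USD = 7_500 # maximum penalty per violation, as cited


def gdpr_max_fine_eur(annual_global_revenue_eur):
    """Return the greater of 4% of annual global revenue or 20 million euros."""
    revenue_based = annual_global_revenue_eur * GDPR_REVENUE_PERCENT / 100
    return max(revenue_based, GDPR_FLOOR_EUR)


def ccpa_worst_case_fine_usd(affected_users):
    """Assume one violation per affected user at the maximum per-violation rate."""
    return affected_users * CCPA_MAX_PER_VIOLATION_USD


if __name__ == "__main__":
    # A company with 1 billion euros in revenue: 4% exceeds the 20 million floor.
    print(gdpr_max_fine_eur(1_000_000_000))
    # Half a million California-based users, as in the analyst's example.
    print(ccpa_worst_case_fine_usd(500_000))
```

For a company with 100 million euros in annual revenue, the 4% figure would be only 4 million euros, so the 20 million euro figure would set the maximum instead.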
Despite being “worst-case scenarios, we believe regulations like these elevate the need for accurate and functional unstructured data identification to mission-critical status – and for the first time, retaining data becomes both an asset and a potential liability,” he said. And that “will be especially critical for companies whose primary business model is based on user information, but it could be an issue for companies that store any personal information in any manner,” he said.
He predicted that unstructured data management will remain “one of the largest challenges for IT going forward.” He added: “The next stage in storage evolution has to move beyond the simple nuts and bolts of storing data. It should focus on leveraging the information that data contains and contributing value through visibility, control and automation, regardless of where that data physically resides. We can no longer afford ‘dumb’ storage, and object storage currently offers the only model capable of bringing storage to the next level.”
Managing unstructured data using object storage is the “easy part” for companies, he went on to say, adding: “What’s harder is determining the metadata that’s of value and then creating that metadata. This is something that should have been addressed long ago, but now the problem is reaching the point of unmanageability, with growing legal and business ramifications.”
Recommendations that he said will be “critically important moving forward” include companies understanding their data environment – taking the time to “explore what types of unstructured data are growing fastest, what applications are creating it, and what it’s being used for”; talking with internal business stakeholders; considering what information is business- or mission-critical; and reconsidering the need to gather and store personally identifiable information.
He also urged organizations to “start somewhere” because just “tagging object data with an accurate time-stamp, nation of origin and information about the author … can be a valuable starting point that associates the document with an individual and a role within the company.”
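As a rough illustration of that starting point, the Python sketch below builds a minimal set of metadata tags for a single object. The function name, the key names and the sample values are all assumptions made for this example; the report does not prescribe a schema, and how the tags are attached depends on the object storage product in use.

```python
# Hypothetical sketch: builds the "starter" tags the analyst describes
# (time-stamp, nation of origin, author and role). Key names are assumptions.
from datetime import datetime, timezone


def build_starter_metadata(author, role, country_code, created_at=None):
    """Return a flat dict of string tags suitable for attaching to an object."""
    if created_at is None:
        created_at = datetime.now(timezone.utc)
    return {
        # Store the time-stamp in UTC, ISO 8601 format, so it sorts and compares cleanly.
        "created-at": created_at.astimezone(timezone.utc).isoformat(),
        "nation-of-origin": country_code.upper(),
        "author": author,
        "author-role": role,
    }


if __name__ == "__main__":
    tags = build_starter_metadata(
        author="j.doe",              # sample value, not from the report
        role="radiology-technician", # sample value, not from the report
        country_code="de",
    )
    for key, value in tags.items():
        print(f"{key}: {value}")
```

Keeping the tags as plain strings is a deliberate choice here: a flat set of string key/value pairs is the lowest common denominator and leaves room for richer, machine-generated metadata to be added later.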
It’s also important for organizations to think on a “long-term” basis, he said. While the majority of unstructured data in the form of images, audio and video could be hard to identify now, “new technologies are appearing that will eventually make it easier to build useful, contextualized metadata about dense media,” he said.