Workload Evolution and the Rise of Metadata
I have written several blog posts about the hypothesis above. To prove this point, especially with regard to storage system IT infrastructure, those posts described:
- The limitations on the amount of application workload a single disk drive could support in the 1980s.
- Storage system innovation that resulted from these performance workload problems.
- The service level requirements of application workloads from the 1990s that also resulted in significant innovation.
- The emergence of multi-departmental application workloads in the late 1990s that caused the emergence of shared storage and SAN approaches.
I spent a good part of my career writing software that handled application workloads for VNX (block-based I/O). In particular, I wrote a lot of software implementing the caching and RAID algorithms inside a cached disk array.
In handling those application workloads (e.g. block read and write requests), I dealt with raw bytes of data and didn't often consider the topic of metadata.
In a recent conversation with EMC colleague Stephen Manley, we discussed the rise of metadata in the context of the application workloads of the 1990s. More and more applications began to emerge that focused on the management of raw blocks of data (otherwise known as files). By associating metadata with the raw content, applications realized the following benefits:
- The ability to store data, and access it, in a more logical fashion.
- The ability to share content easily by attaching access-rights metadata.
- The ability to protect content from unauthorized use.
- The ability to create multiple copies (e.g. active, backup, and compliance copies) and to know where those copies are.
- The ability to create workflows around the content and share it with the right people at the right time.
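A quick local-filesystem sketch makes the point concrete: even an ordinary file system keeps metadata alongside the raw content, and that metadata is what enables the logical access, sharing, and protection listed above.

```python
import os
import stat
import tempfile
import time

# Write some raw content, then inspect the metadata the file system
# keeps alongside it.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"raw blocks of data")
    path = f.name

info = os.stat(path)
print("size bytes:", info.st_size)                   # how much content there is
print("permissions:", stat.filemode(info.st_mode))   # who may access it
print("owner uid:", info.st_uid)                     # who can share it
print("modified:", time.ctime(info.st_mtime))        # when it last changed

os.unlink(path)
```

None of this bookkeeping lives in the raw bytes themselves; it is the file system's metadata that makes the content addressable, shareable, and protectable.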
One of the most significant innovations in response to the metadata trend was NAS: Network-Attached Storage. NAS is predominantly about metadata management, or, as Stephen Manley likes to say, “knowing information about the data you are storing so you can do really cool things with it”. The value is in the metadata. It dictates the accessibility to the content.
In the same way that application workloads drove the industry from physical disk drives to cached disk arrays, the rise of metadata drove the industry from local file systems to network-attached, shared storage. An example side-by-side view of block and file storage highlights this point.
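As an illustrative sketch (the class names here are hypothetical, not any vendor's API), the difference can be reduced to the addressing model: block storage serves raw sectors by number and knows nothing about what the bytes mean, while file storage keeps metadata such as ownership and permissions next to the content and can enforce it.

```python
class BlockDevice:
    """Block storage: clients address raw fixed-size sectors by number;
    the array has no idea what the bytes mean."""
    def __init__(self, num_sectors, sector_size=512):
        self.sector_size = sector_size
        self.data = bytearray(num_sectors * sector_size)

    def read(self, lba, count=1):
        off = lba * self.sector_size
        return bytes(self.data[off:off + count * self.sector_size])

    def write(self, lba, payload):
        off = lba * self.sector_size
        self.data[off:off + len(payload)] = payload


class FileServer:
    """File storage: the server keeps metadata (name, owner, permissions)
    alongside the content, so it can enforce access rights."""
    def __init__(self):
        self.files = {}  # path -> {"owner": ..., "mode": ..., "content": ...}

    def write(self, path, content, owner, mode=0o644):
        self.files[path] = {"owner": owner, "mode": mode, "content": content}

    def read(self, path, user):
        f = self.files[path]
        # Only the owner, or anyone if the group/other read bits are set.
        if user != f["owner"] and not (f["mode"] & 0o044):
            raise PermissionError(path)
        return f["content"]
```

The block device will happily return any sector to any caller; the file server can refuse a read because it knows who owns the content.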
The deployment of block and file I/O systems into customer data centers eventually drove the industry toward unified architectures supporting both block and file. One notable innovation that resulted was a hybrid approach known as MPFS. It allowed an application workload to write using a file protocol but transparently read using the block protocol, providing, for example, a 3-4x performance increase over traditional file system techniques. The industry eventually adopted this innovation into an industry standard approach called pNFS.
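A toy sketch of the MPFS/pNFS split (names and data structures are hypothetical, not EMC's actual implementation): the client obtains only the file's block layout from the metadata server over the file protocol, then pulls the content directly over the block path, bypassing the file server for the data transfer itself.

```python
SECTOR = 512

class MetadataServer:
    """Tracks which sectors hold each file's content (the 'layout')."""
    def __init__(self, device):
        self.device = device
        self.layouts = {}   # path -> (list of LBAs, byte length)
        self.next_lba = 0

    def write_file(self, path, content):
        # Writes go through the file protocol: the server allocates
        # sectors on the block device and records the layout.
        lbas = []
        for i in range(0, len(content), SECTOR):
            self.device[self.next_lba] = content[i:i + SECTOR]
            lbas.append(self.next_lba)
            self.next_lba += 1
        self.layouts[path] = (lbas, len(content))

    def get_layout(self, path):
        return self.layouts[path]


def mpfs_read(server, device, path):
    # The client fetches only the layout (metadata) from the server,
    # then reads the raw sectors directly over the block path.
    lbas, length = server.get_layout(path)
    data = b"".join(device[lba] for lba in lbas)
    return data[:length]


device = {}  # LBA -> sector payload, standing in for a block array
mds = MetadataServer(device)
mds.write_file("/exports/report.dat", b"x" * 1500)
print(len(mpfs_read(mds, device, "/exports/report.dat")))  # 1500
```

The speedup comes from the data path: bulk content moves over the block protocol while the file server handles only the small layout lookups.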
Due to the surge in the generation of unstructured content, the NAS market exploded. In some cases, application workloads began to exceed the capacities (and capabilities) of file system technology, pushing the industry toward a new paradigm: OBJECT.
Applications desired to associate increasing amounts of metadata with their content, which stressed the existing approach for interspersing metadata and content. These workloads began to push the industry deeper into the realm of capacity-oriented workloads, which required further innovations in the IT infrastructure. The diagram below highlights this push down the Y-axis.
Workloads pushed capacity-oriented infrastructure in two directions (as highlighted by the diagram above). Some applications began storing massive amounts of metadata and content with high service levels (e.g. fast and available X-ray retrieval during a hospital procedure), while other applications had less rigid availability and/or performance requirements (think YouTube videos).
In either case, application workloads wanted to do more, and better, things with their metadata. This phenomenon gave rise to a new class of storage system: object-based storage.
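A minimal sketch of the idea (the ObjectStore class and its methods are illustrative, not a real product API): each object carries arbitrary key/value metadata alongside its content, so the store can be searched by what the data means rather than by where it lives in a directory tree.

```python
class ObjectStore:
    """Each object = content plus arbitrary user metadata, addressed by key."""
    def __init__(self):
        self.objects = {}

    def put(self, key, content, **metadata):
        self.objects[key] = {"content": content, "meta": metadata}

    def get(self, key):
        return self.objects[key]["content"]

    def find(self, **criteria):
        # Query by metadata, not by path: return keys whose metadata
        # matches every supplied criterion.
        return [k for k, o in self.objects.items()
                if all(o["meta"].get(f) == v for f, v in criteria.items())]


store = ObjectStore()
store.put("xray-001", b"<image bytes>", patient="p123", modality="x-ray")
store.put("clip-042", b"<video bytes>", uploader="u9", modality="video")
print(store.find(patient="p123"))  # ['xray-001']
```

Because the metadata is not interspersed with the content, it can grow arbitrarily rich without disturbing the stored bytes, which is exactly what the capacity-oriented workloads above demanded.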
image credit: murraystate.edu
Steve Todd is an EMC Fellow, the Director of EMC's Innovation Network, a high-tech inventor, and author of the book Innovate With Global Influence. An EMC Intrapreneur with over 200 patent applications and billions in product revenue, he writes about innovation on his personal blog, the Information Playground. Twitter: @SteveTodd