By now, we understand that data is valuable and that its value is relative to its use. Indeed, value is one of the five V's of big data (the others being volume, variety, veracity, and velocity). Put simply, data is useless if it can't be converted into something of value. To assign this valuation, you must consider what uses the data might have, as well as how often it is accessed or called upon.
Structurally, value is often reflected in the temperatures (or tiering) of data: hot data (current, accessed most frequently) is placed on the fastest and often most expensive storage tier, while colder (less frequently used) data sits on lower-cost storage. Companies often distribute data roughly along the 90/10 rule: 90 percent of their data is stored on hard drives, and the top 10 percent on flash memory devices.
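The tiering idea above can be sketched in a few lines. This is a toy illustration, not any vendor's actual placement policy: it ranks objects by a hypothetical access count and puts roughly the top 10 percent on the fast tier.

```python
from collections import Counter

def assign_tiers(access_counts, hot_fraction=0.10):
    """Toy 90/10 placement: the most frequently accessed ~10% of objects
    go to the fast ('flash') tier, the rest to lower-cost storage ('hdd').

    access_counts is a hypothetical {object_name: access_count} mapping.
    """
    # Rank objects from most to least accessed.
    ranked = [name for name, _ in Counter(access_counts).most_common()]
    n_hot = max(1, int(len(ranked) * hot_fraction))
    return {name: ("flash" if i < n_hot else "hdd")
            for i, name in enumerate(ranked)}
```

Real systems refine this with recency, object size, and migration cost, but the ranking-by-heat principle is the same.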
However, when using artificial intelligence and machine learning (AI/ML), it's worth noting that models and algorithms improve as they are exposed to more data and as the datasets they access grow. The larger the available dataset, the more accurately you can train the model, and the better the predictions become (and therefore, the more you can trust the outcome).
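A minimal statistical sketch of why more data helps, using only the standard library: estimating a (hypothetical) true mean from noisy samples, where the estimation error shrinks as the sample size grows. The specific numbers here are illustrative assumptions, not measurements.

```python
import random
import statistics

def estimate_error(n_samples, true_mean=5.0, seed=0):
    """Draw n_samples noisy observations around a hypothetical true mean
    and return the absolute error of the sample-mean estimate."""
    rng = random.Random(seed)
    samples = [rng.gauss(true_mean, 2.0) for _ in range(n_samples)]
    return abs(statistics.mean(samples) - true_mean)

def average_error(n_samples, trials=20):
    """Average the estimation error over several independent trials,
    so the comparison isn't down to one lucky draw."""
    return statistics.mean(estimate_error(n_samples, seed=s)
                           for s in range(trials))
```

The same dynamic drives ML training: more (representative) data narrows the gap between what the model learns and the underlying pattern.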
That is why data storage companies like Seagate are focusing on areal density growth and performance breakthroughs in high-capacity drives with HAMR and MACH2. It's also why we are incorporating more data transformation capabilities into the same hardware unit, effectively allowing more of the data pipeline to be run from a single place (rather than needing to stitch together multiple services, tools, and layers, as is currently the case).
In the longer term, data movement architectures will need to ensure that hardware acceleration or hardware offload through the storage systems is done at appropriate points. Specifically, compression, encryption, and deduplication of datasets are done in compute today. As a result, compute architectures must scale up simply to handle these tasks at the higher level. This need not be the case if innovation moves the hardware acceleration and offload to the storage or network layer.
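To make concrete what "done in compute" means, here is a sketch of two of those tasks, deduplication and compression, running on the host CPU. The `ingest` function and its store layout are hypothetical, and encryption is omitted because it needs a proper cryptography library; the point is that each byte of data burns host cycles that offload hardware could absorb instead.

```python
import hashlib
import zlib

def ingest(block, store):
    """Deduplicate a data block by content hash, then compress it before
    'storing' it in a dict — host-CPU work that storage- or network-layer
    offload could take over.

    Returns (digest, stored_new) where stored_new is False for duplicates.
    """
    digest = hashlib.sha256(block).hexdigest()
    if digest in store:
        # Identical content already stored: skip both storage and compression.
        return digest, False
    store[digest] = zlib.compress(block)
    return digest, True
```

Every call hashes and compresses on the CPU; at scale, that is exactly the work the article argues should migrate down the stack.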
Another focus of storage innovation is delivering higher bandwidth to enable more robust movement of data among storage, networking, and compute functions. This is important for analytics. The backbone of today's analytics is graphics processing units (GPUs), which require high bandwidth to ingest data. To improve bandwidth, for example, organisations use composable disaggregated architectures in large-scale AI applications.
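A back-of-the-envelope calculation shows why link bandwidth dominates GPU ingest. The dataset size and bandwidth figures below are purely illustrative assumptions, not specifications of any particular fabric.

```python
def ingest_time_seconds(dataset_gb, link_gb_per_s):
    """Lower bound on the time to stream a dataset into GPU memory
    over a link of the given bandwidth (ignores protocol overhead)."""
    return dataset_gb / link_gb_per_s

# Hypothetical 10 TB training dataset over two hypothetical link speeds.
slow = ingest_time_seconds(10_000, 3)    # ~3 GB/s link
fast = ingest_time_seconds(10_000, 25)   # ~25 GB/s link
```

Even this crude model shows the faster link cutting ingest time by more than 8x, which is the kind of gap composable disaggregated architectures aim to close.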
The value of data isn’t only reflected in the amount of investment we put into its storage and transformation; it’s also in the way we treat it from a security and privacy-preserving standpoint.
You can expect to see continued investment in device integrity through open enclaves, giving firmware and compute the protocols needed to digitally verify devices. System solutions benefit from security at the component and device level; network security builds on system security; and compute, in turn, is made more secure by a more secure network.
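One common building block for digitally verifying a device is a keyed challenge-response check. The sketch below uses the standard library's HMAC primitive; the key name and the two functions are hypothetical, and real device attestation schemes layer certificates and hardware roots of trust on top of this idea.

```python
import hmac
import hashlib
import os

def device_respond(device_key, challenge):
    """Device side: prove possession of a provisioned secret by
    computing an HMAC over the verifier's random challenge."""
    return hmac.new(device_key, challenge, hashlib.sha256).digest()

def verify_device(device_key, challenge, response):
    """Verifier side: recompute the expected response and compare in
    constant time, so a counterfeit device without the key fails."""
    expected = device_respond(device_key, challenge)
    return hmac.compare_digest(expected, response)
```

The fresh random challenge prevents replay of an old response, which is what makes the check a verification of the device rather than of a recording.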
Read more about storage innovation and the value of data in Seagate’s Rethink Data Report.