In 2006, Clive Humby coined the phrase “Data is the new oil”. This is often misinterpreted as “Data powers the economy”, particularly by folks who sell data processing and storage, but it’s useful to hear from someone who actually uses data. In 2013, Michael Palmer of the Association of National Advertisers expanded on Humby’s quote: “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; so must data be broken down, analysed for it to have value.”
Much like refined oil (or processed ore, for that matter), the volume of material shrinks as its value grows. This should be intuitive to anyone who’s ever manually sifted a pile of data into a thin report, but it’s sometimes lost in contexts where the analyst’s goal is to find a needle in the haystack. Each round of human effort reduces volume.
Corollary: if the project requires massive amounts of data storage, it might be worth asking how much value is actually going to be in there. Is the purpose to store as cheaply as possible and rarely retrieve? Or is the plan to figure out the value later?
There’s an interesting confluence of partially aligned incentives between people who have to retain large amounts of data, people who want to retain and access large amounts of data for analytics, and people who just want answers to today’s problems. “Storage and compute are practically free”, which is presumably why Amazon, with Amazon Web Services as its profit engine, was worth around 1.5 trillion USD in 2021. If you don’t have to pay the bills, collecting raw data for real-time analytics sounds great. Otherwise, you’ll need to think carefully about what data modeling this project actually needs.
One last thought: unlike crude oil, which is permanently transformed when it’s made into gasoline or plastic or perfume, data can be transformed many times over. This is the primary reason to keep paying for its storage. As long as you can supply enough context to refine it, you can produce new low-volume, high-value outputs from data you’ve already used.
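A toy sketch of that idea, assuming a hypothetical set of purchase events (the records, their field layout, and the function names are all invented for illustration). The same raw data is refined once into daily revenue and later into an answer to a different question, the top customer. Each output is far smaller than the input, and neither refinement consumes the original.

```python
from collections import Counter, defaultdict

# Hypothetical raw events: (day, user, product, amount_usd). Invented for illustration.
# A tuple is used so the raw data cannot be modified in place by a refinement.
RAW_EVENTS = (
    ("2021-03-01", "alice", "widget", 12.50),
    ("2021-03-01", "bob", "gadget", 30.00),
    ("2021-03-02", "alice", "gadget", 30.00),
    ("2021-03-02", "carol", "widget", 12.50),
    ("2021-03-02", "alice", "widget", 12.50),
)


def revenue_by_day(events):
    """First refinement: total revenue per day (many rows in, one number per day out)."""
    totals = defaultdict(float)
    for day, _user, _product, amount in events:
        totals[day] += amount
    return dict(totals)


def top_customer(events):
    """Second refinement, asked later of the same raw data: who bought most often."""
    counts = Counter(user for _day, user, _product, _amount in events)
    user, purchases = counts.most_common(1)[0]
    return user, purchases


print(revenue_by_day(RAW_EVENTS))  # {'2021-03-01': 42.5, '2021-03-02': 55.0}
print(top_customer(RAW_EVENTS))    # ('alice', 3)
```

The second question did not need to be known when the data was collected; it only needed the raw events to still be around, plus enough context (here, knowing which field is the user) to refine them.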