People love metaphors. They used to say data was the new oil, then they said it was the new oxygen, but there’s so much data about now that it’s probably more accurate to say data is like sand slipping through your fingers.
So why is that? Put simply - it’s all down to data sprawl.
Data sprawl is what happens when you collect so much data, from so many different sources, and store it in so many different places that it becomes hard if not impossible to track it all - let alone use it effectively. One survey found 43% of companies used an average of between four and six platforms to manage their data, while another 11% used an average of 10-12 platforms.
And the financial impact can be staggering. IBM estimates that poor-quality data costs businesses in the USA alone $3.1 trillion. None of this comes as a surprise.
Tackling the problem of data sprawl starts with a number of fundamentals. The first is determining what makes good data in the first place. What does your company need to collect and use, and perhaps more importantly - what doesn’t it need?
The second important step is making sure that data is stored properly, in a way that keeps it accessible.
Okay, here’s another way to think about it. We’ll park the ‘sand through the hands’ metaphor and picture a library instead. As databases go, a library is about as old school as they come.
Now this library could be the biggest library in the world - and every couple of years it doubles in size. Great.
But what happens if those books aren’t organized properly and you can never find any of them? Sure, you can brag that you’ve got the world’s biggest collection of books, but so what? They’re all just taking up space. You can’t find the ones you want, you certainly can’t read them or get any clear information from them either.
But this is the very situation a lot of companies are facing with their data. The life blood of their business. The stuff they rely on for everything.
Whether it’s regulatory compliance or business planning, customer intelligence or training machine learning (ML) algorithms, it doesn’t matter. The data might be lying around somewhere, but nobody knows where.
It’s easy to see why this causes problems. The very definition of data in a business context is that it should be ‘actionable’, but with data sprawl, data becomes the very opposite of actionable. It becomes worthless. Just numbers floating around in the ether.
What’s more, databases are now becoming so large that they are running up against the limitations of previous-generation technology. So not only are the books scattered all over the place in that library, the library itself is getting too small.
So what we need to do is take a new approach and do things right or not bother doing them at all. We need to be better about selecting the data we need to keep, and we need to store this ‘good data’ properly in as few locations as possible. We hold on to the good stuff and forget the bad, and we keep it where it can be found easily and put to proper use. We do that, and we might not have the biggest library - we don’t need to - but we’ll certainly have the best.
First they said data was the new oil, then the new oxygen, but such is the proliferation of data sources out there today – it’s probably more accurate to say data is like sand slipping through your fingers.
Metaphors aside, what does data sprawl mean for businesses and how can they reduce its impact?
Whether it’s regulatory compliance or business planning, customer intelligence or training machine learning (ML) algorithms, companies need access to vast amounts of data in what is now a data-driven business world. But this need creates a major potential problem – data sprawl.
Data sprawl is what happens when companies collect so much data, from so many different sources, and store it in so many different places that it becomes hard if not impossible to track it all – let alone use it effectively. One survey found 43% of companies used an average of between four and six platforms to manage their data, while another 11% used an average of 10-12 platforms.
“The vastness of the problem is causing issues,” said Toney Jennings, CEO of Everything Blockchain Inc (EBI).
“There’s just so much information being collected but not all of it is useful, it’s not all ‘good data’. It needs to be filtered, stored and used properly otherwise it’s less than worthless. It’s simply taking up space.”
And the financial impact can be staggering. IBM estimates that poor-quality data costs businesses in the USA alone $3.1 trillion.
The way to even begin to tackle the problem of data sprawl starts with a number of fundamentals. The first is determining what makes good data in the first place. What does the company need and what doesn’t it need to collect and use?
The next important step is making sure that data is stored properly, in a way that keeps it accessible, but in as few places as possible. And this requires new ways of thinking.
Jennings said that databases were now becoming so large that they were running up against the limitations of previous-generation technology. This causes problems because the very definition of data in a business context is that it should be ‘actionable’, but with data sprawl data becomes the very opposite of actionable. It becomes worthless.
“I keep coming back to the fact that there's just a lot of data out there,” Jennings said. “But I’ve got to make sure I've collected the right kind of data. I think that's the problem for many companies, they’re not collecting the right data and they’re not storing it where they can get at it.
“Sure you’ve got a lot of data stored away, but what’s the point if you can’t find it and use it?”
So firms need to identify what they mean by good data, what is actionable, what is useful, then they need to keep it centrally where they can access and use it. Only then will it stop slipping through their fingers.