The power of location data is in its quality, but obtaining high-quality data is not easy. First, you need to maintain your own data, which we will talk about a little more in this post. Second, you need to follow the recommendations from our last blog post, A Checklist For Large Location-Based Data Acquirers, where we established a common basis for data assessment. Today we will talk more about data quality, but this time we will focus on numbers: what is the cost of having bad data in your system?
Although the phrase big data is quite self-explanatory and anyone can figure out that it refers to high-volume data, we were stunned when we discovered the number behind it: $136 billion per year. That is the estimated size of the global big data market in 2016. Another statistic says that, on average, corporate data grows by 40% each year, which means that the overall size of the market will only increase in the following years. Data keeps growing year after year, yet the amount of poor data is not decreasing. According to the same research, 20% of every database consists of bad data, by which we mean incorrect, incomplete, missing, duplicate, and outdated information. IBM estimated that the yearly cost of poor data is $3.1 trillion in the US alone. Another shocking fact, right? To better understand how the cost of bad data escalates so quickly, let’s look at the costs of bad CRM data records:
Now imagine a company with a small database of 10,000 records, 20% of which is bad data. If the company does nothing about its data quality, it will spend $180,000 on those 2,000 bad records, which works out to $90 per bad record. Moreover, by next year the database will have grown by 40% to 14,000 records, and if the company keeps ignoring bad data, it will lose another $252,000.
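The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the $90 cost per bad record is not quoted from any source but inferred from the numbers above ($180,000 divided by 2,000 bad records), and the function names are made up for this example.

```python
# Assumption: $90 per bad record per year, inferred from the post's own
# figures ($180,000 / 2,000 bad records), not taken from a cited source.
COST_PER_BAD_RECORD = 90
BAD_SHARE = 0.20   # 20% of records are assumed to be bad
GROWTH = 0.40      # the database grows by 40% each year


def bad_data_cost(records, bad_share=BAD_SHARE, cost=COST_PER_BAD_RECORD):
    """Yearly cost of leaving the bad records in a database untouched."""
    return round(records * bad_share * cost)


def project_costs(records, years, growth=GROWTH):
    """Yearly bad-data cost for each of the given number of years."""
    costs = []
    for _ in range(years):
        costs.append(bad_data_cost(records))
        # The database grows before the next year's cost is computed.
        records = round(records * (1 + growth))
    return costs


print(project_costs(10_000, 2))  # [180000, 252000]
```

Under the same assumptions, calling `project_costs(10_000, 5)` shows how quickly the yearly cost compounds when nothing is done.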
How can you avoid having bad data? How does good data become bad? If you read our last blog post, you should be familiar with the fact that companies providing data solutions buy data from third-party data suppliers. After the acquisition, data starts “living” in the company and goes through three phases, also known as the location data lifecycle:
- In the first phase, data is collected from a known or unknown supplier, then validated and verified before the acquisition.
- After the acquisition, data is ingested into the system and goes through phase 2: benchmarking, competitive analysis, data enrichment, etc.
- In the third phase, data is used and reused to create business solutions, or is visualized and published.
The location data lifecycle requires periodically going back to phase 1 in order to monitor the health of the data and to keep it fresh and updated.
Bad data can be generated at any moment during the lifecycle, for instance:
- Skipping phase 1 – Companies that ingest data without being able or willing to evaluate it risk becoming data-rich but insight-poor.
- Bad company policies – Once bad data has been ingested, it is treated as an issue for the IT department rather than a business issue. IT workers, stuck with a deadline, spend a lot of time hunting for bad data and fixing it ad hoc, but the accuracy of the data remains questionable.
- Outdated data – At some point, even high-quality data becomes outdated. Companies need to understand the flow of the location data lifecycle, periodically go back to data validation and verification, and ask the supplier for a data update when needed. Ignoring this step is easy, but you will end up with a large volume of bad data.
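To make the periodic validation step more concrete, here is a minimal sketch of a check that flags incomplete, incorrect, outdated, and duplicate location records. The record layout (`name`, `lat`, `lon`, `verified_on`) and the one-year freshness threshold are assumptions chosen for illustration, not a description of any particular system.

```python
from datetime import date, timedelta

# Hypothetical record layout and freshness threshold, for illustration only.
REQUIRED_FIELDS = ("name", "lat", "lon", "verified_on")
MAX_AGE = timedelta(days=365)


def flag_bad_records(records, today):
    """Return {record index: [issues]} for records that look bad."""
    issues = {}
    seen = set()
    for i, rec in enumerate(records):
        found = []
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            found.append("incomplete")
        else:
            # Coordinates outside the valid ranges cannot be correct.
            if not (-90 <= rec["lat"] <= 90 and -180 <= rec["lon"] <= 180):
                found.append("incorrect")
            # Records not re-verified within MAX_AGE are treated as outdated.
            if today - rec["verified_on"] > MAX_AGE:
                found.append("outdated")
        # Same name at the same coordinates counts as a duplicate.
        key = (rec.get("name"), rec.get("lat"), rec.get("lon"))
        if key in seen:
            found.append("duplicate")
        seen.add(key)
        if found:
            issues[i] = found
    return issues


records = [
    {"name": "Cafe A", "lat": 45.8, "lon": 16.0, "verified_on": date(2024, 1, 10)},
    {"name": "Cafe A", "lat": 45.8, "lon": 16.0, "verified_on": date(2024, 1, 10)},
    {"name": "Shop B", "lat": 120.0, "lon": 16.0, "verified_on": date(2024, 1, 10)},
    {"name": "Bar C", "lat": 45.8, "lon": None, "verified_on": date(2024, 1, 10)},
    {"name": "Gym D", "lat": 45.8, "lon": 16.1, "verified_on": date(2020, 1, 10)},
]
print(flag_bad_records(records, today=date(2024, 6, 1)))
# {1: ['duplicate'], 2: ['incorrect'], 3: ['incomplete'], 4: ['outdated']}
```

Running a check like this on a schedule is one simple way to go back to phase 1 regularly; real validation would also compare records against a trusted reference, which this sketch does not attempt.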
You might be overwhelmed by the volume of data and tempted to ignore a couple of bad records, but as your business grows, more and more information becomes affected by the bad data issue. The reason bad data costs so much is that, in the end, decision makers, managers, data scientists, and others must accommodate it in their everyday work. According to the research, 40% of initiatives are never completed because they are based on poor data. On the other hand, having high-quality data can generate up to 70% more revenue for your company and can also improve your credibility.
If you want to avoid the overall costs of bad data, you need structured data management in place through all three phases of the location data lifecycle. PlaceLab has plenty of services that can be used in each phase, from data evaluation to competitive benchmarking.