In this article we will discuss data accuracy and the impact of existing gaps. With companies collecting, using, and processing large amounts of data daily, how much of it do you think is accurate?
Data accuracy is how closely data matches the real-world information it describes, and it is one of the components of data quality. Data can be valid without being accurate. To be accurate, data must be correct and represented in a consistent and unambiguous format. If you are wondering how data can be valid but inaccurate at the same time, here is one example: two values can both be correct and unambiguous but still cause problems. The city values ST. Clara and Saint Clara both refer to the same city, and anyone looking at them would have no trouble interpreting what they mean. However, because the values are inconsistent, they cannot be accurately aggregated or compared, which creates an opportunity for inaccurate use of the data.
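To make the aggregation problem concrete, here is a minimal Python sketch. The abbreviation map and the normalize_city helper are hypothetical and written only for this illustration; a real data set would need a fuller, locale-aware mapping (for instance, "St" can also mean "Street").

```python
import re
from collections import Counter

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {"st": "saint"}

def normalize_city(value: str) -> str:
    """Lowercase, drop trailing periods, expand known abbreviations, title-case."""
    tokens = re.split(r"\s+", value.strip().lower())
    expanded = []
    for token in tokens:
        bare = token.rstrip(".")
        expanded.append(ABBREVIATIONS.get(bare, bare))
    return " ".join(expanded).title()

raw_cities = ["ST. Clara", "Saint Clara", "saint clara"]

# Counting the raw values treats one city as three different ones.
raw_counts = Counter(raw_cities)

# Counting the normalized values groups all three records together.
clean_counts = Counter(normalize_city(city) for city in raw_cities)

print(raw_counts)
print(clean_counts)
```

Grouping on the raw values yields three separate keys, while grouping on the normalized values yields a single key with a count of three.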
Accurate and consistent data is essential to business success. As the number of customer engagement channels and data sources has risen, analyzing data has become increasingly challenging. Flawed data entry processes usually result in inaccurate entries, whether introduced deliberately or as a byproduct of system errors. When the end user applies the data in the decision-making process, the problem becomes evident and can cause serious damage to processes and businesses.
Case study in Data Accuracy
To demonstrate how inaccurate collected data can be, we took a data sample from OpenStreetMap. The sample contained information about random shops and amenities in Rome, Italy.
We can divide the analysis into two segments:
- Attribute accuracy.
- Attribute validity and inconsistency.
The analysis shows that 39.49% of all places are either closed or have invalid attributes, since the most prominent geolocation providers, such as Google, Facebook, and Foursquare, did not find these places. The subset of places that were found (60.51%) was further analyzed for name and phone number accuracy.
Analysis of the input addresses showed that around 66% were geocoded with high precision, down to a house number. The remaining 34% are either incomplete or have inaccurate street and house number values. These addresses resolve only to a street, an intersection, or a coarser administrative level, for example a city or country. Another inconsistency was noticed during the input file analysis: 46% of the postal codes are missing one or two leading zeros.
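Missing leading zeros often appear when postal codes are stored or exported as numbers; that cause is an assumption here, not a finding of the analysis. Since Italian postal codes are five digits long, a minimal Python sketch of a repair step could look like the following. The restore_postal_code helper is hypothetical, and a padded value should still be checked against a reference list, because padding cannot tell a truncated code from a mistyped one.

```python
from typing import Optional, Union

def restore_postal_code(value: Union[int, str], width: int = 5) -> Optional[str]:
    """Left-pad a postal code with zeros, or return None if it cannot be one."""
    digits = str(value).strip()
    # Reject empty values, non-digit characters, and codes that are too long.
    if not (digits.isascii() and digits.isdigit()) or len(digits) > width:
        return None
    return digits.zfill(width)

print(restore_postal_code(184))      # stored as a number, zeros were lost
print(restore_postal_code("00184"))  # already well formed, left unchanged
print(restore_postal_code("1234567"))  # too long, reported as None
```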
Furthermore, website activity was checked for the entire data set: 13% of the websites are either offline or return an error.
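A website check of this kind can be sketched with Python's standard library. This is an illustration under stated assumptions, not the method used in the analysis: the is_reachable helper, the five-second timeout, and the rule that any error counts as "offline" are all choices made for the example.

```python
import http.client
import urllib.error
import urllib.request

def is_reachable(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the URL answers with a success or redirect status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # HTTP 4xx/5xx responses, DNS failures, timeouts, and malformed URLs
        # are all counted as "offline or returning an error".
        return False

# A malformed URL is rejected before any request is sent.
print(is_reachable("not a url"))
```

The opener parameter exists only so the check can be exercised without network access; in normal use it is left at its default.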
A detailed analysis by attribute is shown below.
Because data changes constantly, at some point your data will contain a certain number of inaccurate records.
Data accuracy gaps can impact your business in several ways:
- Misinformed decisions.
- Errors in product/mail delivery.
- Decreased revenue.
- Higher consumption of resources.
- Higher maintenance costs.
- Dissatisfied sales and distribution channels.
- Lower customer satisfaction and retention.
Identifying gaps in data accuracy is the first and most important step toward reliable data that can be used intelligently. If managed effectively, that data can help organizations improve customer retention, streamline operations, and increase the value of each customer.