Data-driven advertising is not just 21st-century hype but an actual commercial ‘lifestyle’. It defines how businesses run and who their major customers are. And it actually started back in the early ’90s, when Domino’s Pizza began using sophisticated technology to collect its customers’ personal information and preferences. In the beginning, all they had was a huge collection of data, but by combining customer information with outlet attributes such as location, contact information, ratings and so on, they were able to find out who their customers were and how to keep them coming back for more.

Apart from identifying a user’s favorite meal or preferred time for ordering, it’s important to continue to monitor your online visibility on commonly used data platforms such as Google, Apple, Foursquare and Facebook, because that is how most customers search for a business like yours. Incomplete and inaccurate data creates a negative experience for customers and undermines their loyalty to your brand and data provider. They will go elsewhere, thus negatively impacting your reputation and revenue.

Is my data reliable?

By regularly checking the accuracy of your data and how your business appears online, and by handling data issues properly, you are already on the path to success. In this blog post we will demonstrate how PlaceLab measures data reliability, using the example of McDonald’s. In the US, McDonald’s has hundreds of stores in each state and is continuing to expand. If we assume that the company’s own data about its restaurants is accurate, a useful question to ask is how complete and accurate that data is on third-party platforms such as Foursquare and OpenStreetMap.

For demonstration purposes, we analysed an area of New York City. We took a list of 163 Foursquare McDonald’s locations, and pulled out 59 OSM McDonald’s locations using the Overpass API. We chose Foursquare and OpenStreetMap (OSM) because both are commonly used to search for places and because they also serve as a source of data for other data companies (Foursquare as a paid data source and OSM as a free one). The retrieved data was compared to an official list of New York City restaurants found on www.mcdonalds.com.
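As an illustration of the OSM retrieval step, here is a minimal sketch of an Overpass API request. The bounding box, the choice to match on the name tag, and the helper names build_query and fetch_places are assumptions made for this example; the post does not state which tags or area limits were actually used.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint; other mirrors exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(south, west, north, east):
    """Build an Overpass QL query for McDonald's features inside a bounding box."""
    # Overpass expects the bounding box as (south, west, north, east).
    bbox = f"{south},{west},{north},{east}"
    # Assumption: match nodes, ways and relations on the name tag,
    # case-insensitively. The tags used in the original analysis are unknown.
    return (
        "[out:json][timeout:60];"
        f'nwr["name"~"McDonald",i]({bbox});'
        "out center;"
    )


def fetch_places(query):
    """POST the query to Overpass and return the list of matched elements."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=body, timeout=90) as resp:
        return json.load(resp)["elements"]


# Hypothetical bounding box roughly covering Manhattan.
query = build_query(40.70, -74.02, 40.88, -73.91)
print(query)
```

Calling fetch_places(query) sends the request over the network, so the number of results will depend on the state of OSM at the time of the call.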

In this analysis, we performed the following quality checks:

  • Attribute completeness
  • Place validity
  • Missing restaurants
  • Address verification
  • Duplicate records

Attribute completeness

Attribute completeness analysis focused on the main attributes of any business: name, address, phone number (NAP), website, and category. The name field is completely populated in both datasets with one difference: OSM provides only the official name of the restaurant—McDonald’s—while Foursquare added district names to the name value. So, in the Foursquare database, we will find values such as McDonald’s Chelsea or McDonalds Theatre District. Although Foursquare names are very descriptive and clear, non-standard names make the database unnecessarily complex, which increases the risk of error.

The completeness of the address, phone number, and website attributes across the two datasets is shown below:
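A completeness figure of this kind can be computed as the share of records with a non-empty value for each attribute. The sketch below assumes records are plain dictionaries with hypothetical field names; the sample data is invented for illustration and is not taken from the analysed datasets.

```python
# Assumed field names for the attributes discussed in this section.
ATTRIBUTES = ("name", "address", "phone", "website", "category")


def completeness(records, attributes=ATTRIBUTES):
    """Return the percentage of records with a non-empty value per attribute."""
    total = len(records)
    result = {}
    for attr in attributes:
        # Missing keys, None and whitespace-only strings all count as empty.
        filled = sum(1 for r in records if str(r.get(attr) or "").strip())
        result[attr] = round(100 * filled / total, 1) if total else 0.0
    return result


# Hypothetical sample records.
sample = [
    {"name": "McDonald's", "address": "1 Example St", "phone": "",
     "website": None, "category": "fast_food"},
    {"name": "McDonald's", "address": " ", "phone": "555-0100",
     "category": "fast_food"},
]
report = completeness(sample)
print(report)
```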

Place validity

As mentioned above, McDonald’s locations found on Foursquare and OSM were checked for validity against the official website, mcdonalds.com. Interesting findings came out of this analysis.

We were able to validate almost all Foursquare restaurants. Five locations were not validated due to insufficient data, and another nine were found to be permanently closed.

See below for examples:

Because the OSM data is poorly populated, we were able to validate only 40% of OSM records, namely those with populated address attributes.
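One simple way to run such a validity check is to match each record against the official list by normalized address, setting aside records that have no address to compare. This is a sketch under assumptions: the abbreviation map, the field names, and the three outcome labels are hypothetical, and the post does not describe the matching method actually used.

```python
import re

# Assumption: a tiny abbreviation map; a real pipeline would need a fuller one.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd"}


def normalize_address(address):
    """Lowercase, drop punctuation and shorten common street-type words."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def validate(records, official_addresses):
    """Split records into validated, unmatched and unverifiable (no address)."""
    official = {normalize_address(a) for a in official_addresses}
    outcome = {"validated": [], "unmatched": [], "unverifiable": []}
    for record in records:
        address = (record.get("address") or "").strip()
        if not address:
            outcome["unverifiable"].append(record)
        elif normalize_address(address) in official:
            outcome["validated"].append(record)
        else:
            outcome["unmatched"].append(record)
    return outcome


# Hypothetical data for illustration only.
official_list = ["100 Example Avenue", "20 Sample Street"]
places = [
    {"id": 1, "address": "100 Example Ave."},
    {"id": 2, "address": ""},
    {"id": 3, "address": "5 Other Rd"},
]
result = validate(places, official_list)
print({label: [r["id"] for r in rows] for label, rows in result.items()})
```

Unmatched records are only candidates for being invalid or closed; they still need a manual check, since an exact address match misses legitimate spelling variants.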

Missing restaurants

According to internet sources, there are more than 300 McDonald’s restaurants in New York City, meaning neither Foursquare nor OSM has a full list of the restaurants in this particular zone.

Address verification

Location attributes were validated using reliable geocoders. We concluded that both suppliers provide largely accurate address information (when address attributes were populated). Foursquare has several places with incomplete address information, which meant they could only be geocoded to street or city level.
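A geocoder-based check of this kind can be sketched as follows: geocode the listed address, then compare the result with the coordinates stored on the record. The geocode callable, its (lat, lon, precision) return shape, the precision labels, and the 100 m threshold are all assumptions for illustration; the post does not name the geocoders or the tolerance used.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def address_is_consistent(record, geocode, max_distance_m=100.0):
    """Compare the record's coordinates with its geocoded address.

    `geocode` is a hypothetical callable returning (lat, lon, precision),
    where precision is e.g. "rooftop", "street" or "city".
    """
    lat, lon, precision = geocode(record["address"])
    if precision != "rooftop":
        return False  # too coarse to confirm the address
    return haversine_m(record["lat"], record["lon"], lat, lon) <= max_distance_m


# Stub geocoder standing in for a real service.
def fake_geocode(address):
    if "Example" in address:
        return 40.7501, -73.9900, "rooftop"
    return 40.7128, -74.0060, "city"


place = {"address": "1 Example St", "lat": 40.7500, "lon": -73.9900}
print(address_is_consistent(place, fake_geocode))
```

Records that fall back to street- or city-level precision are reported as unconfirmed rather than wrong, which mirrors the incomplete-address case described above.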

Duplicate records

Duplicate data is harmful for a business because:

  • It negatively affects business ranking on search engines
  • Storing unnecessary data is expensive
  • Inaccurate information about the volume of the data can be misleading in the case of buying/selling data
  • It can produce inaccurate analysis, which will lead to bad business decisions.

When we analysed the list of locations found on Foursquare and OSM, even in this small data sample we found a significant percentage of duplicate records: 7.36% of Foursquare records and 6.78% of OSM records are duplicates. Having duplicate records is a very common issue for all data providers and business owners.
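A duplicate share like this can be estimated by grouping records on a normalized key and counting every record beyond the first in each group. The sketch below uses the address as the key, which is an assumption; the post does not say how duplicates were identified, and a production check would typically also compare names and coordinates.

```python
from collections import Counter


def address_key(record):
    """Assumed duplicate key: case- and whitespace-insensitive address."""
    return " ".join(record["address"].lower().split())


def duplicate_rate(records, key=address_key):
    """Return (extra_records, rate_percent).

    An extra record is any record that shares its key with an earlier one.
    """
    counts = Counter(key(r) for r in records)
    extras = sum(n - 1 for n in counts.values())
    rate = round(100 * extras / len(records), 2) if records else 0.0
    return extras, rate


# Hypothetical sample: the first two records describe the same place.
sample_places = [
    {"address": "100 Example Ave"},
    {"address": "100  example ave"},
    {"address": "20 Sample St"},
    {"address": "5 Other Rd"},
]
print(duplicate_rate(sample_places))
```

Under this definition, the reported percentages would correspond to 12 extra records out of 163 for Foursquare and 4 out of 59 for OSM, although the post does not state the underlying counts.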

For this use case, we did not have a full official list of McDonald’s features, so we couldn’t analyse the official data. But our analysis of Sparkasse Bank data in our last blog post highlights how official business data can be corrupted and can contain a high number of duplicate records.

Conclusion

Today, almost 30 years after it was introduced, data-driven advertising is an essential marketing strategy. Finding an empty space to open a business is no longer enough to keep that business afloat. You need to find a way to get closer to your customers, to understand their needs and to constantly attract them with new offers and promotions. Online information about your business must be accurate and consistent across the main data platforms. Both accurate in-house data about customers and publicly available data about your business’s objectives increase your chances of improving your online visibility, attracting more customers and turning the data into revenue.
