Top 5 challenges related to data quality and accuracy and how to overcome them


Every company today relies on data, or at least claims to. Business decisions are no longer made based on gut feelings or anecdotal trends as they were in the past. Tangible data and analytics now power the most important corporate decisions.

As more companies leverage the power of machine learning and artificial intelligence to make critical choices, there needs to be a conversation about the quality (the completeness, consistency, validity, timeliness, and uniqueness) of the data used by these tools. The insights that companies expect machine learning (ML) or AI-based technologies to deliver are only as good as the data used to power them. The old adage “garbage in, garbage out” applies squarely to data-driven decisions.
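The dimensions listed above can be made measurable. The following is a minimal Python sketch, not a standard implementation: it assumes hypothetical customer records stored as dicts with `id`, `email`, and `updated_at` fields, and scores completeness, validity, timeliness, and uniqueness as the fraction of records that pass a simple check. Consistency is left out because it usually requires comparing against a second source system.

```python
import re
from datetime import datetime, timedelta, timezone

# Deliberately loose pattern: "something@something.something", no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_report(records, required, key, max_age, now):
    """Score dict records on four data quality dimensions.

    Each score is the fraction (0.0 to 1.0) of records passing the check.
    """
    n = len(records)
    if n == 0:
        return {}
    # Completeness: every required field is present and non-empty.
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    # Validity: the email field matches the expected format.
    valid = sum(bool(r.get("email")) and bool(EMAIL_RE.match(r["email"])) for r in records)
    # Timeliness: the record was updated within the allowed window.
    timely = sum(now - r["updated_at"] <= max_age for r in records)
    # Uniqueness: distinct key values relative to the number of records.
    unique = len({r.get(key) for r in records})
    return {
        "completeness": complete / n,
        "validity": valid / n,
        "timeliness": timely / n,
        "uniqueness": unique / n,
    }

# Hypothetical sample data for illustration only.
NOW = datetime(2022, 7, 1, tzinfo=timezone.utc)
SAMPLE = [
    {"id": 1, "email": "a@example.com", "updated_at": NOW - timedelta(days=1)},
    {"id": 2, "email": "not-an-email", "updated_at": NOW - timedelta(days=40)},
    {"id": 2, "email": "", "updated_at": NOW - timedelta(days=2)},
    {"id": 3, "email": "c@example.org", "updated_at": NOW},
]

report = quality_report(SAMPLE, required=["id", "email"], key="id",
                        max_age=timedelta(days=30), now=NOW)
print(report)
```

The thresholds (the email pattern, `max_age`) are illustrative choices; in practice, each check would be agreed with the people who actually use the data.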

Poor data quality increases the complexity of data ecosystems and, in the long run, leads to poor decision-making. In fact, roughly $12.9 million is lost every year due to poor data quality. As data volumes keep increasing, so do the challenges companies face in validating their data. To overcome issues with data quality and accuracy, it is first necessary to know the context in which the data elements will be used, as well as the best practices that should guide these initiatives.

1. Data quality is not a one-size-fits-all endeavor

Data initiatives are not limited to a single business driver. In other words, the quality of the data will always depend on what the company is trying to achieve with it. The same data can affect more than one business unit, function, or project in very different ways. Moreover, the list of data elements that require strict governance may vary from one data user to another. For example, marketing teams will need a highly accurate and validated email list, while research and development teams will care most about high-quality user feedback data.

So the team best placed to characterize the quality of a data item is the team closest to that data. Only they can recognize how the data underpins business processes and ultimately assess its accuracy based on what the data is used for and how.
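To illustrate how the same record can be good enough for one team and not for another, here is a small Python sketch. The rule names and fields (`marketing_opt_in`, `feedback_text`, `rating`) are hypothetical, standing in for whatever rules each team closest to the data would actually define.

```python
# Hypothetical per-team rules applied to the same kind of customer record.
TEAM_RULES = {
    "marketing": {
        "email_present": lambda r: bool(r.get("email")),
        "opted_in": lambda r: r.get("marketing_opt_in") is True,
    },
    "research": {
        "has_feedback": lambda r: bool(r.get("feedback_text")),
        "rating_in_range": lambda r: isinstance(r.get("rating"), int) and 1 <= r["rating"] <= 5,
    },
}

def failed_rules(team, record):
    """Return the names of the team's rules that the record fails."""
    return [name for name, check in TEAM_RULES[team].items() if not check(record)]

record = {"email": "a@example.com", "marketing_opt_in": True,
          "feedback_text": "", "rating": 7}
print(failed_rules("marketing", record))  # fit for marketing's purpose
print(failed_rules("research", record))   # not fit for research's purpose
```

The record passes every marketing rule and fails every research rule, which is the point: "quality" is only meaningful relative to a purpose.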

2. What you don’t know can hurt you

Data is widely described as an enterprise asset. However, actions speak louder than words. Not everyone within an organization does everything they can to ensure that data is accurate. If users don’t realize the importance of data quality and governance, or simply don’t prioritize them as they should, they will not try to anticipate data issues caused by sloppy data entry or raise their hand when they find a data issue that needs to be addressed.

In practice, this can be addressed by tracking data quality metrics as a performance objective, which promotes greater accountability among those directly involved with the data. Business leaders must also champion their data quality program and align key team members on the practical impact of poor data quality. For example, inaccurate stakeholder reporting can spread misleading insights and result in fines or penalties. Investing in better data literacy can help organizations create a culture of data quality that avoids reckless or ill-considered mistakes that damage the bottom line.
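One hedged way to turn data quality into a tracked performance objective is a per-owner scorecard. The sketch below assumes rule-check results are already available as `(owner, passed)` pairs; the owner names are invented, and the 95% target is an arbitrary placeholder rather than a recommended value.

```python
from collections import defaultdict

def owner_scorecard(results, target=0.95):
    """Aggregate rule-check results into a pass rate per accountable owner.

    `results` is an iterable of (owner, passed) pairs; `target` is a
    placeholder threshold for the performance objective.
    """
    totals = defaultdict(lambda: [0, 0])  # owner -> [passed, checked]
    for owner, passed in results:
        totals[owner][0] += int(passed)
        totals[owner][1] += 1
    return {
        owner: {"pass_rate": p / n, "meets_target": p / n >= target}
        for owner, (p, n) in totals.items()
    }

# Hypothetical check results, e.g. from nightly validation runs.
results = [("sales_ops", True)] * 20 + [("finance", True), ("finance", False)]
print(owner_scorecard(results))
```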

3. Don’t try to boil the ocean

It is not practical to fix a large laundry list of data quality problems. It’s not an efficient use of resources either. The number of active data elements within an organization is huge and steadily increasing. It is best to start by identifying Enterprise Critical Data Elements (CDEs), which are the data elements that are integral to the main functionality of a particular business. CDEs are unique to every business. Net revenue is a common CDE for most companies because it is important for reporting to investors, other shareholders, etc.

Since every company has different business goals, operating models, and organizational structures, each company’s CDEs will be different. In retail, for example, CDEs may relate to design or sales, whereas healthcare companies will be more interested in ensuring the quality of regulatory compliance data. Although this is not an exhaustive list, business leaders may consider asking the following questions to help identify their unique CDEs: What are your critical business processes? What data is used in those processes? Are these data elements included in regulatory reports? Will these reports be audited? Will these data elements guide initiatives in other departments within the organization?
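The screening questions above can be captured as a simple checklist. This Python sketch is only an illustration: the `DataElement` fields mirror the questions, and the scoring policy (anything feeding an audited regulatory report is critical; otherwise at least two “yes” answers) is a hypothetical assumption that each business would replace with its own.

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    """Answers to the CDE screening questions for one data element."""
    name: str
    used_in_critical_process: bool
    in_regulatory_report: bool
    report_audited: bool
    drives_other_departments: bool

def is_cde(e: DataElement, threshold: int = 2) -> bool:
    # Hypothetical policy: anything feeding an audited regulatory report is
    # critical outright; otherwise require `threshold` "yes" answers.
    if e.in_regulatory_report and e.report_audited:
        return True
    score = sum([e.used_in_critical_process, e.in_regulatory_report,
                 e.drives_other_departments])
    return score >= threshold

for element in [DataElement("net_revenue", True, True, True, True),
                DataElement("fax_number", False, False, False, False)]:
    print(element.name, is_cde(element))
```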

Validating and processing only essential elements will help organizations scale their data quality efforts in a resourceful and sustainable manner. Eventually, the enterprise data quality program will reach a level of maturity where frameworks (often with some level of automation) classify data assets based on pre-defined criteria to eliminate variance across the enterprise.
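As a rough illustration of the kind of automated classification a mature framework might perform, the sketch below tags columns by matching their names against pre-defined patterns. The patterns and tag names are hypothetical; real frameworks typically also profile the values themselves rather than relying on names alone.

```python
import re

# Hypothetical pre-defined classification criteria: column-name patterns
# mapped to a tag, checked in order.
CRITERIA = [
    (re.compile(r"revenue|amount|price", re.I), "financial"),
    (re.compile(r"email|phone|ssn", re.I), "personal"),
]

def classify(columns):
    """Tag each column with the first matching criterion, else 'unclassified'."""
    return {
        col: next((tag for pattern, tag in CRITERIA if pattern.search(col)), "unclassified")
        for col in columns
    }

print(classify(["net_revenue", "customer_email", "store_id"]))
```

Because every asset passes through the same ordered list of criteria, two teams looking at the same column get the same tag, which is the "eliminate variance" goal in miniature.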

4. More clarity = more accountability = better data quality

Companies drive value by knowing where their CDEs are, who is accessing them, and how they are being used. In essence, a company cannot define its CDEs if it does not have proper data governance in place to begin with. However, many companies struggle with unclear or non-existent ownership of their data stores. Determining ownership before setting up more data stores or sources reinforces the commitment to quality and usefulness. It is also wise for organizations to set up data management software where data ownership is clearly defined and people can be held accountable. This can be as simple as a shared spreadsheet that records who owns each set of data items, or it can be managed by a sophisticated data management platform.
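Even the shared-spreadsheet option can be checked automatically. This Python sketch assumes the sheet is exported as CSV with hypothetical `data_element` and `owner` columns, and flags critical data elements that have no owner on record.

```python
import csv
import io

def load_owners(csv_text):
    """Parse an exported ownership sheet into {data_element: owner}.

    Assumes hypothetical columns `data_element` and `owner`; blank owners
    are stored as None so they can be flagged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["data_element"].strip(): (row["owner"].strip() or None) for row in reader}

def unowned(owners, cdes):
    """List critical data elements with no accountable owner on record."""
    return sorted(e for e in cdes if owners.get(e) is None)

sheet = "data_element,owner\nnet_revenue,Finance\ncustomer_email, \n"
owners = load_owners(sheet)
print(unowned(owners, {"net_revenue", "customer_email", "sku"}))
```

Elements missing from the sheet entirely (`sku` here) are treated the same as elements with a blank owner, since neither has anyone accountable.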

Just as organizations must design their business processes to improve accountability, they must also model their data in terms of data structure, data lineage, and how data is transformed. Data architecture attempts to model the structure of an organization’s logical and physical data assets and data management resources. Gaining this kind of visibility is at the heart of the data quality problem: without seeing the lifecycle of the data (when it is created, how it is used or transformed, and how it is output), it is impossible to guarantee real data quality.
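Lineage is naturally modeled as a directed graph from each data asset to the assets it is derived from. The minimal sketch below assumes a hypothetical lineage map (the node names are invented for illustration) and walks it breadth-first to find everything a report ultimately depends on, which is the set of places where a quality issue could originate.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes it is derived from.
LINEAGE = {
    "investor_report.net_revenue": ["warehouse.revenue_fact"],
    "warehouse.revenue_fact": ["crm.orders", "erp.refunds"],
    "crm.orders": [],
    "erp.refunds": [],
}

def upstream(node, lineage):
    """Return every node that `node` depends on, directly or indirectly."""
    seen, queue = set(), deque(lineage.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:  # the `seen` set also guards against cycles
            seen.add(current)
            queue.extend(lineage.get(current, []))
    return seen

print(sorted(upstream("investor_report.net_revenue", LINEAGE)))
```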

5. Excess data

Even when data and analytics teams have created frameworks to categorize and prioritize CDEs, they are still left with thousands of data elements that need to be validated or processed. Each data element can require one or more business rules specific to the context in which it will be used, and these rules can only be set by the business users who work with those unique data sets. Therefore, data quality teams need to work closely with subject matter experts to define rules for each unique data element, which can be very labor-intensive even after prioritization. This often leads to burnout and overload within data quality teams, because they are responsible for manually writing a large set of rules for a wide variety of data elements. Organizations must set realistic expectations for the workload of data quality team members. They may consider expanding the data quality team and/or investing in tools that use ML to reduce the amount of manual work on data quality tasks.
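The sketch below shows why hand-written rules scale poorly: every critical data element needs its own named checks. The registry, element names, and rules are hypothetical Python examples; each entry stands in for a rule agreed with the subject matter experts who use that element.

```python
# Hypothetical rule registry: each critical data element maps to named checks.
ELEMENT_RULES = {
    "net_revenue": [
        ("not_null", lambda v: v is not None),
        ("non_negative", lambda v: v is not None and v >= 0),
    ],
    "country_code": [
        ("two_letters", lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha()),
    ],
}

def validate(record, rules=ELEMENT_RULES):
    """Return (element, rule_name) pairs for every failed check."""
    failures = []
    for element, checks in rules.items():
        value = record.get(element)  # a missing element is checked as None
        for name, check in checks:
            if not check(value):
                failures.append((element, name))
    return failures

print(validate({"net_revenue": -5, "country_code": "US"}))
```

Two elements already need three rules; multiplied across thousands of elements, the manual effort described above becomes clear.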

Data isn’t just the world’s new oil: it’s the world’s new water. Organizations can have the most complex infrastructure, but if the water (or data) passing through those pipelines is not potable, the pipelines are useless. People who need this water must have easy access to it, they must know that it is usable and not polluted, they must know when the supply is low, and, finally, the suppliers and custodians must know who is accessing it. Just as access to clean drinking water helps communities in a variety of ways, improved access to data, mature data quality frameworks, and a deeper data quality culture can protect data-driven programs and ideas, helping to stimulate innovation and efficiency within organizations around the world.

JP Romero is the Technical Director at Calypso

