A grip on data quality

Everyone can imagine what is meant by the term data quality, but it is difficult to say exactly when that quality is high or low. Thomas Redman provides a useful definition in his book Data Driven (http://www.dataversity.net/contributors/thomas-redman/): data quality is high if the data in question is suitable for its intended use in business activities, decision-making and planning. And John Morris of Data Quality Pro aims to explain what “suitable for its intended use” means: perfection in data quality is infeasible, but ensure a sufficient level at the right place at the right time (https://www.dataqualitypro.com/data-quality-for-data-migration-techniques/).

Data eXcellence looks at data quality through data migration glasses. What are the requirements for data in a migration? How do I ensure that my data meets those requirements? And how do I gain insight into this? In short, when is my data “suitable for its intended use”? This article will help you answer these questions.

Data quality is one of the determining factors for the success of data migration. Poor data quality can turn a migration into a drawn-out problem. Mandatory data is missing or does not meet the requirements: an IBAN or citizen service number does not pass the 11-digit check. Important data is not suitable for automatic processing: it is not structured or available as hard copy only. Or a database is inconsistent, which means that data has to be linked using all sorts of unconventional trickery. All of this takes a lot of effort and increases the completion time.

High data quality, on the other hand, yields gains in many areas. If all mandatory fields have been filled correctly, the migration team does not have to spend time on this. If all data adheres to strict rules, the migration will be easier, as the migration software does not have to take into account numerous exceptions.

That is why it is important that the data quality in a migration project becomes clear at the earliest possible stage. There are many tools that can analyse the content of databases. The most basic versions will tell you how varied the content is and what the referential integrity is like. This provides an indication of the overall quality, but does not say much yet about the usability of the data in a target system. In more extensive tools you can indicate which specific rules the data must meet. And some tools even provide for a controlled method of repairing data contamination. Gartner has compiled a useful overview at https://www.gartner.com/reviews/market/data-quality-tools.

Data eXcellence has developed its own data quality technology, completely focused on data quality from a migration perspective. This tooling has been integrated in DXF: the Data eXcellence data migration factory.

The target system is leading in a migration

During data migration, the target system dictates the requirements which data quality has to meet. This goes beyond the filling of mandatory fields. The effectiveness of a system can be strongly affected by the presence or absence of certain data. If, for example, the history of entries in mortgage administration records is incomplete, a target system may mistakenly charge arrears. Therefore, the migration team must first determine under which conditions the target system will work correctly. The team then converts these conditions into data quality requirements.

Data that is not perfect, but which does not cause the target system to work incorrectly, remains unchanged. Remember the definition of Redman: suitable for its intended use.
Only consider data that is actually moved to the target system. For this, it is important to determine the exact scope of the data. This refers to the width (what type of data) and depth (how much data history).
Try to limit the history to data that is actually necessary for the target system to operate. Historical data often originates from previous migrations or dates back to a time when the source system set different demands. More history means more variety and thus more data quality rules, more exceptions and increased effort to eliminate data quality issues.

Keep all data quality requirements not imposed by the target system out of the scope of the data migration. The focus should be entirely on the correct functioning of the target system. Allow for plausibility checks at the most: an unlikely age of a mortgagee (under the age of 18 or over 110), or an unlikely mortgage amount. The target system will not be affected in terms of its functioning, but it is clear something is not right. However, the general requirement for data quality must remain for the target system to work correctly. No more and no less.

There are several ways to resolve data quality problems. The most common are:

Defining conversion rules. It is often possible to derive a missing piece of data from other source data. In that case, a conversion rule can perform this derivation automatically. In other cases, using a default value suffices. For example, a missing country code is filled with “Netherlands” by default.
Cleaning of data in the source system. By adjusting information in the source data, you literally tackle a problem at the source. This is the most time-consuming method of data cleaning, but in situations where individual information is missing, it is often the only option. This approach towards resolving data problems will result in the longest completion time.
Adjust the target system. Usually, this is an emergency solution, but if the above two solutions are not feasible or too time-consuming, the choice can be to adjust the target system. This is often a short-term measure to save time, so that users can still enter missing data after migration.

The experience of Data eXcellence is that 80% of data quality issues can be quickly located and resolved, partly with the proper support of tooling. Most effort is accounted for by the remaining 20%. It often involves manual investigations and sorting of specific exceptions.

Introduction

Data quality: a determining factor

The target system is leading in a migration

The Data eXcellence approach

Some examples of data quality rules:

CONCLUSION

Interested in our approach or our technology?