Data Quality is an issue that affects every application. Factors such as the quantity of data, the number of entry points for data, and a lack of application-level restrictions all contribute to making this issue more serious.
There are 6 major aspects to data quality:
Completeness: No required information is missing.
Conformity: Standards are followed and data is formatted properly.
Consistency: The same standards are followed across tables, and even across systems.
Accuracy: Data entities reflect the real-world objects they are meant to represent.
Duplication: More than one record representing the same object can have far-reaching repercussions.
Integrity: Orphan records are a sign of missing relationships between entities.
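As an illustration only, several of these aspects can be checked programmatically. The record layout, field names, and matching rule below are invented for the example; real rules would come from the business.

```python
import re

# Hypothetical customer records; the fields are illustrative only.
records = [
    {"id": 1, "name": "Robert Smith", "zip": "10001", "phone": "212-555-0101"},
    {"id": 2, "name": "", "zip": "1001", "phone": "2125550102"},   # incomplete, non-conforming
    {"id": 3, "name": "Robert Smith", "zip": "10001", "phone": "212-555-0101"},  # duplicate of 1
]

def completeness_issues(rec):
    """Flag missing required information."""
    return [f for f in ("name", "zip", "phone") if not rec[f]]

def conformity_issues(rec):
    """Flag values that do not follow the expected formats."""
    issues = []
    if not re.fullmatch(r"\d{5}", rec["zip"]):
        issues.append("zip")
    if not re.fullmatch(r"\d{3}-\d{3}-\d{4}", rec["phone"]):
        issues.append("phone")
    return issues

def find_duplicates(recs):
    """Flag more than one record representing the same object (exact-match rule)."""
    seen, dupes = {}, []
    for rec in recs:
        key = (rec["name"], rec["zip"])
        if key in seen:
            dupes.append((seen[key], rec["id"]))
        else:
            seen[key] = rec["id"]
    return dupes
```

In practice each check would be driven by configurable business rules rather than hard-coded patterns, but the structure is the same: one test per quality aspect, run against every record.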
Correcting these data quality issues has spawned many massive projects, which rear their heads every few years. Correcting the symptoms is not the right solution; correcting the cause is better. Best of all is correcting data in real time at the data entry point, using business-rule-generated events.
Allow us to perform a Data Quality Assessment, and we will identify not only the major symptoms currently affecting you, but also the hidden ones waiting to cause untold headaches in the near future.
According to Wikipedia, data is of high quality "if they are fit for their intended uses in operations, decision making and planning or, if they reflect the Real World construct to which they refer".
If any of the 6 aspects above is lacking, the data becomes not only hard to operate on, but may actually drive poor decisions, negatively impacting planning. In other words, the inability to operate on data is bad enough; imagine using inaccurate data to make key business decisions.
Data Quality fits everywhere
Data Entry Points:
Concentrating on the source of the problem is the quickest way to solve it. Don't deal with an issue AFTER it has entered your system; dealing with it BEFORE it enters your system is being proactive.
Examples of these entry points are sales representatives using your application, or data partner source data (tables, files, etc.).
Many tools on the market assist the sales rep with profile recognition before letting them enter duplicate or inaccurate address information.
The application itself can utilize picklists rather than free-text fields, and apply upfront field validation before data is accepted.
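A minimal sketch of such upfront validation, with an invented picklist and invented field rules, might look like this:

```python
# Hypothetical entry-point validation: reject bad data BEFORE it enters the system.
STATE_PICKLIST = {"NY", "NJ", "CT"}   # a picklist instead of a free-text field

def validate_entry(form):
    """Return a list of errors; an empty list means the record may be accepted."""
    errors = []
    if form.get("state") not in STATE_PICKLIST:
        errors.append("state: choose a value from the picklist")
    if form.get("email", "").count("@") != 1:
        errors.append("email: invalid format")
    return errors
```

The point is that the record is blocked at entry, so no downstream cleanup is ever needed for these rules.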
Application:
Once data is within the application, it can be corrected in a reactive mode through batch correction processes, or via one of the integration processes mentioned below.
It should be noted that batch cleansing processes are a sure sign of the need to rework data entry point processes, or of the need for a Master Data Management solution.
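A reactive batch correction pass, sketched here with an invented phone-number rule, typically normalizes data already inside the application:

```python
import re

def batch_cleanse(rows):
    """Reactive batch correction: normalize phone numbers already in the application."""
    cleaned = []
    for row in rows:
        digits = re.sub(r"\D", "", row.get("phone", ""))
        if len(digits) == 10:  # only reformat when a complete number is present
            row = {**row, "phone": f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"}
        cleaned.append(row)
    return cleaned
```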
Data Migration:
When migrating data from legacy systems, the legacy data should be treated as a one-time data entry source. A data migration is the perfect time to avoid bringing forward years' worth of poor-quality, inaccurate, or stale data.
Application upgrade:
As with data migration, an upgrade is the perfect time to revisit business decisions regarding how much data to bring forward. Purge stale or inaccurate data wherever possible, as well as data that may have met previous standards but is of limited or no use to the current business.
Data Integration:
Integration points should also be treated as entry points into your application. Whether the integration is through batch data file loading or real-time EAI, the integration process should transform data to meet all requirements of the application.
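As a sketch only (the inbound field names and target shape are invented), an integration transform maps each inbound record into the form the application requires before loading:

```python
# Hypothetical integration step: transform an inbound partner record so it
# meets the application's requirements before it is loaded.
def transform_inbound(raw_records):
    out = []
    for rec in raw_records:
        out.append({
            "name": rec.get("full_name", "").strip().title(),  # normalize casing
            "country": rec.get("country", "US").upper(),       # default and standardize
        })
    return out
```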
Master Data Management:
For data that is crucial to your business and can be verified by a 'trusted source', a Master Data Management solution is required. See our MDM content for more details; regarding data quality specifically, a proper MDM solution will contain processes for the standardization and subsequent validation of data.
During standardization, the solution should recognize and correct issues in the following 3 major categories:
· Accuracy: Correct common errors, such as changing “Robret” to “Robert”
· Completeness: Fill in missing attributes, utilizing logic or lookups, such as adding Title to a person, or the zip code for an address.
· Conformity: Ensure that formats and abbreviations are consistent, as well as other standards, such as changing Dr. to Drive or Phcy to Pharmacy. Make sure phone numbers are in a consistent format, suite information is in the proper field, etc.
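A toy standardization pass covering the three categories above might look like this; the lookup tables are invented for the example, where a real solution would use reference data and configurable rules:

```python
SPELL_FIX = {"Robret": "Robert"}                      # accuracy: correct common errors
ZIP_LOOKUP = {("New York", "NY"): "10001"}            # completeness: fill missing zip via lookup
ABBREVIATIONS = {"Dr.": "Drive", "Phcy": "Pharmacy"}  # conformity: expand abbreviations

def standardize(rec):
    rec = dict(rec)  # work on a copy
    rec["first_name"] = SPELL_FIX.get(rec["first_name"], rec["first_name"])
    if not rec.get("zip"):
        rec["zip"] = ZIP_LOOKUP.get((rec["city"], rec["state"]), "")
    rec["street"] = " ".join(ABBREVIATIONS.get(w, w) for w in rec["street"].split())
    return rec
```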
During validation, the solution should perform actions in 4 major areas:
· Assign trust levels: Determine trust levels based on business rules. For example, the business must decide when to trust data entered by your sales representatives more than data from your call center.
· Match: Utilizing business rules, determine whether 2 entities are in fact the same.
· Merge: Utilizing the confidence levels set during the Match process, the solution should perform automated merging.
· Publish: The results of the above operations must be sent out to every subscribing system in the enterprise. This is crucial to the realization of the single view of the Master Data.
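The four steps above can be sketched end to end. Everything here is assumed for illustration: the trust levels, the match rule, and the winner-takes-all merge are stand-ins for whatever the business rules actually specify.

```python
TRUST = {"sales_rep": 2, "call_center": 1}   # business-rule trust levels (assumed)

def match(a, b):
    """Illustrative business rule: same name + zip means the same real-world entity."""
    return (a["name"], a["zip"]) == (b["name"], b["zip"])

def merge(a, b):
    """Survivorship: the record from the higher-trust source wins."""
    winner = a if TRUST[a["source"]] >= TRUST[b["source"]] else b
    return {k: v for k, v in winner.items() if k != "source"}

def publish(golden, subscribers):
    """Send the golden record to every subscribing system."""
    for notify in subscribers:
        notify(golden)
```

A production MDM solution merges field by field with per-attribute confidence, but the flow is the same: match, merge by trust, then publish the single surviving view.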