One of the most common and challenging data quality problems that organisations face is the identification of duplicate records, i.e. redundant representations of the same information within and across systems throughout the enterprise. Research indicates that up to 5% of the average database is made up of duplicate records. Finding these duplicates can help you improve basic customer interactions and communications, and also help you manage risk.
Preventing duplicate records from entering your database in the first place is key to maintaining data quality. Deduplication can be done as a preventative measure at the point of data capture, to stop duplicates entering your database, and also retrospectively, to find and remove duplicates that already exist in it.
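As a rough illustration of deduplication at the point of data capture, the sketch below rejects a new record when an equivalent one is already stored. The names `match_key` and `CustomerStore`, and the rule that two records with the same normalised email address are the same customer, are assumptions made for this example only; a real system would use whatever matching rules the business agrees on.

```python
def match_key(record: dict) -> str:
    """Hypothetical matching rule: the email address, trimmed and lower-cased."""
    return record["email"].strip().lower()


class CustomerStore:
    """Toy in-memory store that refuses to accept duplicate records."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict] = {}

    def add(self, record: dict) -> bool:
        """Store the record and return True, or return False if it is a duplicate."""
        key = match_key(record)
        if key in self._by_key:
            return False  # an equivalent record was captured earlier
        self._by_key[key] = record
        return True

    def __len__(self) -> int:
        return len(self._by_key)


store = CustomerStore()
first_accepted = store.add({"name": "Ann Lee", "email": "Ann.Lee@example.com"})
second_accepted = store.add({"name": "A. Lee", "email": " ann.lee@example.com "})
```

Here the second record differs only in capitalisation and stray spaces, so it is treated as a duplicate and is not stored.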
Once you recognise the need to dedupe your records, you will also see the importance of identifying survivor records, the records that are kept when a set of duplicates is resolved. Creating a set of business rules to define what counts as a duplicate is a fundamental part of data management, as is appreciating the problems that duplicate records in your database can cause. Setting out the strategy and objectives of any deduplication programme will improve buy-in across your organisation and help build a culture that accepts that duplicates can, and probably do, exist in your database. It should also become evident across your organisation that tackling this issue gives you a better, more accurate view of the people you are targeting in your everyday communications.
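The ideas of a business rule that defines a duplicate and of a survivor record can be sketched in a few lines of code. Everything in this example is an assumption chosen for illustration: the field names, the rule that records sharing a normalised name and postcode are duplicates, and the rule that the most recently updated record survives. Your own rules may well differ.

```python
from collections import defaultdict
from datetime import date


def duplicate_key(record: dict) -> tuple[str, str]:
    """Hypothetical business rule: same normalised name and postcode."""
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)


def choose_survivor(group: list[dict]) -> dict:
    """Hypothetical survivorship rule: keep the most recently updated record."""
    return max(group, key=lambda r: r["updated"])


def dedupe(records: list[dict]) -> list[dict]:
    """Group records by the duplicate rule and keep one survivor per group."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for record in records:
        groups[duplicate_key(record)].append(record)
    return [choose_survivor(group) for group in groups.values()]


records = [
    {"name": "Ann  Lee", "postcode": "ab1 2cd", "updated": date(2020, 1, 1)},
    {"name": "ann lee", "postcode": "AB12CD", "updated": date(2021, 1, 1)},
    {"name": "Bob Ray", "postcode": "XY9 8ZZ", "updated": date(2019, 5, 5)},
]
survivors = dedupe(records)
```

The two "Ann Lee" records fall into the same group and only the later one survives, while the "Bob Ray" record is kept unchanged. Exact matching on normalised fields is the simplest possible rule; it will miss duplicates with misspellings or changed addresses, which is why the rules need to be agreed and reviewed as part of the wider programme.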
This article is free for republishing
Source: http://danielcollins.articlealley.com/the-importance-of-data-deduplication-2133845.html