Guldmann Fumbles with Ma...
It’s not often this old rat is moved emotionally by a singer

It’s not often this old rat is moved emotionally by a singer, but this is the most beautiful song I've ever heard. I've been back to this site at least 100 times to listen and watch Josh sing this song. Even though I don't understand much Italian, it doesn't make any difference. He makes the hair stand up on my neck every time I listen to it.
http://www.youtube.com/watch?v=nKDkXOOHG34
Identify master data duplicates in an MDM repository context

A master data repository should provide a validation framework to check the consistency of your master data.
When identifying master data duplicates in an MDM repository, it is important that the repository supports some basic functionality. Beyond its ability to merge specific master data records into a central repository, it must provide a means to detect duplicates through a fully parametric search. The result is an overview of the degree of similarity between different records.
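As a minimal sketch of what such a parametric similarity overview could look like: the field names, the weights, and the 0.8 threshold below are illustrative assumptions, not taken from any particular MDM product, and the string comparison simply uses Python's standard `difflib`. A real repository would likely use more specialised matching rules.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical matching strategy: field name -> weight (weights sum to 1.0).
STRATEGY = {"name": 0.5, "city": 0.2, "vat": 0.3}


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value or "").lower().split())


def field_similarity(a, b):
    """Similarity in [0, 1] between two field values."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0  # a missing value gives no evidence of a match
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(rec_a, rec_b, strategy=STRATEGY):
    """Weighted sum of the per-field similarities defined by the strategy."""
    return sum(
        weight * field_similarity(rec_a.get(field), rec_b.get(field))
        for field, weight in strategy.items()
    )


def candidate_duplicates(records, threshold=0.8):
    """Return (score, id_a, id_b) for pairs at or above the threshold, best first."""
    pairs = []
    for a, b in combinations(records, 2):
        score = record_similarity(a, b)
        if score >= threshold:
            pairs.append((round(score, 3), a["id"], b["id"]))
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    # Made-up supplier records, purely for illustration.
    suppliers = [
        {"id": 1, "name": "Acme Trading A/S", "city": "Aarhus", "vat": "DK12345678"},
        {"id": 2, "name": "ACME Trading AS", "city": "Aarhus", "vat": "DK12345678"},
        {"id": 3, "name": "Nordic Fish ApS", "city": "Skagen", "vat": "DK87654321"},
    ]
    for score, id_a, id_b in candidate_duplicates(suppliers):
        print(f"records {id_a} and {id_b} look similar (score {score})")
```

The function only produces a ranked list of candidates; nothing is merged. Comparing every pair is quadratic in the number of records, so a larger repository would need some form of pre-grouping before scoring.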
The identifying plug-in must operate fully automatically, yet the decision to merge can only be taken by the steward. The automated matching is based on matching rules, and these rules are bundled into matching strategies. According to a matching strategy, a probability score is calculated, ranking the likelihood that specific records are duplicates. The steward can then compare potential duplicates and merge any records in the entities that are in fact redundant.

Even when you have a strong business key to match on, it is unwise to automate the collapse process. Say we have a VAT number on the suppliers: a strong, unique key, seemingly the obvious key to react on and to automate on. But it’s not safe. Consider what the people typing data into the required VAT number field do when they don’t have the supplier's VAT number. All sorts of silly solutions are applied in an ungoverned process, from using your own company’s VAT number to just entering random text.

Data Scrubbing

Data scrubbing is a technical term for correcting data towards better data quality. I believe the term originates from the data warehouse world. It means the same as data cleansing in MDM. The process of raising data quality can involve anything from detecting to removing and/or correcting the dirty data in a database. Dirty data can be missing, incorrect, out-of-date, redundant, incomplete, or formatted incorrectly. The reasons for low data quality range from human error in entering the data and the merging of dispersed systems to a lack of company-wide standards and governance, or old systems containing deprecated data.

The data scrubbing program belongs as a plug-in in the MDM repository, where it is used to clean the repository. But oddly enough, it is mostly implemented in the ETL jobs transporting data between dispersed systems or even into the Enterprise Data Warehouse.
So instead of implementing this facility in one place, it is implemented redundantly, and with implementations of varying quality, causing even more inconsistencies.

Data Profiling and Quality Report KPIs

Do you know how good your data really is? Data profiling helps stewards learn what's really in their domain entities. It is important that data within your repository undergoes automated "bottom-up" data analysis and that the result is presented through consistent KPIs. It is important to emphasize the word automated. Far too many times have I seen profiling tools rely on complex SQL queries that must be run by experienced technical staff. Worse, the resulting analyses are not comparable, and offer no means to see whether the data quality is getting better or worse, or even what the cause of the bad quality is. Do break down the KPIs so every data-producing unit can benchmark itself. It is important that the KPIs allow drilling down to the very rows that have the data problem.

Should repositories hold history

Should repositories hold history? My initial opinion would be: why not? Seeing the changes over time might be useful in supporting applications, especially in a scenario with distributed systems of entry and no clean-cut unique identification. Utilizing Type 2 history in the mapping between the business keys and the repository will yield a lot of useful information for synchronization.
In an analytical MDM context, dimension creation would also benefit from such a secure source of history.
I don’t really buy the argument that this means an explosion in the amount of data. After all, only some 30% of customers have changes to their contact info in a year. This should indeed be manageable. Should the amount of data be an issue, then this is one of the situations where I see Dan E. Linstedt's Data Vault Data Modeling thoughts being utilized.
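To make the Type 2 idea concrete, here is a minimal sketch of a history-keeping mapping between business keys and repository records. The class names, the fields, and the half-open validity interval are assumptions chosen for illustration. The sketch also assumes changes arrive in chronological order; a real repository would persist this in a table rather than in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class KeyMapping:
    """One Type 2 row: a business key pointing at a repository record for a period."""
    source_system: str
    business_key: str
    master_id: int
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current row


class KeyMappingHistory:
    """Keeps every version of the mapping instead of overwriting it."""

    def __init__(self) -> None:
        self.rows: List[KeyMapping] = []

    def current(self, source_system: str, business_key: str) -> Optional[KeyMapping]:
        for row in self.rows:
            if (row.source_system, row.business_key) == (source_system, business_key) \
                    and row.valid_to is None:
                return row
        return None

    def assign(self, source_system: str, business_key: str,
               master_id: int, effective: date) -> KeyMapping:
        """Point a business key at a repository record from `effective` onwards."""
        cur = self.current(source_system, business_key)
        if cur is not None:
            if cur.master_id == master_id:
                return cur  # nothing changed, so no new version is written
            cur.valid_to = effective  # close the old version instead of deleting it
        new_row = KeyMapping(source_system, business_key, master_id, effective)
        self.rows.append(new_row)
        return new_row

    def as_of(self, source_system: str, business_key: str, when: date) -> Optional[int]:
        """Which repository record did this key map to on a given date?"""
        for row in self.rows:
            if (row.source_system, row.business_key) != (source_system, business_key):
                continue
            # Validity is the half-open interval [valid_from, valid_to).
            if row.valid_from <= when and (row.valid_to is None or when < row.valid_to):
                return row.master_id
        return None
```

For example, if the hypothetical CRM key "C-1" first maps to repository record 100 and is later re-pointed to record 200 after a steward merges two records, `as_of` still answers which record the key referred to before the merge. Only changed keys produce new rows, which is why the growth stays modest.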