Jørgen | Guldmann Fumbles with Ma...
Data Profiling
When venturing into any data quality program, data profiling is an essential cornerstone. It provides a wealth of information about the data you have. A master data repository must provide functionality to identify data quality issues automatically, and in more than one way. A requirement common to any profiling plug-in is the ability to drill through to the very entries causing the anomalies, and the same data must also support KPI reports on them. The basic toolkit covers statistics, frequencies, selectivity, data patterns, ranges and outliers. By patterning the content of the attributes, it becomes possible to detect a variety of conditions. Say a postal code is mostly made up of two characters indicating the country, followed by a space and then four digits indicating the postal area; consider the amount of information that can be derived from patterning it.
Patterns like these are easy to query. Combined with how many times each pattern occurs, the number of space-only and NULL values, and the maximum, minimum and average length, they form the foundation of any data analysis.
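A minimal sketch of this kind of patterning, in Python. It assumes a common profiling convention that the original does not spell out: every letter becomes `A` and every digit becomes `9`, so a postal code such as `DK 2100` is patterned as `AA 9999`. The function names `to_pattern` and `profile_column` and the sample values are hypothetical. Besides pattern frequencies, NULL and blank counts and length statistics, the sketch keeps the row positions behind each pattern, which is one simple way to support drilling through to the entries that cause an anomaly.

```python
from collections import Counter, defaultdict
from typing import Optional


def to_pattern(value: str) -> str:
    """Assumed convention: letters -> 'A', digits -> '9', other characters kept."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def profile_column(values: list[Optional[str]]) -> dict:
    """Profile one attribute: pattern frequencies, NULL/blank counts, lengths."""
    patterns: Counter = Counter()
    rows_by_pattern: dict[str, list[int]] = defaultdict(list)  # for drill-through
    nulls = blanks = 0
    lengths: list[int] = []

    for row_id, value in enumerate(values):
        if value is None:
            nulls += 1
            continue
        if value.strip() == "":
            blanks += 1  # space-only values are counted, not patterned
            continue
        pattern = to_pattern(value)
        patterns[pattern] += 1
        rows_by_pattern[pattern].append(row_id)
        lengths.append(len(value))

    return {
        "patterns": patterns.most_common(),  # most frequent pattern first
        "rows_by_pattern": dict(rows_by_pattern),
        "nulls": nulls,
        "blanks": blanks,
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "avg_len": sum(lengths) / len(lengths) if lengths else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical sample values for a postal-code attribute.
    codes = ["DK 2100", "DK 8000", "DK8000", None, "  ", "DK 21OO"]
    report = profile_column(codes)
    for pattern, count in report["patterns"]:
        print(f"{pattern!r}: {count} rows -> {report['rows_by_pattern'][pattern]}")
    print("NULL:", report["nulls"], "blank:", report["blanks"])
    print("length min/max/avg:", report["min_len"], report["max_len"], report["avg_len"])
```

In the sample, the rare pattern `AA 99AA` leads straight to row 5, where the letter O has been typed in place of a zero, and `AA9999` exposes the entry with the missing space.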
Looking at an attribute's selectivity gives a strong indication of whether it is a candidate for a unique business key. Numeric range analysis provides knowledge about utilization. Data patterns show which entries do not comply with a given mask.
Datatype selection
This example is boiled down; naturally, a real-life scenario will have far more data types and attributes.
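The checks in the paragraph above can be sketched in the same way. This is a hedged illustration built on hypothetical helpers and assumed definitions: selectivity is taken as distinct non-NULL values divided by non-NULL values; an attribute counts as a key candidate only if it has no NULLs and a selectivity of 1.0; the mask is written as a regular expression; and datatype selection picks the narrowest of three illustrative types (integer, decimal, text). A real profiling tool will have its own definitions and a richer type list.

```python
import math
import re
from typing import Optional

Values = list[Optional[str]]


def selectivity(values: Values) -> float:
    """Assumed definition: distinct non-NULL values / non-NULL values."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present) if present else 0.0


def is_key_candidate(values: Values) -> bool:
    """Hypothetical rule: non-empty, no NULLs and every value distinct."""
    return bool(values) and None not in values and selectivity(values) == 1.0


def numeric_range(values: Values) -> Optional[tuple[float, float]]:
    """Min and max of the finite values that parse as numbers; None if none do."""
    numbers = []
    for v in values:
        if v is None:
            continue
        try:
            x = float(v)
        except ValueError:
            continue  # non-numeric entries are skipped
        if math.isfinite(x):  # ignore strings such as "nan" or "inf"
            numbers.append(x)
    return (min(numbers), max(numbers)) if numbers else None


def mask_violations(values: Values, mask: str) -> list[int]:
    """Row positions whose non-NULL value does not fully match the regex mask."""
    compiled = re.compile(mask)
    return [i for i, v in enumerate(values) if v is not None and not compiled.fullmatch(v)]


def infer_type(values: Values) -> str:
    """Narrowest illustrative type that fits every non-blank value."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "unknown"
    if all(re.fullmatch(r"[+-]?\d+", v) for v in present):
        return "integer"
    if all(re.fullmatch(r"[+-]?\d+(\.\d+)?", v) for v in present):
        return "decimal"
    return "text"


if __name__ == "__main__":
    # Hypothetical sample data.
    customer_ids = ["1001", "1002", "1003", "1002"]
    print("selectivity:", selectivity(customer_ids))  # 3 distinct of 4 present
    print("key candidate:", is_key_candidate(customer_ids))
    print("range:", numeric_range(["5", "12", "x", None, "7.5"]))
    postal = ["DK 2100", "DK8000", None, "DK 21OO"]
    print("mask violations:", mask_violations(postal, r"[A-Z]{2} \d{4}"))
    print("type:", infer_type(["10", "-3", None, " 42 "]))
```

The duplicated `1002` drops selectivity below 1.0, which rules the sample attribute out as a unique business key under this rule, while the mask check flags the two malformed postal codes by row position so they can be drilled into.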