Guldmann Fumbles with Ma... (Jørgen's blog)



Data Profiling

When venturing into any data quality program, data profiling is an essential cornerstone. It provides a wealth of information about the data that you have. A master data repository must be able to identify data quality issues automatically, and in more than one way. Common to every profiling plug-in needed is the requirement to drill through to the very entries causing the anomalies, and the same data must support KPI reports on those anomalies.

Basic statistics, frequencies, selectivity, data patterns, ranges and outliers.

Patterning the content of an attribute makes it possible to detect a variety of conditions. Say a postal code is mostly entered as two characters indicating the country, followed by a space and then four numeric characters indicating the postal area; consider the amount of information that can be derived from patterning these entries.

e.g.

Attribute Entry    Pattern
DK 8000            XX[_]9999
DK 9000            XX[_]9999
DK8000             XX9999
8000DENMARK        9999XXXXXXX
9200               9999
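The mapping in the table can be sketched in a few lines of Python. This is only an illustration, not the repository's actual implementation: the function name pattern_of is hypothetical, and it assumes that every letter (upper or lower case) becomes X, every digit becomes 9, a space becomes [_], and any other character is kept unchanged.

```python
def pattern_of(value: str) -> str:
    """Return the pattern of an entry (hypothetical helper).

    Assumed mapping, taken from the table above:
    letter -> X, digit -> 9, space -> [_], anything else unchanged.
    """
    parts = []
    for ch in value:
        if ch.isalpha():
            parts.append("X")
        elif ch.isdigit():
            parts.append("9")
        elif ch == " ":
            parts.append("[_]")
        else:
            parts.append(ch)
    return "".join(parts)


entries = ["DK 8000", "DK 9000", "DK8000", "8000DENMARK", "9200"]
patterns = [pattern_of(e) for e in entries]
```

Run against the entries in the table, this reproduces the Pattern column row by row.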

These patterns are easy to query. Combined with the knowledge of how many times each pattern occurs, the count of spaces and NULL values, and the maximum, minimum and average length, they form the foundation of any data analysis.
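As a rough sketch of such a per-attribute profile, the Python below collects the pattern frequencies together with the basic counts and lengths. The names profile_column and pattern_of are hypothetical, the pattern mapping is the one assumed from the table (letter to X, digit to 9, space to [_]), and "count of spaces" is interpreted here as the number of entries that are empty or contain only spaces, which is an assumption.

```python
from collections import Counter


def pattern_of(value):
    # Assumed mapping: letter -> X, digit -> 9, space -> [_], other characters kept.
    return "".join(
        "X" if ch.isalpha() else "9" if ch.isdigit() else "[_]" if ch == " " else ch
        for ch in value
    )


def profile_column(values):
    """Profile one attribute; None stands in for NULL (hypothetical helper)."""
    present = [v for v in values if v is not None]
    lengths = [len(v) for v in present]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        # Assumption: "blank" means empty or made up of spaces only.
        "blanks": sum(1 for v in present if v.strip() == ""),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "avg_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "patterns": Counter(pattern_of(v) for v in present),
    }


postal_codes = ["DK 8000", "DK 9000", "DK8000", "8000DENMARK", "9200", None]
profile = profile_column(postal_codes)
```

Sorting profile["patterns"] by frequency (for instance with most_common()) puts the dominant mask first and the rare, suspicious ones last.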

Looking at an attribute's selectivity gives a strong indication of whether the attribute is a candidate for a unique business key.
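A minimal sketch of that check, assuming the measure is defined as the number of distinct values divided by the number of non-NULL values (the post does not define it, and the function name and sample data are hypothetical). A result of 1.0 means every value is unique, which makes the attribute a key candidate.

```python
def selectivity(values):
    """Distinct non-NULL values divided by all non-NULL values (assumed definition)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return len(set(present)) / len(present)


# Hypothetical sample data for illustration only.
customer_numbers = [1001, 1002, 1003, 1004]
country_codes = ["DK", "DK", "DK", "SE"]

customer_number_selectivity = selectivity(customer_numbers)
country_code_selectivity = selectivity(country_codes)
```

Here the customer numbers score 1.0 and the country codes 0.5, so only the former would be considered as a unique business key.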

Numeric range analysis provides knowledge about utilization.

Data patterns show which entries do not comply with a given mask.
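The mask check, including the drill-through to the offending entries, might look like the sketch below. The names non_conforming and pattern_of are hypothetical, the pattern mapping is the one assumed from the postal code table, and treating NULL (None) as non-compliant is a choice made for this example.

```python
def pattern_of(value):
    # Assumed mapping: letter -> X, digit -> 9, space -> [_], other characters kept.
    return "".join(
        "X" if ch.isalpha() else "9" if ch.isdigit() else "[_]" if ch == " " else ch
        for ch in value
    )


def non_conforming(values, mask):
    """Return (row index, entry) for every entry whose pattern differs from the mask."""
    return [
        (index, value)
        for index, value in enumerate(values)
        if value is None or pattern_of(value) != mask
    ]


postal_codes = ["DK 8000", "DK 9000", "DK8000", "8000DENMARK", "9200"]
anomalies = non_conforming(postal_codes, "XX[_]9999")
compliance_pct = 100 * (len(postal_codes) - len(anomalies)) / len(postal_codes)
```

Keeping the row index next to each entry is what allows a report to link back to the exact records, and the compliance percentage is the kind of figure a KPI report could track over time.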

Datatype selection

This example is boiled down; naturally, there will be far more data types and attributes in a real-life scenario.

DataType    Attribute Name    Result
INT         CustomerNumber    100%
DATETIME    CustomerNumber    0%
INT         Name              0%
DATETIME    Name              0%
INT         PhoneNumber       80%
DATETIME    PhoneNumber       0%
INT         Birthdate         0%
DATETIME    Birthdate         100%
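One way to arrive at a table like this is to try to parse every entry of an attribute as each candidate data type and report the share that succeeds. The sketch below is an assumption about how such a test could work, not the tool's actual method: the function names and sample values are hypothetical, INT is tested with Python's int(), and DATETIME is assumed to mean an ISO date in the form YYYY-MM-DD.

```python
from datetime import datetime


def parses_as_int(text):
    # int() also tolerates a leading sign and surrounding whitespace.
    try:
        int(text)
        return True
    except ValueError:
        return False


def parses_as_datetime(text, fmt="%Y-%m-%d"):
    # Assumption: dates are stored as ISO strings; adjust fmt for other layouts.
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def type_fit(values, parser):
    """Percentage of entries that the given parser accepts."""
    if not values:
        return 0.0
    return 100 * sum(1 for v in values if parser(v)) / len(values)


# Hypothetical sample data for illustration only.
phone_numbers = ["86121314", "86151617", "12 34 56 78", "70101112", "33445566"]
birthdates = ["1980-05-17", "1975-11-02", "1990-01-30"]

phone_as_int = type_fit(phone_numbers, parses_as_int)
phone_as_datetime = type_fit(phone_numbers, parses_as_datetime)
birthdate_as_int = type_fit(birthdates, parses_as_int)
birthdate_as_datetime = type_fit(birthdates, parses_as_datetime)
```

With this sample, the phone numbers fit INT at 80% because one entry contains spaces, which mirrors the PhoneNumber rows in the table.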
