More servicesWindows Live
HomeHotmailSpacesOneCare
 
MSN
Sign in
 
 
Spaces home  Guldmann Fumbles with Ma...PhotosProfileFriendsMore Tools Explore the Spaces community

Guldmann Fumbles with Master Data

MDM GULDMANN PLATON TSQL MASTER DATA MANAGEMENT
Updated 4/8/2008
Updated 8/24/2006
Updated 8/24/2006
by 

The WOW factor has reached Microsoft MDM

News just in… Microsoft has purchased Zoomix, a data quality software vendor. This is truly interesting while Zoomix will contribute with the missing data improvement layer to the Microsoft Sql server data Platform. By this expanding Microsoft already ongoing MDM effort (stratature/bulldog/+EDM)

I found it intriguing reading Kirks Haseldens blog on the Data Profiling abilities efforts in SQL Server 2008. It seems that SSIS and SQL server is going to bridge Microsoft’s effort to integrate Zoomix and +EDM, I find it fair to say this is really interesting news, seen in a MDM perspective.

Guldmann

Data Profiling

When venturing into any data quality program, is data profiling an essential cornerstone to turn. It provides a wealth of information about the data that you have. A master data repository must hold functionality to automatically identify data quality issues in a more than one way. Common for needed profiling plug-in is the requirement to drill trough to the very entries causing the anomalies, and the very same data must facilitate KPI reports over the same.

 Basic statistics, frequencies, selectability, data patterns, ranges and outliers.

Through patterning the content of the attributes the possibility to detect a vararity of condition emerges. Say Postal code is mostly created with two chars indicating the country, followed by a space and then 4 numeric chars indicating the postal area, consider the mount of information deriving from patterning of this information.

e.g.

Attribute Entry

Pattern

DK 8000

XX[_]9999

DK 9000

XX[_]9999

DK8000

XX9999

8000DENMARK

9999XXXXXXX

9200

9999

This patterning makes it easy to query on, combined with the knowledge of how many times the pattern emerges, the count of spaces and NULL values, the max length, min length, average length makes the foundation of any data analysis.

Looking into the attributes selectability gives strong indication if the attribute is a candidate for a unique business key.

Numeric range analysis provides knowledge about utilization.

Data patterns gives info regarding which entries doesn’t applied with a given mask

Datatype selection

This example is boiled down, naturally there will be far more data types and attributes in a real life scenario.

DataType

Attribute Name

Result

INT

CustomerNumber

100%

DATETIME

CustomerNumber

0%

INT

Name

0%

DATETIME

Name

0%

INT

PhoneNumber

80%

DATETIME

PhoneNumber

0%

INT

Birthdate

0%

DATETIME

Birthdate

100%

It’s not often this old rat is moved emotionally by a singer

It’s not often this old rat is moved emotionally by a singer, but this is the most beautiful song I've ever heard. I've been back to this site at least 100 times to listen and watch Josh sing this song. Even though I don't understand much Italian, it doesn't make any difference. He makes the hair stand up on my neck every time I listen to it.
http://www.youtube.com/watch?v=nKDkXOOHG34

Identify master data duplicates in a MDM repository context

Master Data repository should provide a validation framework to check the consistency of your master data.

  

When looking into identify master data duplicates in a MDM repository, it is important that it support some basic functionality. Beyond its ability to merge specific master data records into a central repository a repository must be able to provide means to detect doublets through a fully parametric search. This resulting into a overview over the degree of similarity between different records.

 

The Identifying plug-in must operate fully automatically, yet the merging conclusion can only be taken by the steward. The automated matching is based on matching rules, and these rules are bundled into matching strategies. According to a matching strategy, a probability score is calculated ranking the probability that specific records are duplicates.

The steward can then compare potential duplicates and merge any records in the entities that are in fact redundant.

 

When having a strong Business key to match upon, it is unwise to automate the collapse process. Say we have a VAT number on the suppliers, a strong unique key, seemingly obvious key to react on, and to automate on. But it’s not safe; consider that the people typing data into the required field VAT number could do when they don’t have the suppliers vat number? All sorts of silly solutions are applied in an ungoverned process; all from using your own company’s VAT number to just entering a random text.

Data Scrubbing

Data Scrubbing is a technical term used when correcting data towards a better data quality. I believe the term originates from the data warehouse world. It is actually the same as the meaning of data cleansing in MDM. The process of heightening data quality can be everything from detecting, removing, and/or correcting the dirty data in a database. Dirty data can be missing, incorrect, out-of-date, redundant, incomplete, or formatted incorrectly.

The reason for low quality of data can be found in everything from the result of human error in entering the data, the merging of dispersed systems. Lack of company wide standards and lack of governance takes, or due to old systems containing deprecated data.

The data scrubbing program belongs as a plug-in in the MDM repository used to clean the repository. But oddly enough is it mostly implemented in the ETL jobs transporting data to between dispersed systems or even into the Enterprise Data Warehouse. So instead of implementing this facility one place it is implemented redundantly, and with implementations of diverting quality, causing even more inconsistencies.

Data Profiling and Quality Report KPI’s

Do you know how good your data really is? Data profiling helps stewards learn what's really in their domain entities. It is important that data within your repository undergo automated "bottom-up" data analysis and the result is presented through consistent KPI’s. It is important to emphasize the word automated. Far to many time have I seen profiling tools profiling tools rely on complex SQL queries that must be run by experienced technical staff. Worse, the resulting analyses is not comparable, and offers no means to get an overview if the data quality is getting better or worse, or even what is the course of the bad quality. Do break down the KPI’s so every data producing unit can bench mark itself. It is important that the KPI’s allows drilling down to the very lines of the data problem.

Should repositories hold history

Should repositories hold history, my initial opinion would be why not. Seeing the changes over time might be useful in supporting applications especially with a not clean cut uniquely identified scenario with a distributed system of entry. Utilizing Type 2 history in the mapping between the business keys and the repository will yield a lot of useful information, synchronizing wise.

 

In a analytical MDM context, Dimension creating would also have benefit of such a secure source of history.

 

I don’t really buy the argument of this meaning an explosion in the amount of data. After all, only some 30% of customers have changes to their contact info a year. This should indeed be manageable. Should the amount of data be an issue, then this is one of the situations where I see Dan E. Linstedt's Data Vault Data Modeling thoughts being utilized.

The being witty remark trick

When creating tables I often have a drop section in the top of the script which is remarked in and out when needed.
 
/*
DROP TABLE [dbo].[CrappyTable]
*/
CREATE TABLE [dbo].[CrappyTable] (ID_CrappyTable INT IDENTITY (1,1) PRIMARY KEY)

By using my remarks slightly different it enables me to quickly comment the section in.
/*
DROP TABLE [dbo].[CrappyTable]
--*/

So when enabling the section I simply …
--/*
DROP TABLE [dbo].[CrappyTable]
--*/
 
 

Hot Topic getting hotter

Just by goggling Master Data Management one can see a growing list of vendors and advisors. I guess the future seems clouded on the area. I’ve personally always believed in writing MDM applications bottom up instead of buying a full featured suite, while the price of these often wasn’t balanced with their functionality. However products like Stratature/Kaliedo shift this picture somewhat. They are products making it easier to convince the business that they can afford MDM

http://uk.platon.net/master_data_management.asp

http://www.datafoundations.com

http://www.DataMirror.com

http://www.Gemstone.com

http://www.hyperion.com/products/bi_platform/core_data_integration/mdm_index.cfm

http://www.i2.com/solutionareas/sixone/scos/mdm.cfm

http://www.kalido.com/products/mdm

http://www.metamatrix.com/pages/solutions/master_data_management.htm

http://www.Netics.com

http://www.Nimaya.com

http://www.ObjectRiver.net

http://www.Purisma.com

http://www.sas.com

http://www.SilverCreekSystems.com

http://www.Stratature.com

https://www.sdn.sap.com/irj/sdn/developerareas/mdm

Divide and Conquer

Do Master data management one piece at a time

Deciding whether to enforce Master data for disparate entities across the enterprise is the $1mill question. Due to immatureness of the organization or shallow business drivers a fractmented approach might be preferable rather than the full deal. You might recall the article on my sight telling about what is the necessary cornerstone’s in a successful MDM initiative, these steps is still necessary, yet they can be addressed in various degrees. Taking basics in a BI project often seems as good grounds to initiate a MDM initiative, while the obvious disadvantages of not having a MDM driven approach to your data is obvious. Rather than repairing with ETL, you might consider a advocating for a MDM initiative. I am sure you will hear a lot of objections, however I been right in my claim “it doesn’t take longer to-do it right the first time” quite a few times.

The +EDM guys sees MDM implemented in such small steps, they tend not to believe in BIG BANG implementations. Moving in.. From a small department with a small given task and then expand from there. Guess its like going up a mountain, there is a few approaches.. There is the steep fast way up, with an increased chance of failure, or the path twisting up.

I guess I fully agree with this, however, chances are your will not be able to find your sponsors high enough in the organization. And the obvious disadvantage of not having your foundation at a level with the “swings and roundabouts” overview, is lack of investment will, both economically politically and resource wise.

+EDM Communicates really well.

Designing Dimensions and hierarchies is an important task when we do BI work for our clients. Later on this work ends up in a mockup cube. It is then the users sees how the hierarchies unfold and collapses, and it’s like a more detailed understanding of what they are getting sinks in, and quite often… its rework time.

I think EDM has the same capabilities as the cube mockup. Being as visual as it is, makes it easy to understand super domain master data.

 

MDM Methodology

Data Modeling vs. Master Data Models

 

 mdm3

What communicates the better

Integrating Stratature/Bulldog seen from an enterprise perspective

AS IS

Importing data in MDM is done through an interface table. It is possible to call a stored procedure that kick off the import of data. This crude mechanism doesn’t support any features in a SOA architecture. This calls for a SSIS integration rather than the desired real-time integration one would prefer with a BizTalk Enterprise hub. In Redmond Microsoft didn’t reveal any plans of changing this for the first release.

mdm3

The import process validates the data which is due for import. AS-IS Complex rules in +EDM is parsed row by row and is unacceptable slow. This is not at all agile enough to accommodate enterprise needs, where data should drip in to the stewards in near real-time. Microsoft is looking into improving this area. Personally I am looking forward to test this improvement a lot while I see it as critical when utilizing EDM in connections with external systems, especially when SOE is distributed.

 
mdm3
 
Synchronizing data out from EDM is done through views. There is no direct way to attach notification mechanisms when to take out data to external systems. Another thing which is a major problem is that the version containing the data needs to be valid as a whole before data can be synchronized out.  A lot of data is missing before one could claim this to be a fully featured Master Data repository with completeness degree controlled synchronization, depending on a controlled workflow
 
mdm3

The Secret to Driving ROI in MDM

 The Secret to Driving ROI in MDM

"Get the Business Involved"

Certain cornerstones are essential when driving a MDM project to success. Mostly the Sequence of these steps is important, but it’s hard to give a general starting point, while it often depends on the tasks at hand. One can compare it with going to the doctor, his approach isn’t quite the same when going for a full medical checkup or when you arrive to the emergency room with an broken leg. So where to start and how much weight you put into each of the topics listed depends on the projects needs, however a full circle is needed before a success can be in sight, and the MDM patient cured :).

 

mdm

Ownership Defined

It is important that named persons are held accountable for the structure of Master Data and quality in data. Ensure that the sponsors has identified the subject matter experts or data stewards across the enterprise who can best articulate what quality means to your business, identify process gaps where improvements are necessary and define the quality metrics that should be monitored ongoing.

having Common Definitions

One of the more hard tasks is getting fractions of the enterprise to agree on consistent definitions on what e.g. a vendor.

What is a product? Is that Product stones, and on what granularity are they?

Defined SOR/SOE

Being able to have an answer to the question: From where do I get Master Data I can trust?  (e.g. a product) across processes and IT systems. There is no single truth which constellation of SOR/SOE there should be chosen, only one rule stands firm, it’s essential that it’s uniquely defined.

I like to emphasize that because MDM is the consolidated truth of how the entity is accepted corporately does this not mean it’s the single point of the facts. It is rather a single version of today's corporately accepted view. Remember: what's true today was wasn’t necessarily yesterday truth. Master Data is auditable as a system of record.

mdm2

governance defined

Defining governance roles, maintenance processes, data principles, data quality monitoring, procedures for continuous improvement

Lets take the situation where a sales person gets in contact with a new customer. Ungoverned he might be very interested how much money he can squeeze out this client, but less focused on the rest of the procedure.

So he gets the deal down, Company Name, time of delivery, 5 times the red car and 2 of the blue. Price, zero in discount and he is happy.

Then what happens when the order hits production, they got like 6 different red and also the model with motor. When is the earliest and latest date of delivery ?... Hmm they call the customer. Well the contact person is missing so they trouble a number of people in the customers organization before finally getting the info needed, luckily they spoke to him because he orders yet 2 more cars, cool.

 

Then it moves to shipping, where should the cars be delivered, who will pay for insurance etc. again information is lacking, calling the sales person does little good, so calling the customer seems like a sound option. Again after troubling a number of people in the organization they get the information.

 

Now its time to get the big bucks, 7 cars nod a bad thing, hmm weird the production order doesn't match the purchase order. Better call ….

Does this leave an impression of professionalism at the client. Does it affect business?. Do it cost more in man power to be with out governance.

Solving the right things for the business

Define and validate the business need

Identify the specific business processes that need to be improved or enabled with master data and map those processes

both to the systems - upstream and downstream - and the enterprise data that is critical to supporting each process. Focus on the people in the organization who will be benefiting and how, not only on the capabilities delivered.

Sponsorship.

Your MDM initiative must be supported at the highest levels possible, and your executive sponsors should recognize it as a priority investment that will require ongoing oversight and management from both a business and an IT perspective - it’s not a onetime project.
I use to blame Michael Vølund for this method. GET THE PAINPOINTS, once you know where it hurts then you can ask the business to make these pains operational by putting value to them in terms of money and extra effort. But this it quickly turns into business drivers. Then seek sponsor ship as high in the organization as possible. Preferably so high they can see its all “swings and roundabouts”

rss feed now possible

I've enabled rss feed on this page, for anyone interested in my fumblings
 
http://jguldmann.spaces.live.com/feed.rss

Stratature hierarchy management capabilities

When I first saw Stratature one of the things that caught my eye was its strong hierarchy management capabilities. This was very exiting while, in my time as a BI consultant I have encountered a lot people who were spending a significant amount of their time managing hierarchies in spreadsheets. Not only is it a time consuming task to maintain these, but also often error-prone while inconsistencies build up when more than one person access and maintain them. +EDM offers a fully featured approach to maintain ragged as well as fixed hierarchies, and with that yet another reason (and by all means let’s get more reasons) to use excel as a data source is deprecated.

Bulldog (+EDM) seen from a connotative perspective.

 

Having received the outline of where Microsoft is taking their very exiting acquisition, and tried the product in a prior beta version (+EDM), I feel the urge to write a little about +EDM and its connectivity. How agile it is communicating with the surrounding systems on the corporate platform.

 

It seems obvious when using the +EDM that it’s designed for having SOE/SOR very centralised. Should one strategy be otherwise, it does offer views exposing data out of the repository, And likewise for data input it offers a table where one can batch load data from. In this blog I will try to elaborate why I believe it will fall very short on this front.

 

Before data can be exported out from +EDM it has to meet the rules defined. And this thought is good however. In the current state of +EDM executing rules is very slow rules while they are row based. With a surprising low amount of rules and not that many rows it takes very long time to validate your data. This cripples us building up complex rules determining the completeness level of data, and basing or determination of synchronization on this.

 

Mapping to external systems, like SAP seems very primitively supported. By adding the Business key from SAP as an attribute in your MDM it allows the user to manually maintain a simple link to your CODE which is the +EDM internal key.  What about gearing in hierarchies? Is a manual procedure here not error prune?

 

Complex workflow where data has to be validated by various people seems likewise hard to implement. However this might be possible with changing rights on data. Still it seems very crude and not agile enough for a modern MDM workflow setup.

 

Most annoyingly is +EDM batch input/output strategy. Near real-time is what most modern MDM synchronization strategies sough’s after. Having a table where data may or may not be imported from and reading the status of the import is really inflexible.

 

This sounds dreadful negative, but it really isn’t. +EDM is a very exiting product which really addresses a lack in the Microsoft suite when it comes to MDM. Taking into account this is a prior beta version I’ve been testing out I am sure when the final product reaches RTM it will be much improved on this front.

 

FieldUse

How often have you not been a situation where you needed to reverse engineer the logical structure of an unknown database or needed to track down some data and how the relate.

CREATE PROCEDURE [dbo].[FieldUse]

@field sysname,

@value sysname

AS

BEGIN

DECLARE @SQL NVARCHAR(MAX)

SET @SQL=''

SELECT @SQL=@SQL+'SELECT '''+s.NAME+''' AS [Schema],'''+t.NAME+''' AS [Table], '''+COALESCE([collation_name],'')+''' AS [collation_name] ' +',COUNT(*) AS Occurrences FROM ['+s.[NAME]+'].['+t.NAME+'] WHERE ['+@field+']='''+@value+''';'

FROM sys.columns c

JOIN sys.tables t

ON c.object_id = t.object_id

JOIN sys.schemas s

ON t.schema_id = s.schema_id

WHERE c.name = @field

ORDER BY s.name ,t.NAME;

EXEC sp_executesql @SQL

END

Analytical and Operational MDM does it really make sense separating these

Last week while attending a MDM Stratature TP in Redmond, some thoughts regarding Analytical and Operational MDM struck my mind. It was 3 hectic days, covering a lot of ground on this exiting product, and I am sure a lot more will spawn to my blog as the information settles.

Analytical MDM is defined as a style that has Business Intelligence at aim, where the Operational MDM has OLTP in focus. When it comes to Entities and hierarchal information’s these is mostly the same, no matter which direction which has superiority. It seems that only the process impact is influential.

When looking at the in real life MDM implementations we have worked with, it makes little sense separating the two. There is always both a strong OLTP and BI drivers to step the MDM path, only timing varies.

I believe it’s important not to pigeonhole MDM as a BI or operational technology, to do so makes, a mockery of the very justification for having an MDM strategy. MDM is not a quick fix, it’s the long haul, and adapting for both OLTP and BI is essential in such a timeframe.

Stratature / Bulldog

This I am visiting Microsoft in Redmond attending MDM sessions with the Stratature team.

Their product looks really promising especially when it comes to hierarchies.

In the upcoming time I will be musing on the topic.

Retrieve column info from SYS

SELECT tschema.[name] AS SchemaName
       ,ttables.[name] AS TableName
       ,ttypes.[name] AS DataTypeName
       ,ttypes.[max_length]
       ,ttypes.[precision]
       ,ttypes.[scale]
  FROM sys.syscolumns tsyscolumns
 INNER JOIN  sys.tables ttables
    ON tsyscolumns.id = ttables.[object_id]
 INNER JOIN sys.schemas tschema
    ON ttables.schema_id = tschema.schema_id
 INNER JOIN sys.types ttypes
    ON tsyscolumns.xtype=ttypes.system_type_id

Repositories - Create Table is that really that hard to write?

When a company chooses MDM Repository, I have often been surprised. There is enormous focus on MDM holding master data and Hierarchical data. Which Framework does this best which can hold most values? How easy they are when a new attributes is needed. I never quite understood this focus.  I mean, how hard is it, to write Create or Alter Table? Choosing a MDM Repository should in my view be made out from quite a more complex set of criteria’s.