![]() |
|
Spaces home Guldmann Fumbles with Ma...PhotosProfileFriendsMore ![]() | ![]() |
Guldmann Fumbles with Master DataMDM GULDMANN PLATON TSQL MASTER DATA MANAGEMENT
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The WOW factor has reached Microsoft MDMNews just in… Microsoft has purchased Zoomix, a data quality software vendor. This is truly interesting while Zoomix will contribute with the missing data improvement layer to the Microsoft Sql server data Platform. By this expanding Microsoft already ongoing MDM effort (stratature/bulldog/+EDM)
I found it intriguing reading Kirks Haseldens blog on the Guldmann Data ProfilingWhen venturing into any data quality program, is data profiling an essential cornerstone to turn. It provides a wealth of information about the data that you have. A master data repository must hold functionality to automatically identify data quality issues in a more than one way. Common for needed profiling plug-in is the requirement to drill trough to the very entries causing the anomalies, and the very same data must facilitate KPI reports over the same. Basic statistics, frequencies, selectability, data patterns, ranges and outliers. Through patterning the content of the attributes the possibility to detect a vararity of condition emerges. Say Postal code is mostly created with two chars indicating the country, followed by a space and then 4 numeric chars indicating the postal area, consider the mount of information deriving from patterning of this information.
e.g.
This patterning makes it easy to query on, combined with the knowledge of how many times the pattern emerges, the count of spaces and NULL values, the max length, min length, average length makes the foundation of any data analysis.
Looking into the attributes selectability gives strong indication if the attribute is a candidate for a unique business key. Numeric range analysis provides knowledge about utilization. Data patterns gives info regarding which entries doesn’t applied with a given mask Datatype selection This example is boiled down, naturally there will be far more data types and attributes in a real life scenario.
It’s not often this old rat is moved emotionally by a singerIt’s not often this old rat is moved emotionally by a singer, but this is the most beautiful song I've ever heard. I've been back to this site at least 100 times to listen and watch Josh sing this song. Even though I don't understand much Italian, it doesn't make any difference. He makes the hair stand up on my neck every time I listen to it.
http://www.youtube.com/watch?v=nKDkXOOHG34
Identify master data duplicates in a MDM repository contextMaster Data repository should provide a validation framework to check the consistency of your master data.
When looking into identify master data duplicates in a MDM repository, it is important that it support some basic functionality. Beyond its ability to merge specific master data records into a central repository a repository must be able to provide means to detect doublets through a fully parametric search. This resulting into a overview over the degree of similarity between different records.
The Identifying plug-in must operate fully automatically, yet the merging conclusion can only be taken by the steward. The automated matching is based on matching rules, and these rules are bundled into matching strategies. According to a matching strategy, a probability score is calculated ranking the probability that specific records are duplicates. The steward can then compare potential duplicates and merge any records in the entities that are in fact redundant. When having a strong Business key to match upon, it is unwise to automate the collapse process. Say we have a VAT number on the suppliers, a strong unique key, seemingly obvious key to react on, and to automate on. But it’s not safe; consider that the people typing data into the required field VAT number could do when they don’t have the suppliers vat number? All sorts of silly solutions are applied in an ungoverned process; all from using your own company’s VAT number to just entering a random text. Data ScrubbingData Scrubbing is a technical term used when correcting data towards a better data quality. I believe the term originates from the data warehouse world. It is actually the same as the meaning of data cleansing in MDM. The process of heightening data quality can be everything from detecting, removing, and/or correcting the dirty data in a database. Dirty data can be missing, incorrect, out-of-date, redundant, incomplete, or formatted incorrectly. The reason for low quality of data can be found in everything from the result of human error in entering the data, the merging of dispersed systems. Lack of company wide standards and lack of governance takes, or due to old systems containing deprecated data. The data scrubbing program belongs as a plug-in in the MDM repository used to clean the repository. But oddly enough is it mostly implemented in the ETL jobs transporting data to between dispersed systems or even into the Enterprise Data Warehouse. So instead of implementing this facility one place it is implemented redundantly, and with implementations of diverting quality, causing even more inconsistencies. Data Profiling and Quality Report KPI’sDo you know how good your data really is? Data profiling helps stewards learn what's really in their domain entities. It is important that data within your repository undergo automated "bottom-up" data analysis and the result is presented through consistent KPI’s. It is important to emphasize the word automated. Far to many time have I seen profiling tools profiling tools rely on complex SQL queries that must be run by experienced technical staff. Worse, the resulting analyses is not comparable, and offers no means to get an overview if the data quality is getting better or worse, or even what is the course of the bad quality. Do break down the KPI’s so every data producing unit can bench mark itself. It is important that the KPI’s allows drilling down to the very lines of the data problem. Should repositories hold historyShould repositories hold history, my initial opinion would be why not. Seeing the changes over time might be useful in supporting applications especially with a not clean cut uniquely identified scenario with a distributed system of entry. Utilizing Type 2 history in the mapping between the business keys and the repository will yield a lot of useful information, synchronizing wise.
In a analytical MDM context, Dimension creating would also have benefit of such a secure source of history.
I don’t really buy the argument of this meaning an explosion in the amount of data. After all, only some 30% of customers have changes to their contact info a year. This should indeed be manageable. Should the amount of data be an issue, then this is one of the situations where I see Dan E. Linstedt's Data Vault Data Modeling thoughts being utilized. The being witty remark trickWhen creating tables I often have a drop section in the top of the script which is remarked in and out when needed.
/*
DROP TABLE [dbo].[CrappyTable] */ CREATE TABLE [dbo].[CrappyTable] (ID_CrappyTable INT IDENTITY (1,1) PRIMARY KEY) By using my remarks slightly different it enables me to quickly comment the section in. /*
DROP TABLE [dbo].[CrappyTable] --*/ So when enabling the section I simply … --/*
DROP TABLE [dbo].[CrappyTable] --*/ Hot Topic getting hotterJust by goggling Master Data Management one can see a growing list of vendors and advisors. I guess the future seems clouded on the area. I’ve personally always believed in writing MDM applications bottom up instead of buying a full featured suite, while the price of these often wasn’t balanced with their functionality. However products like Stratature/Kaliedo shift this picture somewhat. They are products making it easier to convince the business that they can afford MDM
Divide and ConquerDo Master data management one piece at a time Deciding whether to enforce Master data for disparate entities across the enterprise is the $1mill question. Due to immatureness of the organization or shallow business drivers a fractmented approach might be preferable rather than the full deal. You might recall the article on my sight telling about what is the necessary cornerstone’s in a successful MDM initiative, these steps is still necessary, yet they can be addressed in various degrees. Taking basics in a BI project often seems as good grounds to initiate a MDM initiative, while the obvious disadvantages of not having a MDM driven approach to your data is obvious. Rather than repairing with ETL, you might consider a advocating for a MDM initiative. I am sure you will hear a lot of objections, however I been right in my claim “it doesn’t take longer to-do it right the first time” quite a few times. The +EDM guys sees MDM implemented in such small steps, they tend not to believe in BIG BANG implementations. Moving in.. From a small department with a small given task and then expand from there. Guess its like going up a mountain, there is a few approaches.. There is the steep fast way up, with an increased chance of failure, or the path twisting up. I guess I fully agree with this, however, chances are your will not be able to find your sponsors high enough in the organization. And the obvious disadvantage of not having your foundation at a level with the “swings and roundabouts” overview, is lack of investment will, both economically politically and resource wise. +EDM Communicates really well.Designing Dimensions and hierarchies is an important task when we do BI work for our clients. Later on this work ends up in a mockup cube. It is then the users sees how the hierarchies unfold and collapses, and it’s like a more detailed understanding of what they are getting sinks in, and quite often… its rework time. I think EDM has the same capabilities as the cube mockup. Being as visual as it is, makes it easy to understand super domain master data.
MDM Methodology Data Modeling vs. Master Data Models
Integrating Stratature/Bulldog seen from an enterprise perspectiveAS IS
Importing data in MDM is done through an interface table. It is possible to call a stored procedure that kick off the import of data. This crude mechanism doesn’t support any features in a SOA architecture. This calls for a SSIS integration rather than the desired real-time integration one would prefer with a BizTalk Enterprise hub. In Redmond Microsoft didn’t reveal any plans of changing this for the first release. The import process validates the data which is due for import. AS-IS Complex rules in +EDM is parsed row by row and is unacceptable slow. This is not at all agile enough to accommodate enterprise needs, where data should drip in to the stewards in near real-time. Microsoft is looking into improving this area. Personally I am looking forward to test this improvement a lot while I see it as critical when utilizing EDM in connections with external systems, especially when SOE is distributed. Synchronizing data out from EDM is done through views. There is no direct way to attach notification mechanisms when to take out data to external systems. Another thing which is a major problem is that the version containing the data needs to be valid as a whole before data can be synchronized out. A lot of data is missing before one could claim this to be a fully featured Master Data repository with completeness degree controlled synchronization, depending on a controlled workflow
The Secret to Driving ROI in MDMThe Secret to Driving ROI in MDM"Get the Business Involved"
Ownership Defined
having Common Definitions
One of the more hard tasks is getting fractions of the enterprise to agree on consistent definitions on what e.g. a vendor.
What is a product? Is that Product stones, and on what granularity are they? Defined SOR/SOE
Being able to have an answer to the question: From where do I get Master Data I can trust? (e.g. a product) across processes and IT systems. There is no single truth which constellation of SOR/SOE there should be chosen, only one rule stands firm, it’s essential that it’s uniquely defined.
I like to emphasize that because MDM is the consolidated truth of how the entity is accepted corporately does this not mean it’s the single point of the facts. It is rather a single version of today's corporately accepted view. Remember: what's true today was wasn’t necessarily yesterday truth. Master Data is auditable as a system of record.
governance defined
Defining governance roles, maintenance processes, data principles, data quality monitoring, procedures for continuous improvement
Lets take the situation where a sales person gets in contact with a new customer. Ungoverned he might be very interested how much money he can squeeze out this client, but less focused on the rest of the procedure.
So he gets the deal down, Company Name, time of delivery, 5 times the red car and 2 of the blue. Price, zero in discount and he is happy.
Then what happens when the order hits production, they got like 6 different red and also the model with motor. When is the earliest and latest date of delivery ?... Hmm they call the customer. Well the contact person is missing so they trouble a number of people in the customers organization before finally getting the info needed, luckily they spoke to him because he orders yet 2 more cars, cool.
Then it moves to shipping, where should the cars be delivered, who will pay for insurance etc. again information is lacking, calling the sales person does little good, so calling the customer seems like a sound option. Again after troubling a number of people in the organization they get the information.
Now its time to get the big bucks, 7 cars nod a bad thing, hmm weird the production order doesn't match the purchase order. Better call ….
Does this leave an impression of professionalism at the client. Does it affect business?. Do it cost more in man power to be with out governance. Solving the right things for the business
Define and validate the business need
Identify the specific business processes that need to be improved or enabled with master data and map those processes
both to the systems - upstream and downstream - and the enterprise data that is critical to supporting each process. Focus on the people in the organization who will be benefiting and how, not only on the capabilities delivered. Sponsorship.
rss feed now possibleI've enabled rss feed on this page, for anyone interested in my fumblings
http://jguldmann.spaces.live.com/feed.rss Stratature hierarchy management capabilitiesWhen I first saw Stratature one of the things that caught my eye was its strong hierarchy management capabilities. This was very exiting while, in my time as a BI consultant I have encountered a lot people who were spending a significant amount of their time managing hierarchies in spreadsheets. Not only is it a time consuming task to maintain these, but also often error-prone while inconsistencies build up when more than one person access and maintain them. +EDM offers a fully featured approach to maintain ragged as well as fixed hierarchies, and with that yet another reason (and by all means let’s get more reasons) to use excel as a data source is deprecated. Bulldog (+EDM) seen from a connotative perspective.
Having received the outline of where Microsoft is taking their very exiting acquisition, and tried the product in a prior beta version (+EDM), I feel the urge to write a little about +EDM and its connectivity. How agile it is communicating with the surrounding systems on the corporate platform.
It seems obvious when using the +EDM that it’s designed for having SOE/SOR very centralised. Should one strategy be otherwise, it does offer views exposing data out of the repository, And likewise for data input it offers a table where one can batch load data from. In this blog I will try to elaborate why I believe it will fall very short on this front.
Before data can be exported out from +EDM it has to meet the rules defined. And this thought is good however. In the current state of +EDM executing rules is very slow rules while they are row based. With a surprising low amount of rules and not that many rows it takes very long time to validate your data. This cripples us building up complex rules determining the completeness level of data, and basing or determination of synchronization on this.
Mapping to external systems, like SAP seems very primitively supported. By adding the Business key from SAP as an attribute in your MDM it allows the user to manually maintain a simple link to your CODE which is the +EDM internal key. What about gearing in hierarchies? Is a manual procedure here not error prune?
Complex workflow where data has to be validated by various people seems likewise hard to implement. However this might be possible with changing rights on data. Still it seems very crude and not agile enough for a modern MDM workflow setup.
Most annoyingly is +EDM batch input/output strategy. Near real-time is what most modern MDM synchronization strategies sough’s after. Having a table where data may or may not be imported from and reading the status of the import is really inflexible.
This sounds dreadful negative, but it really isn’t. +EDM is a very exiting product which really addresses a lack in the Microsoft suite when it comes to MDM. Taking into account this is a prior beta version I’ve been testing out I am sure when the final product reaches RTM it will be much improved on this front.
FieldUseHow often have you not been a situation where you needed to reverse engineer the logical structure of an unknown database or needed to track down some data and how the relate. CREATE PROCEDURE [dbo].[FieldUse]@field sysname, @value sysname AS BEGIN DECLARE @SQL NVARCHAR(MAX) SET @SQL='' SELECT @SQL=@SQL+'SELECT '''+s.NAME+''' AS [Schema],'''+t.NAME+''' AS [Table], '''+COALESCE([collation_name],'')+''' AS [collation_name] ' +',COUNT(*) AS Occurrences FROM ['+s.[NAME]+'].['+t.NAME+'] WHERE ['+@field+']='''+@value+''';' FROM sys.columns c JOIN sys.tables t ON c.object_id = t.object_id JOIN sys.schemas s ON t.schema_id = s.schema_id WHERE c.name = @field ORDER BY s.name ,t.NAME; EXEC sp_executesql @SQL END Analytical and Operational MDM does it really make sense separating theseLast week while attending a MDM Stratature TP in Redmond, some thoughts regarding Analytical and Operational MDM struck my mind. It was 3 hectic days, covering a lot of ground on this exiting product, and I am sure a lot more will spawn to my blog as the information settles. Analytical MDM is defined as a style that has Business Intelligence at aim, where the Operational MDM has OLTP in focus. When it comes to Entities and hierarchal information’s these is mostly the same, no matter which direction which has superiority. It seems that only the process impact is influential. When looking at the in real life MDM implementations we have worked with, it makes little sense separating the two. There is always both a strong OLTP and BI drivers to step the MDM path, only timing varies. I believe it’s important not to pigeonhole MDM as a BI or operational technology, to do so makes, a mockery of the very justification for having an MDM strategy. MDM is not a quick fix, it’s the long haul, and adapting for both OLTP and BI is essential in such a timeframe. Stratature / BulldogThis I am visiting Microsoft in Redmond attending MDM sessions with the Stratature team. Their product looks really promising especially when it comes to hierarchies. In the upcoming time I will be musing on the topic. Retrieve column info from SYSSELECT tschema.[name] AS SchemaName ,ttables.[name] AS TableName ,ttypes.[name] AS DataTypeName ,ttypes.[max_length] ,ttypes.[precision] ,ttypes.[scale] FROM sys.syscolumns tsyscolumns INNER JOIN sys.tables ttables ON tsyscolumns.id = ttables.[object_id] INNER JOIN sys.schemas tschema ON ttables.schema_id = tschema.schema_id INNER JOIN sys.types ttypes ON tsyscolumns.xtype=ttypes.system_type_id Repositories - Create Table is that really that hard to write?When a company chooses MDM Repository, I have often been surprised. There is enormous focus on MDM holding master data and Hierarchical data. Which Framework does this best which can hold most values? How easy they are when a new attributes is needed. I never quite understood this focus. I mean, how hard is it, to write Create or Alter Table? Choosing a MDM Repository should in my view be made out from quite a more complex set of criteria’s.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||