| Jørgen's profileGuldmann Fumbles with Ma...PhotosBlogLists | Help |
Guldmann Fumbles with Master DataMDM GULDMANN PLATON TSQL MASTER DATA MANAGEMENT |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Workflow, Business Rules.I like to address how critical a good rule engine is a master data repository and how important it is that it supports an extended workflow.
Take the example where customers can add themselves over the net, and immediately place orders. A strict and well designed GUI with wizards will of course bring you far, with shrewd functionality surrounding the critical business keys, but marketing will most certainly limit these checks in fearing of that too many rules and checks will take away users.
So in the instance where we actually do give a heck regarding issues such as double accounts, false information etc. we would set up a workflow to verify these new data. A workflow which stretches from a repository aided workflow controlled by well defined business rules, over a workflow platform and into a manual workflow strictly governed. Scanning every input will flood any data stewards into giving in and worse of all not give enough care to his data responsibility. Smooth oiled workflow automation and a good business-rules engine are required to spot the problem areas and draw the attention of the steward only where needed be.
So clearly we would want as much as possible to be in the aided workflow as possible, while it’s far less demanding on manpower. In a Microsoft scenario you might also consider Windows Workflow Foundation and Microsoft Windows SharePoint Services workflow, but their rules support is not as complete as in BRE. Microsoft BizTalk workflow and Business-Rules Engine (BRE) might be a closer fit. The manual part of the workflow can also be supported in a InfoPath scenario.
This was only one of many examples where a repository can aid the governed process and heighten the data quality. If you fail to see other scenarios then, do consider how a doublet spot would be implemented. With Computer aided spotting, but steward involved handling, where doublets are sorted out.
In the TP edition of EDM/Stratature/BullDog we saw a really flexible rule engine, which allows support, for an advanced workflow. But it seems to me that in this version, there are a few bumps in the road making it difficult to see this feature utilized. The rules were manually initiated; they needed to be valid for the entity as a whole, before data could be synchronized out from the repository, rather than just for the individual record. And finally they were SLOW. I am looking forward seeing this fixed. I would much prefer much of these workflow checks, implemented in SSIS, rather than in the repository itself.
The WOW factor has reached Microsoft MDMNews just in… Microsoft has purchased Zoomix, a data quality software vendor. This is truly interesting while Zoomix will contribute with the missing data improvement layer to the Microsoft Sql server data Platform. By this expanding Microsoft already ongoing MDM effort (stratature/bulldog/+EDM)
I found it intriguing reading Kirks Haseldens blog on the Guldmann Data ProfilingWhen venturing into any data quality program, is data profiling an essential cornerstone to turn. It provides a wealth of information about the data that you have. A master data repository must hold functionality to automatically identify data quality issues in a more than one way. Common for needed profiling plug-in is the requirement to drill trough to the very entries causing the anomalies, and the very same data must facilitate KPI reports over the same. Basic statistics, frequencies, selectability, data patterns, ranges and outliers. Through patterning the content of the attributes the possibility to detect a vararity of condition emerges. Say Postal code is mostly created with two chars indicating the country, followed by a space and then 4 numeric chars indicating the postal area, consider the mount of information deriving from patterning of this information.
e.g.
This patterning makes it easy to query on, combined with the knowledge of how many times the pattern emerges, the count of spaces and NULL values, the max length, min length, average length makes the foundation of any data analysis.
Looking into the attributes selectability gives strong indication if the attribute is a candidate for a unique business key. Numeric range analysis provides knowledge about utilization. Data patterns gives info regarding which entries doesn’t applied with a given mask Datatype selection This example is boiled down, naturally there will be far more data types and attributes in a real life scenario.
It’s not often this old rat is moved emotionally by a singerIt’s not often this old rat is moved emotionally by a singer, but this is the most beautiful song I've ever heard. I've been back to this site at least 100 times to listen and watch Josh sing this song. Even though I don't understand much Italian, it doesn't make any difference. He makes the hair stand up on my neck every time I listen to it.
http://www.youtube.com/watch?v=nKDkXOOHG34
Identify master data duplicates in a MDM repository contextMaster Data repository should provide a validation framework to check the consistency of your master data.
When looking into identify master data duplicates in a MDM repository, it is important that it support some basic functionality. Beyond its ability to merge specific master data records into a central repository a repository must be able to provide means to detect doublets through a fully parametric search. This resulting into a overview over the degree of similarity between different records.
The Identifying plug-in must operate fully automatically, yet the merging conclusion can only be taken by the steward. The automated matching is based on matching rules, and these rules are bundled into matching strategies. According to a matching strategy, a probability score is calculated ranking the probability that specific records are duplicates. The steward can then compare potential duplicates and merge any records in the entities that are in fact redundant. When having a strong Business key to match upon, it is unwise to automate the collapse process. Say we have a VAT number on the suppliers, a strong unique key, seemingly obvious key to react on, and to automate on. But it’s not safe; consider that the people typing data into the required field VAT number could do when they don’t have the suppliers vat number? All sorts of silly solutions are applied in an ungoverned process; all from using your own company’s VAT number to just entering a random text. Data ScrubbingData Scrubbing is a technical term used when correcting data towards a better data quality. I believe the term originates from the data warehouse world. It is actually the same as the meaning of data cleansing in MDM. The process of heightening data quality can be everything from detecting, removing, and/or correcting the dirty data in a database. Dirty data can be missing, incorrect, out-of-date, redundant, incomplete, or formatted incorrectly. The reason for low quality of data can be found in everything from the result of human error in entering the data, the merging of dispersed systems. Lack of company wide standards and lack of governance takes, or due to old systems containing deprecated data. The data scrubbing program belongs as a plug-in in the MDM repository used to clean the repository. But oddly enough is it mostly implemented in the ETL jobs transporting data to between dispersed systems or even into the Enterprise Data Warehouse. So instead of implementing this facility one place it is implemented redundantly, and with implementations of diverting quality, causing even more inconsistencies. Data Profiling and Quality Report KPI’sDo you know how good your data really is? Data profiling helps stewards learn what's really in their domain entities. It is important that data within your repository undergo automated "bottom-up" data analysis and the result is presented through consistent KPI’s. It is important to emphasize the word automated. Far to many time have I seen profiling tools profiling tools rely on complex SQL queries that must be run by experienced technical staff. Worse, the resulting analyses is not comparable, and offers no means to get an overview if the data quality is getting better or worse, or even what is the course of the bad quality. Do break down the KPI’s so every data producing unit can bench mark itself. It is important that the KPI’s allows drilling down to the very lines of the data problem. Should repositories hold historyShould repositories hold history, my initial opinion would be why not. Seeing the changes over time might be useful in supporting applications especially with a not clean cut uniquely identified scenario with a distributed system of entry. Utilizing Type 2 history in the mapping between the business keys and the repository will yield a lot of useful information, synchronizing wise.
In a analytical MDM context, Dimension creating would also have benefit of such a secure source of history.
I don’t really buy the argument of this meaning an explosion in the amount of data. After all, only some 30% of customers have changes to their contact info a year. This should indeed be manageable. Should the amount of data be an issue, then this is one of the situations where I see Dan E. Linstedt's Data Vault Data Modeling thoughts being utilized. The being witty remark trickWhen creating tables I often have a drop section in the top of the script which is remarked in and out when needed.
/*
DROP TABLE [dbo].[CrappyTable] */ CREATE TABLE [dbo].[CrappyTable] (ID_CrappyTable INT IDENTITY (1,1) PRIMARY KEY) By using my remarks slightly different it enables me to quickly comment the section in. /*
DROP TABLE [dbo].[CrappyTable] --*/ So when enabling the section I simply … --/*
DROP TABLE [dbo].[CrappyTable] --*/ Hot Topic getting hotterJust by goggling Master Data Management one can see a growing list of vendors and advisors. I guess the future seems clouded on the area. I’ve personally always believed in writing MDM applications bottom up instead of buying a full featured suite, while the price of these often wasn’t balanced with their functionality. However products like Stratature/Kaliedo shift this picture somewhat. They are products making it easier to convince the business that they can afford MDM
Divide and ConquerDo Master data management one piece at a time Deciding whether to enforce Master data for disparate entities across the enterprise is the $1mill question. Due to immatureness of the organization or shallow business drivers a fractmented approach might be preferable rather than the full deal. You might recall the article on my sight telling about what is the necessary cornerstone’s in a successful MDM initiative, these steps is still necessary, yet they can be addressed in various degrees. Taking basics in a BI project often seems as good grounds to initiate a MDM initiative, while the obvious disadvantages of not having a MDM driven approach to your data is obvious. Rather than repairing with ETL, you might consider a advocating for a MDM initiative. I am sure you will hear a lot of objections, however I been right in my claim “it doesn’t take longer to-do it right the first time” quite a few times. The +EDM guys sees MDM implemented in such small steps, they tend not to believe in BIG BANG implementations. Moving in.. From a small department with a small given task and then expand from there. Guess its like going up a mountain, there is a few approaches.. There is the steep fast way up, with an increased chance of failure, or the path twisting up. I guess I fully agree with this, however, chances are your will not be able to find your sponsors high enough in the organization. And the obvious disadvantage of not having your foundation at a level with the “swings and roundabouts” overview, is lack of investment will, both economically politically and resource wise. +EDM Communicates really well.Designing Dimensions and hierarchies is an important task when we do BI work for our clients. Later on this work ends up in a mockup cube. It is then the users sees how the hierarchies unfold and collapses, and it’s like a more detailed understanding of what they are getting sinks in, and quite often… its rework time. I think EDM has the same capabilities as the cube mockup. Being as visual as it is, makes it easy to understand super domain master data.
MDM Methodology Data Modeling vs. Master Data Models
Integrating Stratature/Bulldog seen from an enterprise perspectiveAS IS
Importing data in MDM is done through an interface table. It is possible to call a stored procedure that kick off the import of data. This crude mechanism doesn’t support any features in a SOA architecture. This calls for a SSIS integration rather than the desired real-time integration one would prefer with a BizTalk Enterprise hub. In Redmond Microsoft didn’t reveal any plans of changing this for the first release. The import process validates the data which is due for import. AS-IS Complex rules in +EDM is parsed row by row and is unacceptable slow. This is not at all agile enough to accommodate enterprise needs, where data should drip in to the stewards in near real-time. Microsoft is looking into improving this area. Personally I am looking forward to test this improvement a lot while I see it as critical when utilizing EDM in connections with external systems, especially when SOE is distributed. Synchronizing data out from EDM is done through views. There is no direct way to attach notification mechanisms when to take out data to external systems. Another thing which is a major problem is that the version containing the data needs to be valid as a whole before data can be synchronized out. A lot of data is missing before one could claim this to be a fully featured Master Data repository with completeness degree controlled synchronization, depending on a controlled workflow
The Secret to Driving ROI in MDMThe Secret to Driving ROI in MDM"Get the Business Involved"
Ownership Defined
having Common Definitions
One of the more hard tasks is getting fractions of the enterprise to agree on consistent definitions on what e.g. a vendor.
What is a product? Is that Product stones, and on what granularity are they? Defined SOR/SOE
Being able to have an answer to the question: From where do I get Master Data I can trust? (e.g. a product) across processes and IT systems. There is no single truth which constellation of SOR/SOE there should be chosen, only one rule stands firm, it’s essential that it’s uniquely defined.
I like to emphasize that because MDM is the consolidated truth of how the entity is accepted corporately does this not mean it’s the single point of the facts. It is rather a single version of today's corporately accepted view. Remember: what's true today was wasn’t necessarily yesterday truth. Master Data is auditable as a system of record.
governance defined
Solving the right things for the business
Define and validate the business need
Identify the specific business processes that need to be improved or enabled with master data and map those processes
both to the systems - upstream and downstream - and the enterprise data that is critical to supporting each process. Focus on the people in the organization who will be benefiting and how, not only on the capabilities delivered.
rss feed now possibleI've enabled rss feed on this page, for anyone interested in my fumblings
http://jguldmann.spaces.live.com/feed.rss Stratature hierarchy management capabilitiesWhen I first saw Stratature one of the things that caught my eye was its strong hierarchy management capabilities. This was very exiting while, in my time as a BI consultant I have encountered a lot people who were spending a significant amount of their time managing hierarchies in spreadsheets. Not only is it a time consuming task to maintain these, but also often error-prone while inconsistencies build up when more than one person access and maintain them. +EDM offers a fully featured approach to maintain ragged as well as fixed hierarchies, and with that yet another reason (and by all means let’s get more reasons) to use excel as a data source is deprecated. Bulldog (+EDM) seen from a connotative perspective.
Having received the outline of where Microsoft is taking their very exiting acquisition, and tried the product in a prior beta version (+EDM), I feel the urge to write a little about +EDM and its connectivity. How agile it is communicating with the surrounding systems on the corporate platform.
It seems obvious when using the +EDM that it’s designed for having SOE/SOR very centralised. Should one strategy be otherwise, it does offer views exposing data out of the repository, And likewise for data input it offers a table where one can batch load data from. In this blog I will try to elaborate why I believe it will fall very short on this front.
Before data can be exported out from +EDM it has to meet the rules defined. And this thought is good however. In the current state of +EDM executing rules is very slow rules while they are row based. With a surprising low amount of rules and not that many rows it takes very long time to validate your data. This cripples us building up complex rules determining the completeness level of data, and basing or determination of synchronization on this.
Mapping to external systems, like SAP seems very primitively supported. By adding the Business key from SAP as an attribute in your MDM it allows the user to manually maintain a simple link to your CODE which is the +EDM internal key. What about gearing in hierarchies? Is a manual procedure here not error prune?
Complex workflow where data has to be validated by various people seems likewise hard to implement. However this might be possible with changing rights on data. Still it seems very crude and not agile enough for a modern MDM workflow setup.
Most annoyingly is +EDM batch input/output strategy. Near real-time is what most modern MDM synchronization strategies sough’s after. Having a table where data may or may not be imported from and reading the status of the import is really inflexible.
This sounds dreadful negative, but it really isn’t. +EDM is a very exiting product which really addresses a lack in the Microsoft suite when it comes to MDM. Taking into account this is a prior beta version I’ve been testing out I am sure when the final product reaches RTM it will be much improved on this front.
FieldUseHow often have you not been a situation where you needed to reverse engineer the logical structure of an unknown database or needed to track down some data and how the relate. CREATE PROCEDURE [dbo].[FieldUse]@field sysname, @value sysname AS BEGIN DECLARE @SQL NVARCHAR(MAX) SET @SQL='' SELECT @SQL=@SQL+'SELECT '''+s.NAME+''' AS [Schema],'''+t.NAME+''' AS [Table], '''+COALESCE([collation_name],'')+''' AS [collation_name] ' +',COUNT(*) AS Occurrences FROM ['+s.[NAME]+'].['+t.NAME+'] WHERE ['+@field+']='''+@value+''';' FROM sys.columns c JOIN sys.tables t ON c.object_id = t.object_id JOIN sys.schemas s ON t.schema_id = s.schema_id WHERE c.name = @field ORDER BY s.name ,t.NAME; EXEC sp_executesql @SQL END Analytical and Operational MDM does it really make sense separating theseLast week while attending a MDM Stratature TP in Redmond, some thoughts regarding Analytical and Operational MDM struck my mind. It was 3 hectic days, covering a lot of ground on this exiting product, and I am sure a lot more will spawn to my blog as the information settles. Analytical MDM is defined as a style that has Business Intelligence at aim, where the Operational MDM has OLTP in focus. When it comes to Entities and hierarchal information’s these is mostly the same, no matter which direction which has superiority. It seems that only the process impact is influential. When looking at the in real life MDM implementations we have worked with, it makes little sense separating the two. There is always both a strong OLTP and BI drivers to step the MDM path, only timing varies. I believe it’s important not to pigeonhole MDM as a BI or operational technology, to do so makes, a mockery of the very justification for having an MDM strategy. MDM is not a quick fix, it’s the long haul, and adapting for both OLTP and BI is essential in such a timeframe. Stratature / BulldogThis I am visiting Microsoft in Redmond attending MDM sessions with the Stratature team. Their product looks really promising especially when it comes to hierarchies. In the upcoming time I will be musing on the topic. Retrieve column info from SYSSELECT tschema.[name] AS SchemaName ,ttables.[name] AS TableName ,ttypes.[name] AS DataTypeName ,ttypes.[max_length] ,ttypes.[precision] ,ttypes.[scale] FROM sys.syscolumns tsyscolumns INNER JOIN sys.tables ttables ON tsyscolumns.id = ttables.[object_id] INNER JOIN sys.schemas tschema ON ttables.schema_id = tschema.schema_id INNER JOIN sys.types ttypes ON tsyscolumns.xtype=ttypes.system_type_id Repositories - Create Table is that really that hard to write?When a company chooses MDM Repository, I have often been surprised. There is enormous focus on MDM holding master data and Hierarchical data. Which Framework does this best which can hold most values? How easy they are when a new attributes is needed. I never quite understood this focus. I mean, how hard is it, to write Create or Alter Table? Choosing a MDM Repository should in my view be made out from quite a more complex set of criteria’s.
· How well does it integrate with a service orientated architecture. Are we talking about Batch replications or a more sought after near real time push of data? · Does it Map business keys with Repository Keys? This has much relevance when feeding data back to the external systems with as little impact at them as possible. · Can it be Meta Modeled? Or do we have to accept certain limitation in terms of attributes? · Which Services does it feature? o Data washing against public repositories o Doublet spotting with phonetic functions and group o Completeness degree reporting? o Harmonization of records and values? o Constraint and translation of values? o Etc. · How well is centralized Data entry (SOE) handled? Does it allow new attributes to be edited? Do it enforce constrains when editing Hierarchical data. Security is that taken care off? who can change what, who sees what, single sign on and workflows?
There are endless more considerations to make when choosing a repository. Rather than focusing on just values.
Which kind of a data do we consider for MDR EntitiesWhich kind of a data do we consider for MDR Entities
The data types typical considered for MDR control can be divided in two categories, Hierarchical data and Master data.
Master data are the critical nouns of a business type data which often fall within these categories: Legal Entities such as companies and persons, Items such as product, demographical data such as locations, segments etc. and Legal Papers such as contracts, and other legal papers where strong typing is essential.
Hierarchical data stores the relationships between other entities. They are often modeled as helping entities that bind leading entities together and gives a understanding of the data-Hierarchical data is sometimes considered a super MDM domain, because it is critical to understanding and sometimes discovering the relationships between master data.
Which Attributes goes into a MDR EntityWhich Attributes goes into a MDR Entity
Answering this question has been and is one of the tougher nuts to crack for me. · Seen from the distributed systems data is separated into Domain Data, which really has no relevance to others and General data which more than one system could benefit from. · Seen from a Centralized System of Entry point of view should the amount of data driving the process be centralized in the MDR. · The user point of view is that he really does not want to be troubled with data that is not relevant for his domain operation. · The entity owner’s point of view should data be entered in as few steps as possible and involving the least amount of people necessary and with the strictest possible governance. · Seen from our customer’s point of view should the attributes reflect and correspond to general definitions to on the marked.
I guess answering this can only be putting aside most arguments and taking the business point of view and combining it with the Data Responsible finding a balance.
In writing… System of Reference and Entry, where do we get our data, and where should we put itSystem of Reference and Entry, where do we get our data, and where should we put it
The ambition of data consolidation in an MDM scenario must be balanced to the maturity of the organization. Distributed systems, the organization often are not ready for the sought-after big bang centralization. The Strategy can be divergent from Entity to Entity, however 2 factors has to be set always. Where should data be born (System of Entry SOE) and where should the consolidated data be read from (System of Reference SOR). SOE – System of EntrySOE handles designating the origin of entry entity data, often with attributes disunited over a number of systems. The actions entities can undergo are also assigned under SOE. Where is the entity Initiated, where does it die, and who controls the workflow. “Origin of attributes and actions must be uniquely determined to avoid inconsistencies.”
SOR – System of ReferencesSOR is more controversial. As with SOE data could be distributed over a number of systems. However it is my experiences that this is a mine filled road, only leading to chaos and inconsistencies. A centralized repository as SOR is for my experiences the only safe road. Here a Consolidated picture can be sought and argued with strong rules to enforce diversities in hierarchies and data. No matter what implementation level the ambitions amounts to is it imperative that the realization is compliant with a hub and spoke service orientated architecture. Otherwise the MDR scenario will only add to the Asian noodle soup.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|