Jørgen's profileGuldmann Fumbles with Ma...PhotosBlogLists Tools Help

Blog


    Workflow, Business Rules.

    I like to address how critical a good rule engine is a master data repository and how important it is that it supports an extended workflow.  
    Take the example where customers can add themselves over the net, and immediately place orders. A strict and well designed GUI with wizards will of course bring you far, with shrewd functionality surrounding the critical business keys, but marketing will most certainly limit these checks in fearing of that too many rules and checks will take away users.
    So in the instance where we actually do give a heck regarding issues such as double accounts, false information etc. we would set up a workflow to verify these new data. A workflow which stretches from a repository aided workflow controlled by well defined business rules, over a workflow platform and into a manual workflow strictly governed. Scanning every input will flood any data stewards into giving in and worse of all not give enough care to his data responsibility. Smooth oiled workflow automation and a good business-rules engine are required to spot the problem areas and draw the attention of the steward only where needed be.
    So clearly we would want as much as possible to be in the aided workflow as possible, while it’s far less demanding on manpower. In a Microsoft scenario you might also consider Windows Workflow Foundation and Microsoft Windows SharePoint Services workflow, but their rules support is not as complete as in BRE. Microsoft BizTalk workflow and Business-Rules Engine (BRE) might be a closer fit. The manual part of the workflow can also be supported in a InfoPath scenario.
    This was only one of many examples where a repository can aid the governed process and heighten the data quality. If you fail to see other scenarios then, do consider how a doublet spot would be implemented. With Computer aided spotting, but steward involved handling, where doublets are sorted out.
    In the TP edition of EDM/Stratature/BullDog we saw a really flexible rule engine, which allows support, for an advanced workflow. But it seems to me that in this version, there are a few bumps in the road making it difficult to see this feature utilized. The rules were manually initiated; they needed to be valid for the entity as a whole, before data could be synchronized out from the repository, rather than just for the individual record. And finally they were SLOW. I am looking forward seeing this fixed. I would much prefer much of these workflow checks, implemented in SSIS, rather than in the repository itself.

     

     

    Data Profiling

    When venturing into any data quality program, is data profiling an essential cornerstone to turn. It provides a wealth of information about the data that you have. A master data repository must hold functionality to automatically identify data quality issues in a more than one way. Common for needed profiling plug-in is the requirement to drill trough to the very entries causing the anomalies, and the very same data must facilitate KPI reports over the same.

     Basic statistics, frequencies, selectability, data patterns, ranges and outliers.

    Through patterning the content of the attributes the possibility to detect a vararity of condition emerges. Say Postal code is mostly created with two chars indicating the country, followed by a space and then 4 numeric chars indicating the postal area, consider the mount of information deriving from patterning of this information.

    e.g.

    Attribute Entry

    Pattern

    DK 8000

    XX[_]9999

    DK 9000

    XX[_]9999

    DK8000

    XX9999

    8000DENMARK

    9999XXXXXXX

    9200

    9999

    This patterning makes it easy to query on, combined with the knowledge of how many times the pattern emerges, the count of spaces and NULL values, the max length, min length, average length makes the foundation of any data analysis.

    Looking into the attributes selectability gives strong indication if the attribute is a candidate for a unique business key.

    Numeric range analysis provides knowledge about utilization.

    Data patterns gives info regarding which entries doesn’t applied with a given mask

    Datatype selection

    This example is boiled down, naturally there will be far more data types and attributes in a real life scenario.

    DataType

    Attribute Name

    Result

    INT

    CustomerNumber

    100%

    DATETIME

    CustomerNumber

    0%

    INT

    Name

    0%

    DATETIME

    Name

    0%

    INT

    PhoneNumber

    80%

    DATETIME

    PhoneNumber

    0%

    INT

    Birthdate

    0%

    DATETIME

    Birthdate

    100%

    Identify master data duplicates in a MDM repository context

    Master Data repository should provide a validation framework to check the consistency of your master data.

      

    When looking into identify master data duplicates in a MDM repository, it is important that it support some basic functionality. Beyond its ability to merge specific master data records into a central repository a repository must be able to provide means to detect doublets through a fully parametric search. This resulting into a overview over the degree of similarity between different records.

     

    The Identifying plug-in must operate fully automatically, yet the merging conclusion can only be taken by the steward. The automated matching is based on matching rules, and these rules are bundled into matching strategies. According to a matching strategy, a probability score is calculated ranking the probability that specific records are duplicates.

    The steward can then compare potential duplicates and merge any records in the entities that are in fact redundant.

     

    When having a strong Business key to match upon, it is unwise to automate the collapse process. Say we have a VAT number on the suppliers, a strong unique key, seemingly obvious key to react on, and to automate on. But it’s not safe; consider that the people typing data into the required field VAT number could do when they don’t have the suppliers vat number? All sorts of silly solutions are applied in an ungoverned process; all from using your own company’s VAT number to just entering a random text.

    Data Scrubbing

    Data Scrubbing is a technical term used when correcting data towards a better data quality. I believe the term originates from the data warehouse world. It is actually the same as the meaning of data cleansing in MDM. The process of heightening data quality can be everything from detecting, removing, and/or correcting the dirty data in a database. Dirty data can be missing, incorrect, out-of-date, redundant, incomplete, or formatted incorrectly.

    The reason for low quality of data can be found in everything from the result of human error in entering the data, the merging of dispersed systems. Lack of company wide standards and lack of governance takes, or due to old systems containing deprecated data.

    The data scrubbing program belongs as a plug-in in the MDM repository used to clean the repository. But oddly enough is it mostly implemented in the ETL jobs transporting data to between dispersed systems or even into the Enterprise Data Warehouse. So instead of implementing this facility one place it is implemented redundantly, and with implementations of diverting quality, causing even more inconsistencies.

    Data Profiling and Quality Report KPI’s

    Do you know how good your data really is? Data profiling helps stewards learn what's really in their domain entities. It is important that data within your repository undergo automated "bottom-up" data analysis and the result is presented through consistent KPI’s. It is important to emphasize the word automated. Far to many time have I seen profiling tools profiling tools rely on complex SQL queries that must be run by experienced technical staff. Worse, the resulting analyses is not comparable, and offers no means to get an overview if the data quality is getting better or worse, or even what is the course of the bad quality. Do break down the KPI’s so every data producing unit can bench mark itself. It is important that the KPI’s allows drilling down to the very lines of the data problem.

    Should repositories hold history

    Should repositories hold history, my initial opinion would be why not. Seeing the changes over time might be useful in supporting applications especially with a not clean cut uniquely identified scenario with a distributed system of entry. Utilizing Type 2 history in the mapping between the business keys and the repository will yield a lot of useful information, synchronizing wise.

     

    In a analytical MDM context, Dimension creating would also have benefit of such a secure source of history.

     

    I don’t really buy the argument of this meaning an explosion in the amount of data. After all, only some 30% of customers have changes to their contact info a year. This should indeed be manageable. Should the amount of data be an issue, then this is one of the situations where I see Dan E. Linstedt's Data Vault Data Modeling thoughts being utilized.

    Hot Topic getting hotter

    Just by goggling Master Data Management one can see a growing list of vendors and advisors. I guess the future seems clouded on the area. I’ve personally always believed in writing MDM applications bottom up instead of buying a full featured suite, while the price of these often wasn’t balanced with their functionality. However products like Stratature/Kaliedo shift this picture somewhat. They are products making it easier to convince the business that they can afford MDM

    http://uk.platon.net/master_data_management.asp

    http://www.datafoundations.com

    http://www.DataMirror.com

    http://www.Gemstone.com

    http://www.hyperion.com/products/bi_platform/core_data_integration/mdm_index.cfm

    http://www.i2.com/solutionareas/sixone/scos/mdm.cfm

    http://www.kalido.com/products/mdm

    http://www.metamatrix.com/pages/solutions/master_data_management.htm

    http://www.Netics.com

    http://www.Nimaya.com

    http://www.ObjectRiver.net

    http://www.Purisma.com

    http://www.sas.com

    http://www.SilverCreekSystems.com

    http://www.Stratature.com

    https://www.sdn.sap.com/irj/sdn/developerareas/mdm

    Divide and Conquer

    Do Master data management one piece at a time

    Deciding whether to enforce Master data for disparate entities across the enterprise is the $1mill question. Due to immatureness of the organization or shallow business drivers a fractmented approach might be preferable rather than the full deal. You might recall the article on my sight telling about what is the necessary cornerstone’s in a successful MDM initiative, these steps is still necessary, yet they can be addressed in various degrees. Taking basics in a BI project often seems as good grounds to initiate a MDM initiative, while the obvious disadvantages of not having a MDM driven approach to your data is obvious. Rather than repairing with ETL, you might consider a advocating for a MDM initiative. I am sure you will hear a lot of objections, however I been right in my claim “it doesn’t take longer to-do it right the first time” quite a few times.

    The +EDM guys sees MDM implemented in such small steps, they tend not to believe in BIG BANG implementations. Moving in.. From a small department with a small given task and then expand from there. Guess its like going up a mountain, there is a few approaches.. There is the steep fast way up, with an increased chance of failure, or the path twisting up.

    I guess I fully agree with this, however, chances are your will not be able to find your sponsors high enough in the organization. And the obvious disadvantage of not having your foundation at a level with the “swings and roundabouts” overview, is lack of investment will, both economically politically and resource wise.

    +EDM Communicates really well.

    Designing Dimensions and hierarchies is an important task when we do BI work for our clients. Later on this work ends up in a mockup cube. It is then the users sees how the hierarchies unfold and collapses, and it’s like a more detailed understanding of what they are getting sinks in, and quite often… its rework time.

    I think EDM has the same capabilities as the cube mockup. Being as visual as it is, makes it easy to understand super domain master data.

     

    MDM Methodology

    Data Modeling vs. Master Data Models

     

     mdm3

    What communicates the better

    Integrating Stratature/Bulldog seen from an enterprise perspective

    AS IS

    Importing data in MDM is done through an interface table. It is possible to call a stored procedure that kick off the import of data. This crude mechanism doesn’t support any features in a SOA architecture. This calls for a SSIS integration rather than the desired real-time integration one would prefer with a BizTalk Enterprise hub. In Redmond Microsoft didn’t reveal any plans of changing this for the first release.

    mdm3

    The import process validates the data which is due for import. AS-IS Complex rules in +EDM is parsed row by row and is unacceptable slow. This is not at all agile enough to accommodate enterprise needs, where data should drip in to the stewards in near real-time. Microsoft is looking into improving this area. Personally I am looking forward to test this improvement a lot while I see it as critical when utilizing EDM in connections with external systems, especially when SOE is distributed.

     
    mdm3
     
    Synchronizing data out from EDM is done through views. There is no direct way to attach notification mechanisms when to take out data to external systems. Another thing which is a major problem is that the version containing the data needs to be valid as a whole before data can be synchronized out.  A lot of data is missing before one could claim this to be a fully featured Master Data repository with completeness degree controlled synchronization, depending on a controlled workflow
     
    mdm3

    The Secret to Driving ROI in MDM

     The Secret to Driving ROI in MDM

    "Get the Business Involved"

    Certain cornerstones are essential when driving a MDM project to success. Mostly the Sequence of these steps is important, but it’s hard to give a general starting point, while it often depends on the tasks at hand. One can compare it with going to the doctor, his approach isn’t quite the same when going for a full medical checkup or when you arrive to the emergency room with an broken leg. So where to start and how much weight you put into each of the topics listed depends on the projects needs, however a full circle is needed before a success can be in sight, and the MDM patient cured :).

     

    mdm

    Ownership Defined

    It is important that named persons are held accountable for the structure of Master Data and quality in data. Ensure that the sponsors has identified the subject matter experts or data stewards across the enterprise who can best articulate what quality means to your business, identify process gaps where improvements are necessary and define the quality metrics that should be monitored ongoing.

    having Common Definitions

    One of the more hard tasks is getting fractions of the enterprise to agree on consistent definitions on what e.g. a vendor.

    What is a product? Is that Product stones, and on what granularity are they?

    Defined SOR/SOE

    Being able to have an answer to the question: From where do I get Master Data I can trust?  (e.g. a product) across processes and IT systems. There is no single truth which constellation of SOR/SOE there should be chosen, only one rule stands firm, it’s essential that it’s uniquely defined.

    I like to emphasize that because MDM is the consolidated truth of how the entity is accepted corporately does this not mean it’s the single point of the facts. It is rather a single version of today's corporately accepted view. Remember: what's true today was wasn’t necessarily yesterday truth. Master Data is auditable as a system of record.

    mdm2

    governance defined

    Defining governance roles, maintenance processes, data principles, data quality monitoring, procedures for continuous improvement

    Lets take the situation where a sales person gets in contact with a new customer. Ungoverned he might be very interested how much money he can squeeze out this client, but less focused on the rest of the procedure.

    So he gets the deal down, Company Name, time of delivery, 5 times the red car and 2 of the blue. Price, zero in discount and he is happy.

    Then what happens when the order hits production, they got like 6 different red and also the model with motor. When is the earliest and latest date of delivery ?... Hmm they call the customer. Well the contact person is missing so they trouble a number of people in the customers organization before finally getting the info needed, luckily they spoke to him because he orders yet 2 more cars, cool.

     

    Then it moves to shipping, where should the cars be delivered, who will pay for insurance etc. again information is lacking, calling the sales person does little good, so calling the customer seems like a sound option. Again after troubling a number of people in the organization they get the information.

     

    Now its time to get the big bucks, 7 cars nod a bad thing, hmm weird the production order doesn't match the purchase order. Better call ….

    Does this leave an impression of professionalism at the client. Does it affect business?. Do it cost more in man power to be with out governance.

    Solving the right things for the business

    Define and validate the business need

    Identify the specific business processes that need to be improved or enabled with master data and map those processes

    both to the systems - upstream and downstream - and the enterprise data that is critical to supporting each process. Focus on the people in the organization who will be benefiting and how, not only on the capabilities delivered.

    Sponsorship.

    Your MDM initiative must be supported at the highest levels possible, and your executive sponsors should recognize it as a priority investment that will require ongoing oversight and management from both a business and an IT perspective - it’s not a onetime project.
    I use to blame Michael Vølund for this method. GET THE PAINPOINTS, once you know where it hurts then you can ask the business to make these pains operational by putting value to them in terms of money and extra effort. But this it quickly turns into business drivers. Then seek sponsor ship as high in the organization as possible. Preferably so high they can see its all “swings and roundabouts”

    Stratature hierarchy management capabilities

    When I first saw Stratature one of the things that caught my eye was its strong hierarchy management capabilities. This was very exiting while, in my time as a BI consultant I have encountered a lot people who were spending a significant amount of their time managing hierarchies in spreadsheets. Not only is it a time consuming task to maintain these, but also often error-prone while inconsistencies build up when more than one person access and maintain them. +EDM offers a fully featured approach to maintain ragged as well as fixed hierarchies, and with that yet another reason (and by all means let’s get more reasons) to use excel as a data source is deprecated.

    Bulldog (+EDM) seen from a connotative perspective.

     

    Having received the outline of where Microsoft is taking their very exiting acquisition, and tried the product in a prior beta version (+EDM), I feel the urge to write a little about +EDM and its connectivity. How agile it is communicating with the surrounding systems on the corporate platform.

     

    It seems obvious when using the +EDM that it’s designed for having SOE/SOR very centralised. Should one strategy be otherwise, it does offer views exposing data out of the repository, And likewise for data input it offers a table where one can batch load data from. In this blog I will try to elaborate why I believe it will fall very short on this front.

     

    Before data can be exported out from +EDM it has to meet the rules defined. And this thought is good however. In the current state of +EDM executing rules is very slow rules while they are row based. With a surprising low amount of rules and not that many rows it takes very long time to validate your data. This cripples us building up complex rules determining the completeness level of data, and basing or determination of synchronization on this.

     

    Mapping to external systems, like SAP seems very primitively supported. By adding the Business key from SAP as an attribute in your MDM it allows the user to manually maintain a simple link to your CODE which is the +EDM internal key.  What about gearing in hierarchies? Is a manual procedure here not error prune?

     

    Complex workflow where data has to be validated by various people seems likewise hard to implement. However this might be possible with changing rights on data. Still it seems very crude and not agile enough for a modern MDM workflow setup.

     

    Most annoyingly is +EDM batch input/output strategy. Near real-time is what most modern MDM synchronization strategies sough’s after. Having a table where data may or may not be imported from and reading the status of the import is really inflexible.

     

    This sounds dreadful negative, but it really isn’t. +EDM is a very exiting product which really addresses a lack in the Microsoft suite when it comes to MDM. Taking into account this is a prior beta version I’ve been testing out I am sure when the final product reaches RTM it will be much improved on this front.

     

    Analytical and Operational MDM does it really make sense separating these

    Last week while attending a MDM Stratature TP in Redmond, some thoughts regarding Analytical and Operational MDM struck my mind. It was 3 hectic days, covering a lot of ground on this exiting product, and I am sure a lot more will spawn to my blog as the information settles.

    Analytical MDM is defined as a style that has Business Intelligence at aim, where the Operational MDM has OLTP in focus. When it comes to Entities and hierarchal information’s these is mostly the same, no matter which direction which has superiority. It seems that only the process impact is influential.

    When looking at the in real life MDM implementations we have worked with, it makes little sense separating the two. There is always both a strong OLTP and BI drivers to step the MDM path, only timing varies.

    I believe it’s important not to pigeonhole MDM as a BI or operational technology, to do so makes, a mockery of the very justification for having an MDM strategy. MDM is not a quick fix, it’s the long haul, and adapting for both OLTP and BI is essential in such a timeframe.

    Stratature / Bulldog

    This I am visiting Microsoft in Redmond attending MDM sessions with the Stratature team.

    Their product looks really promising especially when it comes to hierarchies.

    In the upcoming time I will be musing on the topic.

    Repositories - Create Table is that really that hard to write?

    When a company chooses MDM Repository, I have often been surprised. There is enormous focus on MDM holding master data and Hierarchical data. Which Framework does this best which can hold most values? How easy they are when a new attributes is needed. I never quite understood this focus.  I mean, how hard is it, to write Create or Alter Table? Choosing a MDM Repository should in my view be made out from quite a more complex set of criteria’s.

     

    ·         How well does it integrate with a service orientated architecture.  Are we talking about Batch replications or a more sought after near real time push of data?

    ·         Does it Map business keys with Repository Keys? This has much relevance when feeding data back to the external systems with as little impact at them as possible.

    ·         Can it be Meta Modeled? Or do we have to accept certain limitation in terms of attributes?

    ·         Which Services does it feature?

    o   Data washing against public repositories

    o   Doublet spotting with phonetic functions and group 

    o   Completeness degree reporting?

    o   Harmonization of records and values?

    o   Constraint and translation of values?

    o   Etc.

    ·         How well is centralized Data entry (SOE) handled? Does it allow new attributes to be edited? Do it enforce constrains when editing Hierarchical data. Security is that taken care off?  who can change what, who sees what, single sign on and workflows?

     

    There are endless more considerations to make when choosing a repository. Rather than focusing on just values.

     

     

    Which kind of a data do we consider for MDR Entities

    Which kind of a data do we consider for MDR Entities

     

    The data types typical considered for MDR control can be divided in two categories, Hierarchical data and Master data.

     

    Master data are the critical nouns of a business type data which often fall within these categories: Legal Entities such as companies and persons, Items such as product, demographical data such as locations, segments etc. and Legal Papers such as contracts, and other legal papers where strong typing is essential.

     

    Hierarchical data stores the relationships between other entities. They are often modeled as helping entities that bind leading entities together and gives a understanding of the data-Hierarchical data is sometimes considered a super MDM domain, because it is critical to understanding and sometimes discovering the relationships between master data.

     

    System of Reference and Entry, where do we get our data, and where should we put it

    System of Reference and Entry, where do we get our data, and where should we put it

     

    The ambition of data consolidation in an MDM scenario must be balanced to the maturity of the organization.  Distributed systems, the organization often are not ready for the sought-after big bang centralization. The Strategy can be divergent from Entity to Entity, however 2 factors has to be set always. Where should data be born (System of Entry SOE) and where should the consolidated data be read from (System of Reference SOR).  

    SOE – System of Entry

    SOE handles designating the origin of entry entity data, often with attributes disunited over a number of systems. The actions entities can undergo are also assigned under SOE. Where is the entity Initiated, where does it die, and who controls the workflow.

    okRule

    “Origin of attributes and actions must be uniquely determined to avoid inconsistencies.”

     

    SOR – System of References

    SOR is more controversial. As with SOE data could be distributed over a number of systems. However it is my experiences that this is a mine filled road, only leading to chaos and inconsistencies.  A centralized repository as SOR is for my experiences the only safe road. Here a Consolidated picture can be sought and argued with strong rules to enforce diversities in hierarchies and data.  No matter what implementation level the ambitions amounts to is it imperative that the realization is compliant with a hub and spoke service orientated architecture. Otherwise the MDR scenario will only add to the Asian noodle soup.

     

    Stress Data and Process Ownership

    The breadwinner is to link the Data Quality and MDM initiatives to the enterprise’s key business drivers and combined with executives’ priorities. Only then will the business put their money where their mouth is. And only then will we are to be assured of a successful ongoing MDM program.

     
    Having made the connection between master data and business processes and identified the pain points, it’s critical to stress that data is everyone’s problem but not everyone’s responsibility Data and processes needs owners which are measured against how well their domains is managed.
     
    It is imperative that the support to enforce policies goes on top of any project or function in the company. The desire must be anchored in Corporate Management, at a level where silo concerns isn’t dominant. This scope will enforce that super optimizations within the company’s silos doesn’t hinder the harmonization of data and processes. The most likely places to find backing are in the ranks of sales executives, marketing directors etc.
     

     Appoint data stewards. These should be the people with the knowledge of the current source data and the ability to determine how to transform the source into the master-data format. In general, stewards should be appointed from the owners of each master-data source, the architects responsible for the MDM systems, and representatives from the business users of the master data.” Quote Kirk Haselden

     
     

    okrule

    "Do not measure people, without giving those means to change the reasons of the measured outcome."

    Give owners right to enforce their policies in a corporate context. Measure owners rate of success.

     

    First step towards a cure is the realization of being sick.

    First step towards a cure is the realization of being sick.

     

    It is imperative to begin the MDM journey by gaining insight into how poorly Managed Master Data affects effects performance of the business.

    This information is always present within the company’s organization but there are a number of barriers to utilize this information

    ·         It’s spread out on a number of persons which non overlapping domains.

    ·         The importance of parsing on this information is hindered by pride. Who want to disclose their dirty laundry in public?

    ·         We don’t have time do we, there is always something that has higher priority

     

    Best Method in identifying these wounds “improvement barriers” follow blood trail. Identify the pain points and setting ownership to them. These pain points can then be made operational in order to evaluate against them later on in the project.