| Jørgen 的个人资料Guldmann Fumbles with Ma...照片日志列表 | 帮助 |
Stratature hierarchy management capabilitiesWhen I first saw Stratature one of the things that caught my eye was its strong hierarchy management capabilities. This was very exiting while, in my time as a BI consultant I have encountered a lot people who were spending a significant amount of their time managing hierarchies in spreadsheets. Not only is it a time consuming task to maintain these, but also often error-prone while inconsistencies build up when more than one person access and maintain them. +EDM offers a fully featured approach to maintain ragged as well as fixed hierarchies, and with that yet another reason (and by all means let’s get more reasons) to use excel as a data source is deprecated. Bulldog (+EDM) seen from a connotative perspective.
Having received the outline of where Microsoft is taking their very exiting acquisition, and tried the product in a prior beta version (+EDM), I feel the urge to write a little about +EDM and its connectivity. How agile it is communicating with the surrounding systems on the corporate platform.
It seems obvious when using the +EDM that it’s designed for having SOE/SOR very centralised. Should one strategy be otherwise, it does offer views exposing data out of the repository, And likewise for data input it offers a table where one can batch load data from. In this blog I will try to elaborate why I believe it will fall very short on this front.
Before data can be exported out from +EDM it has to meet the rules defined. And this thought is good however. In the current state of +EDM executing rules is very slow rules while they are row based. With a surprising low amount of rules and not that many rows it takes very long time to validate your data. This cripples us building up complex rules determining the completeness level of data, and basing or determination of synchronization on this.
Mapping to external systems, like SAP seems very primitively supported. By adding the Business key from SAP as an attribute in your MDM it allows the user to manually maintain a simple link to your CODE which is the +EDM internal key. What about gearing in hierarchies? Is a manual procedure here not error prune?
Complex workflow where data has to be validated by various people seems likewise hard to implement. However this might be possible with changing rights on data. Still it seems very crude and not agile enough for a modern MDM workflow setup.
Most annoyingly is +EDM batch input/output strategy. Near real-time is what most modern MDM synchronization strategies sough’s after. Having a table where data may or may not be imported from and reading the status of the import is really inflexible.
This sounds dreadful negative, but it really isn’t. +EDM is a very exiting product which really addresses a lack in the Microsoft suite when it comes to MDM. Taking into account this is a prior beta version I’ve been testing out I am sure when the final product reaches RTM it will be much improved on this front.
FieldUseHow often have you not been a situation where you needed to reverse engineer the logical structure of an unknown database or needed to track down some data and how the relate. CREATE PROCEDURE [dbo].[FieldUse]@field sysname, @value sysname AS BEGIN DECLARE @SQL NVARCHAR(MAX) SET @SQL='' SELECT @SQL=@SQL+'SELECT '''+s.NAME+''' AS [Schema],'''+t.NAME+''' AS [Table], '''+COALESCE([collation_name],'')+''' AS [collation_name] ' +',COUNT(*) AS Occurrences FROM ['+s.[NAME]+'].['+t.NAME+'] WHERE ['+@field+']='''+@value+''';' FROM sys.columns c JOIN sys.tables t ON c.object_id = t.object_id JOIN sys.schemas s ON t.schema_id = s.schema_id WHERE c.name = @field ORDER BY s.name ,t.NAME; EXEC sp_executesql @SQL END Analytical and Operational MDM does it really make sense separating theseLast week while attending a MDM Stratature TP in Redmond, some thoughts regarding Analytical and Operational MDM struck my mind. It was 3 hectic days, covering a lot of ground on this exiting product, and I am sure a lot more will spawn to my blog as the information settles. Analytical MDM is defined as a style that has Business Intelligence at aim, where the Operational MDM has OLTP in focus. When it comes to Entities and hierarchal information’s these is mostly the same, no matter which direction which has superiority. It seems that only the process impact is influential. When looking at the in real life MDM implementations we have worked with, it makes little sense separating the two. There is always both a strong OLTP and BI drivers to step the MDM path, only timing varies. I believe it’s important not to pigeonhole MDM as a BI or operational technology, to do so makes, a mockery of the very justification for having an MDM strategy. MDM is not a quick fix, it’s the long haul, and adapting for both OLTP and BI is essential in such a timeframe. Stratature / BulldogThis I am visiting Microsoft in Redmond attending MDM sessions with the Stratature team. Their product looks really promising especially when it comes to hierarchies. In the upcoming time I will be musing on the topic. Retrieve column info from SYSSELECT tschema.[name] AS SchemaName ,ttables.[name] AS TableName ,ttypes.[name] AS DataTypeName ,ttypes.[max_length] ,ttypes.[precision] ,ttypes.[scale] FROM sys.syscolumns tsyscolumns INNER JOIN sys.tables ttables ON tsyscolumns.id = ttables.[object_id] INNER JOIN sys.schemas tschema ON ttables.schema_id = tschema.schema_id INNER JOIN sys.types ttypes ON tsyscolumns.xtype=ttypes.system_type_id Repositories - Create Table is that really that hard to write?When a company chooses MDM Repository, I have often been surprised. There is enormous focus on MDM holding master data and Hierarchical data. Which Framework does this best which can hold most values? How easy they are when a new attributes is needed. I never quite understood this focus. I mean, how hard is it, to write Create or Alter Table? Choosing a MDM Repository should in my view be made out from quite a more complex set of criteria’s.
· How well does it integrate with a service orientated architecture. Are we talking about Batch replications or a more sought after near real time push of data? · Does it Map business keys with Repository Keys? This has much relevance when feeding data back to the external systems with as little impact at them as possible. · Can it be Meta Modeled? Or do we have to accept certain limitation in terms of attributes? · Which Services does it feature? o Data washing against public repositories o Doublet spotting with phonetic functions and group o Completeness degree reporting? o Harmonization of records and values? o Constraint and translation of values? o Etc. · How well is centralized Data entry (SOE) handled? Does it allow new attributes to be edited? Do it enforce constrains when editing Hierarchical data. Security is that taken care off? who can change what, who sees what, single sign on and workflows?
There are endless more considerations to make when choosing a repository. Rather than focusing on just values.
Which kind of a data do we consider for MDR EntitiesWhich kind of a data do we consider for MDR Entities
The data types typical considered for MDR control can be divided in two categories, Hierarchical data and Master data.
Master data are the critical nouns of a business type data which often fall within these categories: Legal Entities such as companies and persons, Items such as product, demographical data such as locations, segments etc. and Legal Papers such as contracts, and other legal papers where strong typing is essential.
Hierarchical data stores the relationships between other entities. They are often modeled as helping entities that bind leading entities together and gives a understanding of the data-Hierarchical data is sometimes considered a super MDM domain, because it is critical to understanding and sometimes discovering the relationships between master data.
Which Attributes goes into a MDR EntityWhich Attributes goes into a MDR Entity
Answering this question has been and is one of the tougher nuts to crack for me. · Seen from the distributed systems data is separated into Domain Data, which really has no relevance to others and General data which more than one system could benefit from. · Seen from a Centralized System of Entry point of view should the amount of data driving the process be centralized in the MDR. · The user point of view is that he really does not want to be troubled with data that is not relevant for his domain operation. · The entity owner’s point of view should data be entered in as few steps as possible and involving the least amount of people necessary and with the strictest possible governance. · Seen from our customer’s point of view should the attributes reflect and correspond to general definitions to on the marked.
I guess answering this can only be putting aside most arguments and taking the business point of view and combining it with the Data Responsible finding a balance.
In writing… System of Reference and Entry, where do we get our data, and where should we put itSystem of Reference and Entry, where do we get our data, and where should we put it
The ambition of data consolidation in an MDM scenario must be balanced to the maturity of the organization. Distributed systems, the organization often are not ready for the sought-after big bang centralization. The Strategy can be divergent from Entity to Entity, however 2 factors has to be set always. Where should data be born (System of Entry SOE) and where should the consolidated data be read from (System of Reference SOR). SOE – System of EntrySOE handles designating the origin of entry entity data, often with attributes disunited over a number of systems. The actions entities can undergo are also assigned under SOE. Where is the entity Initiated, where does it die, and who controls the workflow. “Origin of attributes and actions must be uniquely determined to avoid inconsistencies.”
SOR – System of ReferencesSOR is more controversial. As with SOE data could be distributed over a number of systems. However it is my experiences that this is a mine filled road, only leading to chaos and inconsistencies. A centralized repository as SOR is for my experiences the only safe road. Here a Consolidated picture can be sought and argued with strong rules to enforce diversities in hierarchies and data. No matter what implementation level the ambitions amounts to is it imperative that the realization is compliant with a hub and spoke service orientated architecture. Otherwise the MDR scenario will only add to the Asian noodle soup.
Stress Data and Process OwnershipThe breadwinner is to link the Data Quality and MDM initiatives to the enterprise’s key business drivers and combined with executives’ priorities. Only then will the business put their money where their mouth is. And only then will we are to be assured of a successful ongoing MDM program. Having made the connection between master data and business processes and identified the pain points, it’s critical to stress that data is everyone’s problem but not everyone’s responsibility Data and processes needs owners which are measured against how well their domains is managed.
It is imperative that the support to enforce policies goes on top of any project or function in the company. The desire must be anchored in Corporate Management, at a level where silo concerns isn’t dominant. This scope will enforce that super optimizations within the company’s silos doesn’t hinder the harmonization of data and processes. The most likely places to find backing are in the ranks of sales executives, marketing directors etc.
“Appoint data stewards. These should be the people with the knowledge of the current source data and the ability to determine how to transform the source into the master-data format. In general, stewards should be appointed from the owners of each master-data source, the architects responsible for the MDM systems, and representatives from the business users of the master data.” Quote Kirk Haselden "Do not measure people, without giving those means to change the reasons of the measured outcome." Give owners right to enforce their policies in a corporate context. Measure owners rate of success.
First step towards a cure is the realization of being sick.First step towards a cure is the realization of being sick.
It is imperative to begin the MDM journey by gaining insight into how poorly Managed Master Data affects effects performance of the business. This information is always present within the company’s organization but there are a number of barriers to utilize this information · It’s spread out on a number of persons which non overlapping domains. · The importance of parsing on this information is hindered by pride. Who want to disclose their dirty laundry in public? · We don’t have time do we, there is always something that has higher priority
Best Method in identifying these wounds “improvement barriers” follow blood trail. Identify the pain points and setting ownership to them. These pain points can then be made operational in order to evaluate against them later on in the project. Let Master Data Management Approach your business.Let Master Data Management Approach your business.
What is master data?
Master data is the process of getting the art of getting in touch with your company’s consolidated data in a way where inconsistence’s is removed and data quality is high. This being Master data is a business problem and IT has contributed to its complexity. Succeeding with master data requires a focus on enhancing business capabilities and establishing a technical master data environment.
Putting the Management back in Master Data.
Consolidation of data is often seen as integration between system silos issue. In that scenario data is pushed around between systems to create a widespread and mostly corrupt picture. These corrupted data is targeted with more integration due to the fact that one system has a truer picture than the others. This has nothing to-do with master data and even less with management of the problem. In fact Management of master data has more to do with controlling the processes creating data. Mostly it has nothing to-do with IT systems, but fare more how we handle data.
This Blog is meant to address the approach to MDM. And the upcoming months it will gradually grow while this topic is being debated. programmatically determine the number of processors the box hasEXEC master..xp_msver N'ProcessorCount', N'ProcessorType' Finding tables with no identity columns or alternate keysSET NOCOUNT ON
SELECT 'Tables without an identity column and primary key'
SELECT schema_name(schema_id) + '.' + [name] AS tableName
FROM sys.tables AS t WHERE type_desc = 'USER_TABLE' AND (NOT EXISTS (SELECT * FROM sys.columns AS c WHERE t.object_id = c.object_id AND is_identity = 1) OR NOT EXISTS (SELECT * FROM sys.objects o WHERE t.object_id = o.parent_object_id AND type_desc = 'PRIMARY_KEY_CONSTRAINT')) SELECT 'Tables with no UNIQUE CONSTRAINT to back up the alternate key' SELECT schema_name(schema_id) + '.' + [name] AS tableName FROM sys.tables AS t WHERE type_desc = 'USER_TABLE' AND NOT EXISTS (SELECT * FROM sys.objects o WHERE t.object_id = o.parent_object_id AND type_desc = 'UNIQUE_CONSTRAINT') Show all the indexes that have been used in a given databaseThis query shows all the indexes that have been used for a given database
SELECT OBJECT_NAME(s.OBJECT_ID) AS ObjectName , s.OBJECT_ID , i.[name] AS IndexName , i.index_id , user_seeks , user_scans , user_lookups , user_updates FROM sys.dm_db_index_usage_stats s JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id WHERE database_id = DB_ID() AND OBJECTPROPERTY(s.OBJECT_ID , 'IsUserTable') = 1 ORDER BY (user_seeks + user_scans + user_lookups + user_updates) ASC show the average CPU time for the top 5 query execution plans in cache.Dynamic Management Functions to show the average CPU time for the top 5 query execution plans in cache.
SELECT TOP 5 total_worker_time/execution_count [Avg CPU Time] , SUBSTRING(st.[text], (qs.statement_start_offset/2)+1 , ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.[text]) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) statement_text FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st ORDER BY total_worker_time/execution_count DESC; Display the amount of used transaction log space in the current databasesUsing dynamic management views it is possible to get at the amount of log space used. In the example below "sys.dm_os_performance_counters" is used to select amount of used transaction log space in the current database
SELECT instance_name
,cntr_value 'Log File(s) Used Size (KB)'
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'How to gain Exclusive Access to a databaseShutting out NON ADMINS is very easy ALTER DATABASE [DATABASENAME]
SET single_user WITH ROLLBACK immediate Shutting out ADMINS
DECLARE @spid varchar(10)
SELECT @spid = spid
FROM master.sys.sysprocesses
WHERE dbid = DB_ID('DATABASENAME')
WHILE @@ROWCOUNT <> 0 BEGIN
EXEC('KILL ' + @spid)
SELECT @spid = spid
FROM master.sys.sysprocesses
WHERE dbid =DB_ID('DATABASENAME')
AND spid > @spid
END View where collections differ from the database's default collctionUnintentional Differences in collection decisions within a database can be a aggravating matter. Regaining control and overview can be almost impossible had it not been for this view that exposes all items that varies from the default collection of the database.
CREATE VIEW [Dba].[ShowAttributesWithWrongCollation]
AS
SELECT TABLE_SCHEMA
, TABLE_NAME
, COLUMN_NAME
, DATA_TYPE
, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE (TABLE_NAME IN (SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE (TABLE_TYPE = 'Base Table')))
AND (COLLATION_NAME NOT LIKE (SELECT CAST(DATABASEPROPERTYEX('EDW_AMS_DEV', 'Collation') AS VARCHAR(MAX)) AS Expr1)) |
|
|