If you work in SSAS for more than 30 seconds, you will
see this error. I am writing this post
to explain one of the sneakier causes of it.
For the purposes of demonstration, I have mocked this up in the
AdventureWorksDW2008R2 sample database.
The scenario is that you are processing a dimension, such
as the customer dimension, and receive an error like this:
Errors in the OLAP
storage engine: A duplicate attribute key has been found when processing:
Table: 'dbo_DimCustomer', Column: 'EnglishOccupation', Value: 'Professional'.
The attribute is 'Occupation'
This means that for the Occupation attribute, the source column EnglishOccupation
returned two versions of the word “Professional.” If we look at SQL Server Profiler, we can see that
SSAS processes the data for the attribute by sending over a query like this:
SELECT DISTINCT
[dbo_DimCustomer].[EnglishOccupation] AS [dbo_DimCustomerEnglishOccupation0_0]
,[dbo_DimCustomer].[SpanishOccupation] AS [dbo_DimCustomerSpanishOccupation0_1]
,[dbo_DimCustomer].[FrenchOccupation] AS [dbo_DimCustomerFrenchOccupation0_2]
FROM [dbo].[DimCustomer] AS [dbo_DimCustomer]
In my version of the database, we will see the following results.
Note that the word “Professional” appears twice. Testing the data, we find that the first
value ends with a tab (CHAR(9)). Tabs,
spaces, carriage returns, line feeds, non-breaking spaces – any number of
invisible characters often wind up in data from users copying and pasting from
various sources like spreadsheets or the internet. SSAS will treat these values as duplicates,
and actually, this is a very good thing.
It would not make sense to see two words that appear the same sitting as
distinct values in a drop down or pivot table.
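To see which values are colliding before SSAS complains, a query along these lines will surface attribute values that become identical once the usual invisible characters are stripped. This is a sketch against the demo table; the character list is illustrative, not exhaustive:

```sql
-- Sketch: find EnglishOccupation values that collide after removing
-- tab (CHAR(9)), carriage return (CHAR(13)), and line feed (CHAR(10)).
SELECT
    REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(EnglishOccupation)),
        CHAR(9), ''), CHAR(13), ''), CHAR(10), '') AS CleanValue,
    COUNT(DISTINCT EnglishOccupation) AS RawVariants
FROM dbo.DimCustomer
GROUP BY REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(EnglishOccupation)),
        CHAR(9), ''), CHAR(13), ''), CHAR(10), '')
HAVING COUNT(DISTINCT EnglishOccupation) > 1;
```

Any row this returns is a duplicate attribute key waiting to happen.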
We try our best to clean up whitespace in the ETL, but
SQL Server’s LTRIM and RTRIM functions handle no white space character other
than the space itself. This often requires a
little bit of help from .NET, whose String.Trim method is more robust.
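Lacking a .NET component in the pipeline, the usual workaround in pure T-SQL is to chain REPLACE calls, one per offending character. A sketch (again, the character list is illustrative, not exhaustive):

```sql
-- Sketch: strip common invisible characters in pure T-SQL.
-- CHAR(9) = tab, CHAR(13) = carriage return, CHAR(10) = line feed,
-- NCHAR(160) = non-breaking space (replaced with a plain space, then trimmed).
UPDATE dbo.DimCustomer
SET EnglishOccupation =
    LTRIM(RTRIM(
        REPLACE(REPLACE(REPLACE(REPLACE(EnglishOccupation,
            CHAR(9), ''),
            CHAR(13), ''),
            CHAR(10), ''),
            NCHAR(160), ' ')));
```

It is ugly, but it runs anywhere, and the same expression can live in the ETL’s source query instead of an UPDATE.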
I suppose there are other ways to deal with this problem. For example, you could just ignore key errors
in SSAS. I strongly warn against that approach
because it will only lead to more errors down the line, but I mention it because I have seen it
from time to time. The best approach is
to fix up the data in the ETL. Better
yet, fix the front-end application so that it trims this stuff out to begin
with! Once the data is cleaned up, the
cube will process.
One last note: I prepared the data for this demo with a statement along these lines, appending a tab to one row’s occupation:
update dimCustomer set EnglishOccupation = 'Professional' + CHAR(9)
WHERE CustomerKey = 11000
Labels: Data Quality, ETL, SSAS