If you work in SSAS for more than 30 seconds, you will
see this error. I am writing this post
to explain one of the sneakier causes of it.
For the purposes of demonstration, I have mocked this up in the
AdventureWorksDW2008R2 sample database.
The scenario is that you are processing a dimension, such
as the customer dimension, and receive an error like this:
Errors in the OLAP
storage engine: A duplicate attribute key has been found when processing:
Table: 'dbo_DimCustomer', Column: 'EnglishOccupation', Value: 'Professional'.
The attribute is 'Occupation'
This means that for the Occupation attribute, the source column EnglishOccupation
returned two versions of the word “Professional.” If we look at SQL Server Profiler, we can see that
SSAS processes the data for the attribute by sending over a query like this:
SELECT DISTINCT
[dbo_DimCustomer].[EnglishOccupation] AS [dbo_DimCustomerEnglishOccupation0_0]
,[dbo_DimCustomer].[SpanishOccupation] AS [dbo_DimCustomerSpanishOccupation0_1]
,[dbo_DimCustomer].[FrenchOccupation] AS [dbo_DimCustomerFrenchOccupation0_2]
FROM [dbo].[DimCustomer] AS [dbo_DimCustomer]
In my version of the database, we will see the following results.
Note that the word “Professional” appears twice. Testing the data, we find that the first
value ends with a tab (CHAR(9)). Tabs,
spaces, carriage returns, line feeds, non-breaking spaces – any number of
invisible characters often wind up in data from users copying and pasting from
various sources like spreadsheets or the internet. SSAS will treat these values as duplicates,
and actually, this is a very good thing.
It would not make sense to see two words that appear the same sitting as
distinct values in a drop down or pivot table.
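To see which values are colliding before SSAS complains, a query along these lines will surface attribute values that become identical once the usual invisible characters are stripped. This is a sketch against the demo table; the character list is illustrative, not exhaustive:

```sql
-- Sketch: find EnglishOccupation values that collide after removing
-- tab (CHAR(9)), carriage return (CHAR(13)), and line feed (CHAR(10)).
SELECT
    REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(EnglishOccupation)),
        CHAR(9), ''), CHAR(13), ''), CHAR(10), '') AS CleanValue,
    COUNT(DISTINCT EnglishOccupation) AS RawVariants
FROM dbo.DimCustomer
GROUP BY REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(EnglishOccupation)),
        CHAR(9), ''), CHAR(13), ''), CHAR(10), '')
HAVING COUNT(DISTINCT EnglishOccupation) > 1;
```

Any row this returns is a duplicate attribute key waiting to happen.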
We try our best to clean up whitespace in the ETL, but
SQL Server’s LTRIM and RTRIM functions handle no white space character other
than the space itself. This often requires a
little bit of help from .NET, whose String.Trim method is more robust.
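Lacking a .NET component in the pipeline, the usual workaround in pure T-SQL is to chain REPLACE calls, one per offending character. A sketch (again, the character list is illustrative, not exhaustive):

```sql
-- Sketch: strip common invisible characters in pure T-SQL.
-- CHAR(9) = tab, CHAR(13) = carriage return, CHAR(10) = line feed,
-- NCHAR(160) = non-breaking space (replaced with a plain space, then trimmed).
UPDATE dbo.DimCustomer
SET EnglishOccupation =
    LTRIM(RTRIM(
        REPLACE(REPLACE(REPLACE(REPLACE(EnglishOccupation,
            CHAR(9), ''),
            CHAR(13), ''),
            CHAR(10), ''),
            NCHAR(160), ' ')));
```

It is ugly, but it runs anywhere, and the same expression can live in the ETL’s source query instead of an UPDATE.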
I suppose there are other ways to deal with this problem. For example, you could just ignore key errors
in SSAS. I strongly warn against that approach
because it will only lead to more errors down the line, but I mention it because I have seen it
from time to time. The best approach is
to fix up the data in the ETL. Better
yet, fix the front-end application so that it trims this stuff out to begin
with! Once the data is cleaned up, the
cube will process.
One last note: I prepared the data for this demo with a statement along these lines, appending a tab to one row’s occupation:
update dimCustomer set EnglishOccupation = 'Professional' + CHAR(9)
WHERE CustomerKey = 11000
Labels: Data Quality, ETL, SSAS