Yes, that's me - and this is a reference post because frankly, this stuff is too complicated to keep in my head for how frequently I use it.The Problem
|The SSAS Idiot|
When you get the "Errors in the OLAP storage engine: The attribute key cannot be found when processing." error returned when you're trying to process an SSAS cube, what does that really mean? I have a previous post
where I went into some more technical reasons that can cause this error, but there's usually a pretty simple cause related to how you've structured your dimension in the designer... your attribute relationships. Attribute Relationships
Specifying good attribute relationships are something that the experts say are a very basic and necessary step to configuring a cube that performs well. Attribute relationships tell SSAS that it can take some shortcuts when processing your cube. What kind of shortcuts? Something like telling it that since "British Columbia", "Alberta", "Saskatchewan", and all the other provinces are always in "Canada", if you (or a user) asks for a total sales number for Canada, it doesn't have to add up all the invoice line items for Canada - it can go add up the numbers it's stored already for total sales in each province.
So yes, it's very good to specify attribute relationships! It can turn a million row operation into a ten row operation, which results in better query performance, reduced storage requirements, faster cube processing, and more. It's a best practice to relate attributes that you've placed in a dimension Hierarchy - because they're going to be used in aggregations very often.How Are Attribute Relationships Related to the Problem? (Pun Intended)
In the example of provinces and countries I provided, it seems like you'd never have a problem - it's straightforward and simple, right? Not so fast. You could have bad data... or not such a simple relationship. We'll look at both in turn.
What do I mean by "bad data"? I mean that your data warehouse could have business entities that have bad address information delivered to it by your OLTP system(s). (You're shocked, I know.) Bad how? Perhaps address information is incomplete - it only has the country, they didn't bother to fill in the street, city, or province. Bad, bad, OLTP data entry validation! But it happens... all too often. This means that a "blank" province could be located in multiple countries.
What do I mean by "not a simple relationship"? Let's take another example - cities. There's only one Vancouver, right? The one in British Columbia, Canada. Not so fast... Vancouver, WA
would be having words about that assertion. Other city names are much more prevalent
, and easily demonstrate this particular issue.Deciphering SSAS's Error Message
OK - so you understand the problem when I explain it with my example... but how do you decipher SSAS's message to figure out how your attribute relationships and your data are causing problems? Here's the long explanation - I'll cover it again more briefly at the conclusion of the post. My error message looks like this:Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'dbo_vDIM_CUSTOMER', Column: 'ship_to_city', Value: 'Greenland'. The attribute is 'Ship To City'.
To genericize it, I'll replace the actual values with placeholder names that I'll use going forward:Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'DimensionTable', Column: 'AttributeColumn', Value: 'AttributeValue'. The attribute is 'AttributeName'.Find the "Source Attribute"
Breaking it down, this message means that the attribute in your SSAS dimension called AttributeName, and that's where the problem is. We'll refer to this as the Source Attribute.Find the "Related Attribute"
Open BIDS open the dimension that didn't process, and switch to the Attribute Relationships tab - the one with the graph that looks like this (based on my example):
(Now, why the "Ship To City" attribute is in a separate box from the "Dim Customer Key" attribute, but the "Ship To State Province Code" isn't is not something I can explain, so I won't.) Looking at the list view (down below), I see this:
The important things to notice is that the Source Attribute (in my case) was Ship To City, and it's being "rolled up" or "is contained in" the Ship To State Province Code. In your case, you need to find the Source Attribute in your relationship graph or list, and find out what other attributes it's related to. (There might be more than one relationship where your Source Attribute is on the left-hand-side of an arrow. You'll have to investigate each relationship individually.)Find the "Source Column" in the Dimension Table
Your Source Attribute came from somewhere in your underlying dimensional database - find out where. You're given a hint in the error message - the AttributeColumn.Find the "Related Column" in the Dimension Table
Just like the Source Attribute, the Related Attribute comes from a dimensional database table - find out where. You'll likely have to switch to the Dimension Structure tab of the dimension editor, select the Related Attribute, open the Properties window (F4), and look at the KeyColumns property.
Now you're ready to query your data.Querying Your Data
You now have three critical pieces of information: the Source Column, Related Column, and AttributeValue. In my case, the Source Column was "ship_to_city" in my vDIM_CUSTOMER view, the Related Column was "ship_to_state_province_code" in the same view, and the offending AttributeValue was 'Greenland'. To find out why SSAS is having a problem, I need to run this query:
SELECT ship_to_city, ship_to_state_province_code, COUNT(*)
WHERE ship_to_city = 'Greenland'
GROUP BY ship_to_city, ship_to_state_province_code
To genericize the query and relate it to those placeholder names we've been using, it should look like this:
SELECT [Source Attribute], [Related Attribute], COUNT(*)
WHERE [Source Attribute] = AttributeValue
GROUP BY [Source Attribute], [Related Attribute]
When I run that, I got 2 rows back - it shows quite clearly what my problem was:
Yep - OLTP addresses where sometimes the city was specified on a customer record, but the state or province wasn't. In your case, you should get at LEAST two rows from your query. If you don't - then this isn't your problem... maybe you have one of these problems
instead.Knowing Is Half The Battle
Fantastic, now that you know where the problem is, you can fix it. How? One of two ways:
- Decide it's a "data problem" and rectify the underlying data. Tell a business user to fix the offending address, tell the OLTP developer to add validation to customer address entry, and/or change your ETL process to clean the data.
- Decide the data will never get "clean" - because nobody will "own" the dirt, or because it's a valid part of the business. This happens very often.
In my case, the data you see is
dirty. But the offending business data is very old, and not worth cleaning up. If I did spend the time to clean up the blanks, I would still be unable to process my cube, because I would find "Springfield" in multiple states, etc... So even though it is dirty, it is also a business reality that cities don't uniquely belong to state/provinces.The Other Half - Fixing The Attributes
So if I can't clean the data because it is (mostly) how it's supposed to be... how can I model it in SSAS? I recognize that certain cities are related to multiple states and provinces... but I still want to (and should) help SSAS optimize it's processing.
I'm sure there are several ways (there always are) - but the following works for me. If an SSAS expert would like to chime in, please do!Recognize That "AttributeColumn" Isn't Unique
We have to recognize that the Source Attribute doesn't uniquely identify the Reference Attribute. Just because I say "Vancouver" doesn't always mean I'm referring to the city in British Columbia. I'm going to have to uniquely identify cities in order to roll them up into states and provinces, just like you'll have to uniquely identify Source Attributes in order to roll them up into Reference Attributes.
How do we do that?
Inside BIDS, open the Dimension and select the Dimension Structure tab. Select the Source Attribute from the list and open the Properties window (F4).
Select the KeyColumn property, and click the ellipsis button (...) to open the Key Column selection dialog.
Pick another column that together with the Source Attribute will uniquely identify a Reference Attribute. I've chosen to select the Reference Attribute column itself. You may choose to select another column or columns that will uniquely identify the Reference Attribute.
Since we've just identified two columns to be the key for this attribute, SSAS can't use the default "name" for attribute members (the KeyColumn itself). It doesn't know how to display specific members of this attribute - you've got to tell it by editing the NameColumn property - just under the KeyColumns property we just changed. You may decide to simply choose the Source Attribute column (as I did).
The dimension will now successfully process... unless you've got another "duplicate attribute"!Success! Let's Recap
The short version of what we just did was:
We deconstructed the error message to identify the Source Attribute
We examined the Attribute Relationship tab of the dimension to identify the Reference Attribute
We verified that there was a many-to-one relationship between the Source and Reference Attribute
We made the Source Attribute map one-to-one to the Reference Attribute by including the reference column in the KeyColumns for the Source Attribute.
What I don't know is whether this is the "best" general advice to give, or if there's a better rule of thumb for resolving this issue. Until I do know, I'll be referring back to this step-by-step quite often, I'm sure...