Archive for May, 2007

Want to work for Microsoft?

Thursday, May 24th, 2007
The AS user education team are having a hard time trying to recruit someone who knows MDX. Is this something you, dear reader, would be interested in? Here's the job description I got from Neil Orint:
 

Do you have a background in Analysis Services and MDX? Are you looking to put your mark on the next version of SQL Server content deliverables? Want to work on a writing team where your development team counterparts are as passionate about your content as you are? If so, the Analysis Services User Education team is looking for an experienced technical writer to assist us in delivering top-notch customer facing technical documentation.

The successful candidate will have a strong background in technical writing, a working knowledge of MDX and OLAP; possess solid project management and planning skills and a passion for learning new technologies. Strong communication skills are a must. As a member of our team you can expect opportunities to:
· Help define and execute upon content strategies and prioritizes
· Listen, analyze and respond to customer feedback
· Learn the entire spectrum of Microsoft’s business intelligence offering - Integration Services, Analysis Services, Reporting Services, Office and more.
Qualifications
· A practical understanding of MDX and OLAP.
· A history of developing assistance content for end users in a variety of delivery formats
· The ability to learn new tools, technologies, and processes quickly and independently

Please contact: neilor@microsoft.com

 
Please email Neil on the address given if you want to discuss it further.

Want to work for Microsoft?

Thursday, May 24th, 2007
The AS user education team are having a hard time trying to recruit someone who knows MDX. Is this something you, dear reader, would be interested in? Here's the job description I got from Neil Orint:
 

Do you have a background in Analysis Services and MDX? Are you looking to put your mark on the next version of SQL Server content deliverables? Want to work on a writing team where your development team counterparts are as passionate about your content as you are? If so, the Analysis Services User Education team is looking for an experienced technical writer to assist us in delivering top-notch customer facing technical documentation.

The successful candidate will have a strong background in technical writing, a working knowledge of MDX and OLAP; possess solid project management and planning skills and a passion for learning new technologies. Strong communication skills are a must. As a member of our team you can expect opportunities to:
· Help define and execute upon content strategies and prioritizes
· Listen, analyze and respond to customer feedback
· Learn the entire spectrum of Microsoft’s business intelligence offering - Integration Services, Analysis Services, Reporting Services, Office and more.
Qualifications
· A practical understanding of MDX and OLAP.
· A history of developing assistance content for end users in a variety of delivery formats
· The ability to learn new tools, technologies, and processes quickly and independently

Please contact: neilor@microsoft.com

 
Please email Neil on the address given if you want to discuss it further.

AS2005 MDX Course Now Available

Monday, May 21st, 2007
<Shameless Advertising Plug>
 
As I've mentioned before, I do all of my training activities through Solid Quality Mentors (also known as Solid Quality Learning) and earlier this year they persuaded me to write an MDX course for them. You can see the outline on their website here:
 
If you're interested in booking this as a private course please contact Solid Quality through the address on the site. If you're in the UK you'd definitely get me teaching it and I'd probably cover certain other European countries too, but the good thing about Solid Quality is that they have a network of top-notch BI people not only in the US but also in Central America and continential Europe too, many of whom will be teaching this class as well.
 
I think there's a real need for in-depth MDX training out there - even people who know Analysis Services really well sometimes struggle with it, and you can only get so far without understanding the fundamentals. With PerformancePoint on the way the market is only going to grow and I've deliberately kept the first day or so as platform-agnostic as possible with a view to adapting the material for SAP BW, Essbase, TM1 and Mondrian MDX at some point in the future.
 
One last thing to mention: Solid Quality have also got a lot of other Microsoft BI courses if you're not just interested in MDX. Take a look:
They've even got an Analysis Services data mining course written by Dejan Sarka:
 
UPDATE: I no longer work with Solid Quality, so if you want MDX or Analysis Services training then come straight to me! You can find out more at http://www.crossjoin.co.uk/training.html
 
</Shameless Advertising Plug>
 

AS2005 MDX Course Now Available

Monday, May 21st, 2007
<Shameless Advertising Plug>
 
As I've mentioned before, I do all of my training activities through Solid Quality Mentors (also known as Solid Quality Learning) and earlier this year they persuaded me to write an MDX course for them. You can see the outline on their website here:
 
If you're interested in booking this as a private course please contact Solid Quality through the address on the site. If you're in the UK you'd definitely get me teaching it and I'd probably cover certain other European countries too, but the good thing about Solid Quality is that they have a network of top-notch BI people not only in the US but also in Central America and continential Europe too, many of whom will be teaching this class as well.
 
I think there's a real need for in-depth MDX training out there - even people who know Analysis Services really well sometimes struggle with it, and you can only get so far without understanding the fundamentals. With PerformancePoint on the way the market is only going to grow and I've deliberately kept the first day or so as platform-agnostic as possible with a view to adapting the material for SAP BW, Essbase, TM1 and Mondrian MDX at some point in the future.
 
One last thing to mention: Solid Quality have also got a lot of other Microsoft BI courses if you're not just interested in MDX. Take a look:
They've even got an Analysis Services data mining course written by Dejan Sarka:
 
UPDATE: I no longer work with Solid Quality, so if you want MDX or Analysis Services training then come straight to me! You can find out more at http://www.crossjoin.co.uk/training.html
 
</Shameless Advertising Plug>
 

Using Linear Regression to Calculate Growth

Friday, May 18th, 2007

A few blog entries back I showed the MDX I used to calculate a seasonally-adjusted total in my chalk talk at the BI conference. This is useful but if we're looking for a calculation that we can use for the Trend property of a KPI it's not the whole story - we still need to find a way of expressing how much a value is growing or shrinking over time. Although previous period growth calculations are a lot more useful with seasonally-adjusted values, we can use simple linear regression (and it has to be simple because, as I said, I'm no statistician) to do a better job.

The starting point for understanding how to use linear regression in MDX is (surprise, surprise) Mosha's blog entry on the subject:

http://sqljunkies.com/WebLog/mosha/archive/2004/12/21/5689.aspx

However, the function that's going to be most useful here is the linregslope function. If we're looking at the values in our time series and trying to find a line of best fit for those values with the equation y=ax+b, linregslope returns the value of a in that equation, ie the gradient - when the value of x increases by 1, y increases by the value of a. Here's an example of how to use it:

with member measures.gradient as
linregslope(
lastperiods(3, [Date].[Calendar].currentmember) as last3
, [Measures].[Internet Sales Amount]
,rank([Date].[Calendar].currentmember, last3)
)


select {[Measures].[Internet Sales Amount], measures.gradient} on 0,
[Date].[Calendar].[Month].members on 1
from [Adventure Works]

The trick with using this function in MDX with a time series is to be able to work out what values you want to pass in for the x axis. Here I've used the lastperiods function to get a set containing the current member on the Calendar hierarchy, the previous member on the Calendar hierarchy and the member before that, in the first parameter of the function; at the same time I've declared a named set and then used that with a rank function in the third parameter to return the values 1, 2 and 3 for each of these three members.

This gets us the slope, then, but I was thinking it would be better to express this value as a percentage - but of what? The current period's value? Or one of the preceding two periods values? I have to admit I don't know which would be correct. Can someone help me out here? Please leave a comment..

Modelling Goals and Thresholds in Measure Groups

Friday, May 18th, 2007

Before I carry on with my chalk talk series, I have to own up to something: I didn't actually want to present on the topic of KPIs, and when I found out that I was going to have to talk on the subject I fired off a few emails to people who spend more time with KPIs than I do to ask them if they could suggest some interesting things to talk about. One of these people was Nick Barclay, co-author of 'The Rational Guide to Business Scorecard Manager 2005' (which I shall be reviewing very soon - it's a good book), and he pointed out that while all the examples of KPIs he'd seen hard-coded goals and thresholds into the MDX code this was not a good thing - users want to change their values all the time and ideally you'd want to be able to let them do this themselves. Why not store these values in a measure group, allow users to change the values using writeback, and then use these values within the KPI definition somehow?

Actually modelling how the values should be stored in measure groups was very straightforward. In my demo I showed two fact tables, one for the Goals and one for the Thresholds, with one measure each. I also created a KPI dimension for both of them to allow multiple goals and thresholds for different KPIs to be stored in the same measure group; for the Goal measure group I added the Date dimension at the granularity of Calendar Year (so there was a column in the fact table containing year names) and for the Threshold fact table I also created a Threshold dimension. This Threshold dimension contained one member for each threshold to be used: Very Bad, Quite Bad, OK, Quite Good and Very Good; there was also a numeric column containing the values -1, -0.5, 0, 0.5 and 1 which represented the numeric values each threshold gets normalised to and which I assigned to the ValueColumn property of my sole attribute when I built the dimension.

One this was done and the measure groups were added to the Adventure Works cube, I showed some ways to allocate the Goal values down from the Year granularity at which they were stored. Here's the scoped assignment for the simple allocation which simply splits the values equally by the number of time periods in the year, so for example each month shows 1/12 of the year total:

SCOPE([Measures].[Goal]);
SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS);
THIS=[Measures].[Goal]/
COUNT(
DESCENDANTS(
ANCESTOR([Date].[Calendar].CURRENTMEMBER,[Date].[Calendar].[Calendar Year])
, [Date].[Calendar].CURRENTMEMBER.LEVEL
)
);
END SCOPE;
END SCOPE;

Here's the code for doing the weighted allocation by the previous year's Internet Sales Amount measure values:

SCOPE([Measures].[Goal]);
SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS);
THIS=[Measures].[Goal]
*
(
(PARALLELPERIOD([Date].[Calendar].[Calendar Year],1,[Date].[Calendar].CURRENTMEMBER), [Measures].[Internet Sales Amount])
/
(ANCESTOR([Date].[Calendar].CURRENTMEMBER,[Date].[Calendar].[Calendar Year]).PREVMEMBER, [Measures].[Internet Sales Amount])
);
END SCOPE;
END SCOPE;

A few things to note here:

  • In both cases, because I've set IgnoreUnrelatedDimensions to false on the measure group, to get the year's Goal measure value I can simply reference [Measures].[Goal] - the values for the year are copied down automatically to the attributes below the granularity attribute.  
  • Although normally when you assign a value to a regular measure with an additive aggregation function the assigned value gets aggregated up, when you assign to a regular measure below the granularity attribute of a dimension no aggregation happens, similar to what you get with a calculated measure.
  • The assignment SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS) means 'scope on everything from the Date attribute (I've included the whole attribute here, All Member and the leaf level - everything on a dimension exists with either the All Member or the leaf members of the key attribute) up to and including the Calendar Semester attribute but no higher'.

Moving onto the thresholds, we need to find a way to apply the threshold values we've got in our measure group to the measure we're interested in. Here's a calculated member definition that does this:

CREATE MEMBER CurrentCube.[Measures].[Internet Sales To Goal Status] AS
TAIL(
{[Threshold].[Threshold].&[1],
FILTER(
[Threshold].[Threshold].&[2]:null
, ([Measures].[Threshold],[KPI].[KPI].&[1])
<
([Measures].[Internet Sales Amount]/[Measures].GOAL)
)}
,1).ITEM(0).MEMBERVALUE
;

What I'm doing here is creating a set that always contains the first member of the Threshold dimension, the 'Very Bad' member, and then filtering on the set that contains every other threshold to return the members for whom the threshold measure is less than the value of Internet Sales Amount. I then get the last member in that set, which represents the threshold with the highest value that is less than Internet Sales Amount, and use the MemberValue function to get the normalised value (the value between -1 and 1) that I assigned to that member.

Using Linear Regression to Calculate Growth

Friday, May 18th, 2007

A few blog entries back I showed the MDX I used to calculate a seasonally-adjusted total in my chalk talk at the BI conference. This is useful but if we're looking for a calculation that we can use for the Trend property of a KPI it's not the whole story - we still need to find a way of expressing how much a value is growing or shrinking over time. Although previous period growth calculations are a lot more useful with seasonally-adjusted values, we can use simple linear regression (and it has to be simple because, as I said, I'm no statistician) to do a better job.

The starting point for understanding how to use linear regression in MDX is (surprise, surprise) Mosha's blog entry on the subject:

http://sqljunkies.com/WebLog/mosha/archive/2004/12/21/5689.aspx

However, the function that's going to be most useful here is the linregslope function. If we're looking at the values in our time series and trying to find a line of best fit for those values with the equation y=ax+b, linregslope returns the value of a in that equation, ie the gradient - when the value of x increases by 1, y increases by the value of a. Here's an example of how to use it:

with member measures.gradient as
linregslope(
lastperiods(3, [Date].[Calendar].currentmember) as last3
, [Measures].[Internet Sales Amount]
,rank([Date].[Calendar].currentmember, last3)
)


select {[Measures].[Internet Sales Amount], measures.gradient} on 0,
[Date].[Calendar].[Month].members on 1
from [Adventure Works]

The trick with using this function in MDX with a time series is to be able to work out what values you want to pass in for the x axis. Here I've used the lastperiods function to get a set containing the current member on the Calendar hierarchy, the previous member on the Calendar hierarchy and the member before that, in the first parameter of the function; at the same time I've declared a named set and then used that with a rank function in the third parameter to return the values 1, 2 and 3 for each of these three members.

This gets us the slope, then, but I was thinking it would be better to express this value as a percentage - but of what? The current period's value? Or one of the preceding two periods values? I have to admit I don't know which would be correct. Can someone help me out here? Please leave a comment..

Modelling Goals and Thresholds in Measure Groups

Friday, May 18th, 2007

Before I carry on with my chalk talk series, I have to own up to something: I didn't actually want to present on the topic of KPIs, and when I found out that I was going to have to talk on the subject I fired off a few emails to people who spend more time with KPIs than I do to ask them if they could suggest some interesting things to talk about. One of these people was Nick Barclay, co-author of 'The Rational Guide to Business Scorecard Manager 2005' (which I shall be reviewing very soon - it's a good book), and he pointed out that while all the examples of KPIs he'd seen hard-coded goals and thresholds into the MDX code this was not a good thing - users want to change their values all the time and ideally you'd want to be able to let them do this themselves. Why not store these values in a measure group, allow users to change the values using writeback, and then use these values within the KPI definition somehow?

Actually modelling how the values should be stored in measure groups was very straightforward. In my demo I showed two fact tables, one for the Goals and one for the Thresholds, with one measure each. I also created a KPI dimension for both of them to allow multiple goals and thresholds for different KPIs to be stored in the same measure group; for the Goal measure group I added the Date dimension at the granularity of Calendar Year (so there was a column in the fact table containing year names) and for the Threshold fact table I also created a Threshold dimension. This Threshold dimension contained one member for each threshold to be used: Very Bad, Quite Bad, OK, Quite Good and Very Good; there was also a numeric column containing the values -1, -0.5, 0, 0.5 and 1 which represented the numeric values each threshold gets normalised to and which I assigned to the ValueColumn property of my sole attribute when I built the dimension.

One this was done and the measure groups were added to the Adventure Works cube, I showed some ways to allocate the Goal values down from the Year granularity at which they were stored. Here's the scoped assignment for the simple allocation which simply splits the values equally by the number of time periods in the year, so for example each month shows 1/12 of the year total:

SCOPE([Measures].[Goal]);
SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS);
THIS=[Measures].[Goal]/
COUNT(
DESCENDANTS(
ANCESTOR([Date].[Calendar].CURRENTMEMBER,[Date].[Calendar].[Calendar Year])
, [Date].[Calendar].CURRENTMEMBER.LEVEL
)
);
END SCOPE;
END SCOPE;

Here's the code for doing the weighted allocation by the previous year's Internet Sales Amount measure values:

SCOPE([Measures].[Goal]);
SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS);
THIS=[Measures].[Goal]
*
(
(PARALLELPERIOD([Date].[Calendar].[Calendar Year],1,[Date].[Calendar].CURRENTMEMBER), [Measures].[Internet Sales Amount])
/
(ANCESTOR([Date].[Calendar].CURRENTMEMBER,[Date].[Calendar].[Calendar Year]).PREVMEMBER, [Measures].[Internet Sales Amount])
);
END SCOPE;
END SCOPE;

A few things to note here:

  • In both cases, because I've set IgnoreUnrelatedDimensions to false on the measure group, to get the year's Goal measure value I can simply reference [Measures].[Goal] - the values for the year are copied down automatically to the attributes below the granularity attribute.  
  • Although normally when you assign a value to a regular measure with an additive aggregation function the assigned value gets aggregated up, when you assign to a regular measure below the granularity attribute of a dimension no aggregation happens, similar to what you get with a calculated measure.
  • The assignment SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS) means 'scope on everything from the Date attribute (I've included the whole attribute here, All Member and the leaf level - everything on a dimension exists with either the All Member or the leaf members of the key attribute) up to and including the Calendar Semester attribute but no higher'.

Moving onto the thresholds, we need to find a way to apply the threshold values we've got in our measure group to the measure we're interested in. Here's a calculated member definition that does this:

CREATE MEMBER CurrentCube.[Measures].[Internet Sales To Goal Status] AS
TAIL(
{[Threshold].[Threshold].&[1],
FILTER(
[Threshold].[Threshold].&[2]:null
, ([Measures].[Threshold],[KPI].[KPI].&[1])
<
([Measures].[Internet Sales Amount]/[Measures].GOAL)
)}
,1).ITEM(0).MEMBERVALUE
;

What I'm doing here is creating a set that always contains the first member of the Threshold dimension, the 'Very Bad' member, and then filtering on the set that contains every other threshold to return the members for whom the threshold measure is less than the value of Internet Sales Amount. I then get the last member in that set, which represents the threshold with the highest value that is less than Internet Sales Amount, and use the MemberValue function to get the normalised value (the value between -1 and 1) that I assigned to that member.

An MDX Challenge: Debtor Days

Friday, May 11th, 2007
As I said in my previous post, last night's event at the Experience Music Project was good fun. There was live music, free booze and of course - and this is probably evidence that I need to be taken away by the men in white coats - conversation naturally turned to the topic of tricky MDX problems. Richard Halliday (I hope I've got your name right) came up with an interesting calculation for me which he referred to as 'debtor days': if I understood correctly, he had a cube with a measure containing values of individual debts incurred by customers and was interested in finding out for any given customer and any given day, the minimum number of days it took from the current date backwards in time for the total of that customer's debts to reach a given value. After digesting it overnight I had a go at implementing it this morning and found that it was a really fascinating problem - although getting the right value is fairly tricky and worth discussing, there are some great opportunities for optimisation too which I wanted to blog about.
 
First let's translate the problem into Adventure Works: we'll use the Date dimension and the Customer dimension, and find out how many days you need to go back from the current date for the current member on the [Customer Geography] user hierarchy for the cumulative total of Internet Sales Amount to exceed 10000. Here's my sample query with my calculated member:
 

with member measures.daysto10000 as
iif(
count(nonempty(null:{{[Date].[Date].currentmember} as mycurrentdate}.item(0),[Measures].[Internet Sales Amount]) as mynonemptydates)=0,
null,
iif(isempty([Measures].[Internet Sales Amount]) and (not isempty((measures.daysto10000, [Date].[Date].currentmember.prevmember)))
, (measures.daysto10000, [Date].[Date].currentmember.prevmember)+1,
iif(
count({{} as myresults,
generate(
mynonemptydates
, iif(count(myresults)=0,
iif(
sum(subset(mynonemptydates, count(mynonemptydates)-mynonemptydates.currentordinal), [Measures].[Internet Sales Amount]) > 10000
, {mynonemptydates.item(count(mynonemptydates)-mynonemptydates.currentordinal)} as myresults
, myresults)
, myresults
)
)
})=0, null,
count(myresults.item(0): mycurrentdate.item(0))-1
)
)
)

select
descendants
([Date].[Calendar].[Calendar Year].&[2004]
,
[Date].[Calendar].[Date]) on 0,
non empty
descendants
(
[Customer].[Customer Geography].[State-Province].&[HH]&[DE]
,
[Customer].[Customer Geography].[Postal Code])
on 1
from [Adventure Works]
where(measures.daysto10000)

On a cold cache this executes in a touch under 15 seconds on my laptop. The select statement puts all of the days in the year 2004 on columns, all of the postal codes in Hamburg on rows, and slices on the calculated measure defined in the with clause. Here are some things to notice about the calculated measure:

  • The outermost iif simply says that if the set of dates from the start of the Date level to the current date contains no values at all for Internet Sales Amount, then return null. If there are values then the set of dates with values is stored in the named set mynonemptydates, declared inline.
  • The next level of iif represents a recursive calculation, and I found that this was one of the extra touches that made a big difference to performance. It says that if the current date has no value for Internet Sales Amount but the value of the calculated measure is not null for the previous day, then simply add one to the value of the calculated measure from the previous day - this avoids a lot of extra work later on.
  • The next level of iif is where I do the main part of the calculation and this is going to need a lot of explaining... Put simply, I've already got from step 1 a set of members representing the dates from the start of time to the current date which have values in and what I want to do is loop through that set from the end backwards doing a cumulative sum, stopping when that sum reaches 10000 and then taking the date I've stopped at and finding the number of days from that date to the current date. Originally I attempted the problem like this:

    iif(
    count(
    tail(
    filter(
    mynonemptydates
    , sum(subset(mynonemptydates, mynonemptydates.currentordinal-1), [Measures].[Internet Sales Amount]) > 10000
    )
    ,1) as myresults
    )=0, null,
    count(myresults.item(0): mycurrentdate.item(0))-1

    Here I'm filtering the entire set to get the set of all dates where the sum from the current date to the end of the set is greater than 10000, then getting the last item in that set. This seemed inelegant though - if we had a large set then potentially we'd be doing the expensive sum a lot of times we didn't need to do it. It seemed better to loop through the set backwards and then somehow be able to stop the loop when I'd reached the first member which fulfilled my filter criteria. But how was this going to be possible in MDX? I didn't manage it completely, but I did work out a way of stopping doing the expensive calculation as soon as I'd found the member I was looking for. Let's take a look at the specific section from the main query above:

    iif(
    count({{} as myresults,
    generate(
    mynonemptydates
    ,
    iif(count(myresults)=0,
    iif(
    sum(subset(mynonemptydates, count(mynonemptydates)-mynonemptydates.currentordinal), [Measures].[Internet Sales Amount]) > 10000
    , {mynonemptydates.
    item(count(mynonemptydates)-mynonemptydates.currentordinal)} as myresults
    , myresults)
    , myresults
    )
    )
    })=0,
    null,
    count(myresults.item(0): mycurrentdate.item(0))-1
    )


    What I do first is declare an empty set inline called myresults. I then use the generate function to loop through the set nonemptydates. The first thing you'll see after the generate is an iif checking if the count of myresults is 0, and the first time we run this check it will be so we need to do our cumulative sum. Because generate loops from the start of a set to the end, and I want to go in the other direction, I get the current ordinal of the iteration and then find the cumulative sum from the item that is that number of members away from the end of the set up to the end of the set. Once I've got the cumulative sum I can check if it is greater than 10000; if it is, then I return a set from the iif statement and at the same time overwrite the declaration of myresults with a set of the same name which now contains that one member. As a result, at all subsequent iterations the test count(myresults) returns 1 and I don't try and do the cumulative sum again. I was quite pleased at finding I could do this - I hadn't realised it was possible. It only makes about 0.5 seconds difference to the overall query performance though.

  • Finally, on the last line of the calculated measure I can take the member I've got in the set myresults and using the range operator find the number of days between it and the current date, which I've also stored in a named set called mycurrentdate.

Pretty fun, eh? No, please don't answer that question. But if you can think of an alternative, better-performing way of solving this problem I would love to hear it...

UPDATE: it turns out that Richard Tkachuk not only had a go at the same problem (although his interpretation of what it is is slightly different) but got just as excited about it as I did, and wrote up his findings here:
http://www.sqlserveranalysisservices.com/OLAPPapers/ReverseRunningSum.htm

An MDX Challenge: Debtor Days

Friday, May 11th, 2007
As I said in my previous post, last night's event at the Experience Music Project was good fun. There was live music, free booze and of course - and this is probably evidence that I need to be taken away by the men in white coats - conversation naturally turned to the topic of tricky MDX problems. Richard Halliday (I hope I've got your name right) came up with an interesting calculation for me which he referred to as 'debtor days': if I understood correctly, he had a cube with a measure containing values of individual debts incurred by customers and was interested in finding out for any given customer and any given day, the minimum number of days it took from the current date backwards in time for the total of that customer's debts to reach a given value. After digesting it overnight I had a go at implementing it this morning and found that it was a really fascinating problem - although getting the right value is fairly tricky and worth discussing, there are some great opportunities for optimisation too which I wanted to blog about.
 
First let's translate the problem into Adventure Works: we'll use the Date dimension and the Customer dimension, and find out how many days you need to go back from the current date for the current member on the [Customer Geography] user hierarchy for the cumulative total of Internet Sales Amount to exceed 10000. Here's my sample query with my calculated member:
 

with member measures.daysto10000 as
iif(
count(nonempty(null:{{[Date].[Date].currentmember} as mycurrentdate}.item(0),[Measures].[Internet Sales Amount]) as mynonemptydates)=0,
null,
iif(isempty([Measures].[Internet Sales Amount]) and (not isempty((measures.daysto10000, [Date].[Date].currentmember.prevmember)))
, (measures.daysto10000, [Date].[Date].currentmember.prevmember)+1,
iif(
count({{} as myresults,
generate(
mynonemptydates
, iif(count(myresults)=0,
iif(
sum(subset(mynonemptydates, count(mynonemptydates)-mynonemptydates.currentordinal), [Measures].[Internet Sales Amount]) > 10000
, {mynonemptydates.item(count(mynonemptydates)-mynonemptydates.currentordinal)} as myresults
, myresults)
, myresults
)
)
})=0, null,
count(myresults.item(0): mycurrentdate.item(0))-1
)
)
)

select
descendants
([Date].[Calendar].[Calendar Year].&[2004]
,
[Date].[Calendar].[Date]) on 0,
non empty
descendants
(
[Customer].[Customer Geography].[State-Province].&[HH]&[DE]
,
[Customer].[Customer Geography].[Postal Code])
on 1
from [Adventure Works]
where(measures.daysto10000)

On a cold cache this executes in a touch under 15 seconds on my laptop. The select statement puts all of the days in the year 2004 on columns, all of the postal codes in Hamburg on rows, and slices on the calculated measure defined in the with clause. Here are some things to notice about the calculated measure:

  • The outermost iif simply says that if the set of dates from the start of the Date level to the current date contains no values at all for Internet Sales Amount, then return null. If there are values then the set of dates with values is stored in the named set mynonemptydates, declared inline.
  • The next level of iif represents a recursive calculation, and I found that this was one of the extra touches that made a big difference to performance. It says that if the current date has no value for Internet Sales Amount but the value of the calculated measure is not null for the previous day, then simply add one to the value of the calculated measure from the previous day - this avoids a lot of extra work later on.
  • The next level of iif is where I do the main part of the calculation and this is going to need a lot of explaining... Put simply, I've already got from step 1 a set of members representing the dates from the start of time to the current date which have values in and what I want to do is loop through that set from the end backwards doing a cumulative sum, stopping when that sum reaches 10000 and then taking the date I've stopped at and finding the number of days from that date to the current date. Originally I attempted the problem like this:

    iif(
    count(
    tail(
    filter(
    mynonemptydates
    , sum(subset(mynonemptydates, mynonemptydates.currentordinal-1), [Measures].[Internet Sales Amount]) > 10000
    )
    ,1) as myresults
    )=0, null,
    count(myresults.item(0): mycurrentdate.item(0))-1

    Here I'm filtering the entire set to get the set of all dates where the sum from the current date to the end of the set is greater than 10000, then getting the last item in that set. This seemed inelegant though - if we had a large set then potentially we'd be doing the expensive sum a lot of times we didn't need to do it. It seemed better to loop through the set backwards and then somehow be able to stop the loop when I'd reached the first member which fulfilled my filter criteria. But how was this going to be possible in MDX? I didn't manage it completely, but I did work out a way of stopping doing the expensive calculation as soon as I'd found the member I was looking for. Let's take a look at the specific section from the main query above:

    iif(
    count({{} as myresults,
    generate(
    mynonemptydates
    ,
    iif(count(myresults)=0,
    iif(
    sum(subset(mynonemptydates, count(mynonemptydates)-mynonemptydates.currentordinal), [Measures].[Internet Sales Amount]) > 10000
    , {mynonemptydates.
    item(count(mynonemptydates)-mynonemptydates.currentordinal)} as myresults
    , myresults)
    , myresults
    )
    )
    })=0,
    null,
    count(myresults.item(0): mycurrentdate.item(0))-1
    )


    What I do first is declare an empty set inline called myresults. I then use the generate function to loop through the set nonemptydates. The first thing you'll see after the generate is an iif checking if the count of myresults is 0, and the first time we run this check it will be so we need to do our cumulative sum. Because generate loops from the start of a set to the end, and I want to go in the other direction, I get the current ordinal of the iteration and then find the cumulative sum from the item that is that number of members away from the end of the set up to the end of the set. Once I've got the cumulative sum I can check if it is greater than 10000; if it is, then I return a set from the iif statement and at the same time overwrite the declaration of myresults with a set of the same name which now contains that one member. As a result, at all subsequent iterations the test count(myresults) returns 1 and I don't try and do the cumulative sum again. I was quite pleased at finding I could do this - I hadn't realised it was possible. It only makes about 0.5 seconds difference to the overall query performance though.

  • Finally, on the last line of the calculated measure I can take the member I've got in the set myresults and using the range operator find the number of days between it and the current date, which I've also stored in a named set called mycurrentdate.

Pretty fun, eh? No, please don't answer that question. But if you can think of an alternative, better-performing way of solving this problem I would love to hear it...

UPDATE: it turns out that Richard Tkachuk not only had a go at the same problem (although his interpretation of what it is is slightly different) but got just as excited about it as I did, and wrote up his findings here:
http://www.sqlserveranalysisservices.com/OLAPPapers/ReverseRunningSum.htm