More Advanced MDX Queries

From SQLServerPedia

Jump to: navigation, search
See Also: Main_Page - Database Administration - Analysis Services - MDX with Analysis Services


Queries you will see in this section involve syntax elements that you will encounter most often; these aren't necessarily complicated but go beyond the basic syntax that simply lists certain members or rows and others on columns.

Contents

CROSSJOIN



The CROSSJOIN function allows you to combine possible combinations of two sets. This is a powerful function that allows analyzing your measures by multiple dimensions on the same view. For example, suppose you wanted to see the breakdown of sales by product category and by year, the following query will help:

SELECT CROSSJOIN({[date].[cy 2001]:[date].[cy 2002] },
{[product].[category].members})
 ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]


Results:

Order Count
CY 2001 all products 1,379
CY 2001 Accessories 135
CY 2001 Bikes 1,358
CY 2001 Clothing 242
CY 2001 Components 205
CY 2002 all products 3,692
CY 2002 Accessories 356
CY 2002 Bikes 3,527
CY 2002 Clothing 644
CY 2002 Components 702


The CROSSJOIN function essentially multiplies one set by another. In fact, MSAS supports an alternative syntax for writing the same query by replacing the CROSSJOIN function with a multiplication sign; the following query would return the same results:

SELECT
{[date].[cy 2001]:[date].[cy 2002] } * {[product].[category].members}     ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]
   

The syntax of the function is CROSSJOIN({set1}, {set2}). Note that you can nest CROSSJOIN functions; the CROSSJOIN function returns a set which you can cross join against another set. For example, let's suppose that you wanted to further breakdown the sales during 2001 and 2002 by customer education level in addition to listing product categories, the query would look like the following:

SELECT CROSSJOIN({[date].[cy 2001]:[date].[cy 2002] },
CROSSJOIN({[product].[category].members},
{[customer].[education].members}))     
      ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]

FILTER



The FILTER function allows you to reduce the result set by specifying some criteria. This function is particularly useful if your result set is large and difficult to decipher. For example, suppose you have five thousand members in your product dimension. Instead of presenting all product sales you could return only those products that have sold at least 3000 items by using the following query:

SELECT FILTER({[product].[product].members},
[Measures].[order count]>3000) ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]


Results:

Order Count
All Products 31,455
Water Bottle - 30 oz. 4,688
Patch Kit/8 Patches 3,354
Mountain Tire Tube 3,095


Note that unlike Transact-SQL which would limit the result set by the WHERE clause MDX limits the output using the FILTER function included with the SELECT statement.



The FILTER function requires two arguments, a set that needs to be limited and the criteria for limiting the set. Note that you can use multiple criteria within the same FILTER function. For example, the following statement checks for a total order count greater than 3000 OR an internet order count of 1000:

SELECT FILTER({[product].[product].members},
[measures].[order count]>3000 OR [measures].[internet order count]>1000) ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]


You can use a set returned by other functions as the first argument of the FILTER function. For example, you can apply FILTER to a result of the CROSSJOIN. You could extend the previous example to return only those product categories that sold over 1000 orders in either 2001 or 2002:

SELECT FILTER(CROSSJOIN({[date].[cy 2001]:[date].[cy 2002] },
{[product].[category].members}), [measures].[order count]>1000)
  ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]


Results:

Order Count
CY 2001 All Products 1,379
CY 2001 Bikes 1,358
CY 2002 All Products 3,692
CY 2002 Bikes 3,527


The next example uses the FILTER function to return 2004 order count only for those products whose sales have improved in 2004 as compared to 2002:

SELECT FILTER(
{[product].[category].members},
([date].[cy 2004],[measures].[order count])>
([date].[cy 2002], [measures].[order count]))
 ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]
WHERE ([date].[cy 2004])


Results:

Order Count
All Products 13,944
Accessories 10,950
Bikes 6,529
Clothing 5,012


You could also use the FILTER function if you wanted to eliminate "All Products" from the result set; the query would look as follows:

SELECT FILTER( {[product].[category].members}, ([date].[cy 2004],[measures].[order count])> ([date].[cy 2002], [measures].[order count]) AND [product].[category].CurrentMember.Name<>'all products') ON ROWS, {[measures].[order count]} ON COLUMNS FROM [adventure works]

WHERE ([date].[cy 2004])

ORDER



The ORDER function allows sorting of the query result set by the returned data values. This function takes three arguments: the set to be ordered, string or numeric expression to be used as the sorting criteria, and optional flag that specifies whether the results should be ordered in ascending or descending fashion.



The following query returns 2003 order counts for product categories:


SELECT {[product].[category].members}      ON ROWS,
{[measures].[order count]} ON COLUMNS
FROM [adventure works]
WHERE [date].[cy 2003]


Results:

Order Count
All Products 12,440
Accessories 8,082
Bikes 6,944
Clothing 3,973
Components 1,138


Now let's see how you can sort it in ascending order:

SELECT ORDER({[product].[category].members}, [measures].[order count], ASC) ON ROWS, {[measures].[order count]} ON COLUMNS FROM [adventure works]

WHERE [date].[cy 2003]


Results:

Order Count
All Products 12,440
Components 1,138
Clothing 3,973
Bikes 6,944
Accessories 8,082


Note however, that "all products" still appears at the top of the list, even though the query was supposed to place it at the bottom. This is because by default MSAS keeps the hierarchy in tact - since all products is the "parent" of all other categories it still appears at the top. You can override the default behavior by specifying BASC ("B" stands for break hierarchy) flag instead of ASC. When you add BASC to the query the results will look as follows:

Order Count
Components 1,138
Clothing 3,973
Bikes 6,944
Accessories 8,082
All Products 12,440


Similarly to break the hierarchy when sorting in descending order you must specify BDESC flag instead of DESC.



The next example is somewhat more complicated; it retrieves 2002 sales for those categories whose sales have improved in 2002 as compared to 2004. In addition the query sorts the output based on 2003 sales:

SELECT ORDER(FILTER( {[product].[category].members}, ([date].[cy 2004],[measures].[order count])> ([date].[cy 2002], [measures].[order count])), ([measures].[order count], [date].[cy 2003]), BDESC) ON ROWS, {[measures].[order count]} ON COLUMNS FROM [adventure works]

WHERE ([date].[cy 2002])


Results:

Order Count
All Products 3,692
Accessories 356
Bikes 3,527
Clothing 644


At first the results of this query might look confusing. But if you examine the previous query results you will notice that accessories indeed had more orders than bikes in 2003. Since this result set is ordered based on 2003 sales the order count for accessories appears above bikes.



You could also use the ORDER function to sort the items based on a string. For example the following query will return the output based on product name in descending order:

SELECT ORDER({[product].[category].members}, [product].[category].CurrentMember.name, BDESC) ON ROWS, {[measures].[order count]} ON COLUMNS FROM [adventure works]

WHERE [date].[cy 2003]


Results:

Order Count
Components 1,138
Clothing 3,973
Bikes 6,944
All Products 12,440
Accessories 8,082

TOPCOUNT and BOTTOMCOUNT



TOPCOUNT and BOTTOMCOUNT functions return top or bottom N members of a dimension optionally ordered based on a specified measure. For example, to return the top ten products from the product dimension and their respective orders you could write the following:

SELECT TOPCOUNT({[product].[product].members}, 10) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
All Products 31,455
Adjustable Race (null)
Bearing Ball (null)
BB Ball Bearing (null)
Headset Ball Bearings (null)
Blade (null)
LL Crankarm (null)
ML Crankarm (null)
HL Crankarm (null)
Chainring Bolts (null)


Note that MSAS does not attempt to order products alphabetically - it simply returns the top 10 products it finds in the product dimension. Next let's see how you could retrieve top 10 products based on respective order counts:

SELECT TOPCOUNT({[product].[product].members}, 10, [measures].[order count]) ON ROWS, {[measures].[order count]} ON COLUMNS FROM [adventure works]

WHERE ([date].[cy 2004])


Results:

Order Count
All Products 13,944
Water Bottle - 30 oz. 2,694
Patch Kit/8 Patches 1,889
Mountain Tire Tube 1,782
AWC Logo Cap 1,504
Sport-100 Helmet, Red 1,462
Sport-100 Helmet, Blue 1,434
Road Tire Tube 1,377
Sport-100 Helmet, Black 1,352
Fender Set Mountain 1,238


BOTTOMCOUNT would do the exact opposite - return the top 10 items with the least number of orders. However the BOTTOMCOUNT function isn't always associated with negative performers; for example you could use the BOTTOMCOUNT function to retrieve shipping companies that provide shortest (and therefore best) delivery times.

TOPPERCENT and BOTTOMPERCENT



TOPPERCENT and BOTTOMPERCENT functions are similar to their counterparts that return counts. However, these functions require the numeric expression (measure); they return the topmost (or bottommost) members whose cumulative total is above or equal to the specified percentage. For example, to return products whose cumulative number of orders is 15 percent of all orders (excluding "all products" from the total) you could run the following:

SELECT TOPPERCENT( FILTER({[product].[product].members}, product.product.CurrentMember.Name <>'all products') ,15, [measures].[order count]) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
Water Bottle - 30 oz. 4,688
Patch Kit/8 Patches 3,354
Mountain Tire Tube 3,095
AWC Logo Cap 2,650
Sport-100 Helmet, Red 2,566
Sport-100 Helmet, Blue 2,503

TOPSUM and BOTTOMSUM



TOPSUM and BOTTOMSUM functions work similarly to TOPPERCENT. TOPSUM sorts the set and returns thw topmost members whose cumulative total is above or equal to a specified value. BOTTOMSUM returns the bottommost members whose cumulative total is above or equal to the specified value. For example, the following query will return products that have had the least number of orders and their cumulative total is 10:

SELECT BOTTOMSUM( FILTER({[product].[product].members}, NOT ISEMPTY ([measures].[order count])) ,10, [measures].[order count]) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
LL Road Frame - Black, 60 1
LL Touring Frame - Blue, 58 2
ML Mountain Frame-W - Silver, 38 5
LL Road Seat/Saddle 6


Note the usage of the ISEMPTY function in this query; if we had not filtered out those products that haven't been ordered the list would include every single product that has had no orders. The ISEMPTY function returns TRUE if a member has no value (has a value of NULL); specifying NOT ISEMPTY does the opposite - it advises MSAS to return only values that are not empty.

NON EMPTY



The NON EMPTY keyword supplies another way of filtering members that have NULL values (are empty). Empty cells occur because the intersection of two or more dimensions does not exist. The previous query could be rewritten like this:

SELECT NON EMPTY BOTTOMSUM( {[product].[product].members}, 10, [measures].[order count]) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results would be the same as before. Similarly we could filter promotions that have not resulted in any orders from the following query:

SELECT NON EMPTY {[promotion].[end date].members} ON ROWS,

{[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
All Promotions 31,455
30-Jun-02 24
31-Jul-02 36
31-Aug-02 61
31-Jul-03 30
30-Sep-03 149
30-Jun-04 2,831
31-Dec-04 30,901


If you attempt concatenating an empty cell with other cells MDX will treat the empty cell as an empty string. If you attempt adding / subtracting / multiplying a value of an empty cell with a numeric value MDX will treat the empty cell as a number zero. Empty cells can skew your calculations; for example, if you try calculating an average, MDX will divide the sum of all cells by the count of cells. When MDX sums the cells it adds zero for each empty cell but when it counts cells it increments the count of cells even if the empty cell is encountered. You need to decide whether you wish to exclude empty cells from the calculations depending on your business needs.



You can apply the NON EMPTY keyword to any set, including the output of the CROSSJOIN function. For example, the following query excludes empty members from promotions that occurred in 2004 and shows the number of orders generated for each product category by each promotion:

SELECT NON EMPTY CROSSJOIN( FILTER({[promotion].[end date].members}, [promotion].[end date].CurrentMember.Name <>'all promotions'), {[product].[category].members}) ON ROWS, {[measures].[order count]} ON COLUMNS FROM [adventure works]

WHERE ([date].[cy 2004])


Results:

Order Count
30-Jun-04 All Products 1,168
30-Jun-04 Accessories 509
30-Jun-04 Bikes 490
30-Jun-04 Clothing 276
30-Jun-04 Components 3
31-Dec-04 All Products 13,857
31-Dec-04 Accessories 10,950
31-Dec-04 Bikes 6,125
31-Dec-04 Clothing 4,918
31-Dec-04 Components 601


Note that the NON EMPTY keyword examines whether the tuple is empty, as opposed to checking whether the cell was empty. So it is possible to see empty values in the output even when using NON EMPTY. The only reliable test of whether the cell is empty is using the ISEMPTY function.



MDX provides another way of dealing with empty cells the COALESCEEMPTY function allows you to evaluate several values and return the first non-empty value. For example, the following query returns two columns the first one is the actual count of orders for a particular promotion; the second column replaces the empty counts with the number zero:

WITH member [measures].[non empty count] AS 'COALESCEEMPTY([measures].[order count],0)' SELECT {[promotion].[end date].members} ON ROWS, {[measures].[order count], [measures].[non empty count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count non empty count
All Promotions 31,455 31,455
30-Jun-02 24 24
31-Jul-02 36 36
31-Aug-02 61 61
31-Jul-03 30 30
15-Aug-03 (null) 0
30-Aug-03 (null) 0
15-Sep-03 (null) 0
30-Sep-03 149 149
30-Jun-04 2,831 2831
31-Dec-04 30,901 30901


Alternatively we could examine multiple measures and supply the number zero if none of the examined measures have any values:

WITH member [measures].[non empty count] AS 'COALESCEEMPTY([measures].[order count], [measures].[reseller order count], [measures].[internet order count],0)' SELECT {[promotion].[end date].members} ON ROWS, {[measures].[order count], [measures].[non empty count]} ON COLUMNS

FROM [adventure works]


Note that the COALESCEEMPTY function examines values from left to right.

NONEMPTYCROSSJOIN



The NONEMPTYCROSSJOIN function is similar to CROSSJOIN but is considerably more efficient; this function also acts differently than CROSSJOIN preceded by the NON EMPTY keyword. NONEMPTYCROSSJOIN returns a cross product of two or more sets but it eliminates the empty tuples as well as tuples that have no associated fact table data. Since tuples that aren't based on fact tables are dropped, the NONEMPTYCROSSJOIN function does not work with calculated members. In addition to one or more sets this function also takes "set counter" as an optional argument. If the set counter is supplied then the function cross joins the specified number of sets; the rest of the sets provide context and are used to determine which members are considered non-empty in the resulting set. For example, the following query will return products that have had orders in calendar years 2001 and 2002:

SELECT NONEMPTYCROSSJOIN( {[date].[cy 2001]:[date].[cy 2002] }, FILTER({[product].[category].Members}, [product].[category].CurrentMember.Name<>'all products')) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
CY 2001 Accessories 135
CY 2001 Bikes 1,358
CY 2001 Clothing 242
CY 2001 Components 205
CY 2002 Accessories 356
CY 2002 Bikes 3,527
CY 2002 Clothing 644
CY 2002 Components 702


Recall that the CROSSJOIN function only accepts 2 sets as parameters. Not so with NONEMPTYCROSSJOIN; you could alter this query slightly by adding another set to return the breakdown of order counts by product categories and promotions:

SELECT NONEMPTYCROSSJOIN( {[date].[cy 2003]:[date].[cy 2004]}, FILTER({[product].[category].members}, [product].[category].CurrentMember.Name<>'all products'), FILTER({[promotion].[end date].members}, [promotion].[end date].CurrentMember.Name<>'All Promotions')) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
CY 2003 Accessories 31-Jul-03 30
CY 2003 Accessories 30-Jun-04 480
CY 2003 Accessories 31-Dec-04 8,079
CY 2003 Bikes 30-Sep-03 149
CY 2003 Bikes 30-Jun-04 588
CY 2003 Bikes 31-Dec-04 6,400
CY 2003 Clothing 30-Jun-04 372
CY 2003 Clothing 31-Dec-04 3,905
CY 2003 Components 30-Jun-04 39
CY 2003 Components 31-Dec-04 1,138
CY 2004 Accessories 30-Jun-04 509
CY 2004 Accessories 31-Dec-04 10,950
CY 2004 Bikes 30-Jun-04 490
CY 2004 Bikes 31-Dec-04 6,125
CY 2004 Clothing 30-Jun-04 276
CY 2004 Clothing 31-Dec-04 4,918
CY 2004 Components 30-Jun-04 3
CY 2004 Components 31-Dec-04 601


Notice that this output finds all intersections of date and product category dimensions and then cross joins the results with promotion end date. You could pass the crossjoin set counter to the NONEMPTYCROSSJOIN function to drop the first column and make the query more efficient:

SELECT NONEMPTYCROSSJOIN( FILTER({[product].[category].children}, [product].[category].CurrentMember.Name<>'all products'), FILTER({[promotion].[end date].members}, [promotion].[end date].CurrentMember.Name<>'All Promotions'), {[date].[cy 2003]:[date].[cy 2004]}, 2) ON ROWS, {[measures].[order count]} ON COLUMNS

FROM [adventure works]


Results:

Order Count
Accessories 31-Jul-03 30
Accessories 30-Jun-04 1,022
Accessories 31-Dec-04 19,505
Bikes 30-Sep-03 149
Bikes 30-Jun-04 1,306
Bikes 31-Dec-04 17,214
Clothing 30-Jun-04 824
Clothing 31-Dec-04 9,703
Components 30-Jun-04 57
Components 31-Dec-04 2,646


As a rule NONEMPTYCROSSJOIN performs better than the CROSSJOIN function preceded with NON EMPTY keyword. NONEMPTYCROSSJOIN has certain drawbacks, however, and is deprecated in Analysis Services 2005. MSAS 2005 introduces the new function NONEMPTY() that should be used in place of both NONEMPTYCROSSJOIN and NON EMPTY CROSSJOIN. Click here for more information on the NONEMPTY() function.