Archive for February, 2008

SQL Shorts

Friday, February 29th, 2008

The big news of the week was the HH{H} launch. A lot of marketing fluff. Not just from MS but from vendors trying to ride the wave. That said, I think the rest of the year is going to be really exciting technology wise.

There are some good webcasts at the virtual event site through the slick but slightly buggy silver light interface. Check it out.

The MS SQL site got a facelift. I suspect it will replace the original site soon. Props to ThePremiers for breaking it.

Dan reports the return of the tsql debugger, object search and the new activity monitor. He demo'd these features and the other manageability features at the launch event. 

SQL Server 2008 is setting records. Most impressively in loading 1TB+ in 30 minutes with SSIS. Read it on the SQLPerf site here and here, the SQLCAT site, Erin's blog, and Denis's blog.

Peter has the scoop about issues with SQL Server 2008 and leap year. Whoops :) Good thing they didn't launch today.

Linchi comments on the intel hexacore procs that should hit Q3-4.

Derek C drops knowledge on the SQL Server 2008 performance data collector.

This might be HAWTALTA to some but I consistently here people complaining about the speed of SQLWB. Jeff posts links to resources.

Bonnie posts the 2008 version of samples on codeplex. Direct dl here. Details here.

Check out this very detailed post on SNAC for the horse's mouth.

Andy has a no nonsense post on sparse columns.

I should have mentioned this last week. Simon did a great job in detailing full text in SQL Server 2008 in a series of posts. Start here.

 

Have a great weekend!

SQL 2008 exams

Thursday, February 28th, 2008

It sounds like the first tests for SQL and VS 2008 are going to be live in April.(Subject to change) You can also get a 40% discount by registering early. More here.

 

edit: The VS2008 tests are scheduled for April. SQL is scheduled for August.

Tom Brokaw opens 2008 launch

Wednesday, February 27th, 2008

Full story here.

"Technology alone is not the answer," Brokaw said. "It will do little good to wire the world if we short-circuit our conscience."

 

Parameterising Calculated Measure Definitions

Tuesday, February 26th, 2008

I'm currently holed up in my hotel room somewhere in Sweden facing up to the fact that I've barely started work on my presentation for SQLBits on Saturday. It's on the subject of using Analysis Services as a data source for Reporting Services, and as it happens one of the tips I'll be talking about came up at work today - handling parameterised calculated measure definitions - so I thought I'd blog about it too.

RS reports commonly use calculated measures in the WITH clause to get around that annoying problem that RS needs to know about its column names in advance. If you're creating a parameterised report which allows your users to choose which columns appear when it renders, you might be tempted to write your query something like this:

with member measures.test as
//this would actually be strtomember(@MyParameter) in the query
strtomember("[Measures].[Internet Sales Amount]")
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

But performance isn't great here: the query executes in 23 seconds cold cache on my laptop. When you compare it with the un-parameterised version:

with member measures.test as
[Measures].[Internet Sales Amount]
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

...which executes in a mete 2 seconds cold cache, then you'll be right in thinking we need to find a better approach. The culprit is of course the strtomember function; using any of the StrToX functions in a calculation is surefire way of killing query performance. Unfortunately because MDX parameters return strings (maybe we could get typed parameters in some future release?) we have to use strtomember to cast the uniquename string our parameter is returning to a member. The best way around this I've found is to create a named set in the With clause and parameterise that instead: it only gets evaluated once, and after that you can reference that set in your calculation so:

with
set mymeasure as {strtomember("[Measures].[Internet Sales Amount]")}
member measures.test as
mymeasure.item(0).item(0)
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

This runs in a much more respectable 7 seconds on my laptop. But there's one more trick you can use though, our old friend non_empty_behavior. If we set it to the measure whose value we're actually displaying like this:

with
set mymeasure as {strtomember("[Measures].[Internet Sales Amount]")}
member measures.test as
mymeasure.item(0).item(0), non_empty_behavior= strtomember("[Measures].[Internet Sales Amount]")
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

We're back down to 3 seconds on a cold cache.

Parameterising Calculated Measure Definitions

Tuesday, February 26th, 2008

I'm currently holed up in my hotel room somewhere in Sweden facing up to the fact that I've barely started work on my presentation for SQLBits on Saturday. It's on the subject of using Analysis Services as a data source for Reporting Services, and as it happens one of the tips I'll be talking about came up at work today - handling parameterised calculated measure definitions - so I thought I'd blog about it too.

RS reports commonly use calculated measures in the WITH clause to get around that annoying problem that RS needs to know about its column names in advance. If you're creating a parameterised report which allows your users to choose which columns appear when it renders, you might be tempted to write your query something like this:

with member measures.test as
//this would actually be strtomember(@MyParameter) in the query
strtomember("[Measures].[Internet Sales Amount]")
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

But performance isn't great here: the query executes in 23 seconds cold cache on my laptop. When you compare it with the un-parameterised version:

with member measures.test as
[Measures].[Internet Sales Amount]
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

...which executes in a mete 2 seconds cold cache, then you'll be right in thinking we need to find a better approach. The culprit is of course the strtomember function; using any of the StrToX functions in a calculation is surefire way of killing query performance. Unfortunately because MDX parameters return strings (maybe we could get typed parameters in some future release?) we have to use strtomember to cast the uniquename string our parameter is returning to a member. The best way around this I've found is to create a named set in the With clause and parameterise that instead: it only gets evaluated once, and after that you can reference that set in your calculation so:

with
set mymeasure as {strtomember("[Measures].[Internet Sales Amount]")}
member measures.test as
mymeasure.item(0).item(0)
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

This runs in a much more respectable 7 seconds on my laptop. But there's one more trick you can use though, our old friend non_empty_behavior. If we set it to the measure whose value we're actually displaying like this:

with
set mymeasure as {strtomember("[Measures].[Internet Sales Amount]")}
member measures.test as
mymeasure.item(0).item(0), non_empty_behavior= strtomember("[Measures].[Internet Sales Amount]")
select measures.test on 0,
topcount(
[Customer].[Customer].[Customer].members
*
[Date].[Calendar].[Calendar Quarter].members
, 100, measures.test) on 1
from [adventure works]

We're back down to 3 seconds on a cold cache.

I am interested in migrating from Sybase ASE to SQL Server and I am not making progress with SQL Server Migration Assistant provided by Microsoft. Are there other tools that provide an a the ability to migrated Sybase ASE 12.5 To SQL Server 2005.

Sunday, February 24th, 2008

First of all, welcome to the light, we’ve been waiting for you. ;-) Just kidding…(sorta).

The first question you need to answer is what’s been causing you problems? Have you taken a look at Microsoft’s Guide to Migrating from Sybase ASE to SQL Server 2005 whitepaper? The planning phase is critical in ensuring your migration is successful. You should definitely do some due dilligence with regard to equivalent, nonsupported and emulated functions as well as unique datatype conversions, and syntax differences/disagreements to determine what’s going to create problems.

There is certainly no shortage of third-party applications to help you migrate your databases and applications, but this can be the perfect opportunity to gain some deep insight into your environment. Yes, that is a glass-half-full way of looking at it, but it also happens to be true. A third-party application can help, but if something during the migration goes south, you or the vendor’s support are still going to have to investigate your environment to determine what went wrong and why. Wouldn’t you rather know?

Personally, I’m a big fan of testimonials and the type of information you usually find in forums and Google groups. In an article entitled Migrating from Sybase to SQL Server Sayed Geneidy talks about his experience migrating to SQL Server 2000. Look at the sections on data compatibility mode and optimizer hints for some items that frequently cause people headaches during a migration.

Hope that helps!

I want to capture the text of every SQL statement executed against a given database. Isn’t there some reasonable way to do this without herculean effort?

Sunday, February 24th, 2008

You’re looking to run a SQL Server profiler trace. Performing a trace is Microsoft’s means of capturing every statement executed against an instance (or specified database(s)), but be aware that this level of information comes with a performance penalty inherent in capturing everything.

Now this is by no means the only way to get what you’re looking for. You could also write a script or a set of scripts leveraging a number of different T-SQL syntaxes/commands/functions to gather this information, although nothing guarantees you’ll see everything like profiler does. Here are three options:

  1. For SQL Server 2005 you can use the sys.dm_exec_requests DMV and CROSS APPLY the sys.dm_exec_sql_text function in a loop. The MSDN link for sys.dm_exec_sql_text includes a few sample scripts to illustrate how you can leverage that interaction.
  2. For SQL Server 2000 (and 2005 if you’d like) you can use the fn_get_sql function together with the sys.sysprocesses system view to see SQL Sytnaxes and to correlate them with current session information.
  3. Using DBCC INPUTBUFFER together with sysprocesses (or sys.dm_exec_requests) in a loop can provide similar information to fn_get_sql.

Note that both fn_get_sql and DBCC INPUTBUFFER are limited by the amount of text that can be returned, so don’t consider these an end-all-be-all solution for SQL monitoring.

If you don’t want to get into writing all the scripts you’ll need, or you don’t have the time and resources to create a process that will maintain the databases, tables, reports, and other related monitoring data there is certainly no shortage of third-party applications vying for your attention.  These products specialize in providing user-friendly, formatted, guided answers and advice to important questions like:

  • How long does that statement usually run?
  • Who usually runs that statement?
  • Has the plan for that statement changed? If so, when and why and who changed it?
  • …and the list goes on and on.

Once you start asking those questions, SQLServerPedia can certainly be your resource to get answers, and Quest has an arsenal of tools to help analyze, diagnose and resolve even your most complicated performance or management issues.  Hope to see you back here soon!

SQL Server 2008 filtered indexes in 5 minutes

Saturday, February 23rd, 2008

My jaw literally dropped when I saw it. If this works as advertised, it has the potential of changing everything. The days of over indexing will be over. Dynamic indexing off of the missing and unused index DMV's could be possible especially with added support. I also think this will better accomplish what people tried to do with partitioning for performance reasons in SQL Server 2005.

It will be really interesting to see how it gets applied in production and where this leads in the next versions. Maybe an autoindex checkbox will replace the DBA :)

This is a small and quick example. We will have to see how it scales to larger environments.

 

use adventureworks

--Let's nullify a random 400

update  Production.WorkOrder

set EndDate = null

where WorkOrderID in (select top 400 WorkOrderID from Production.WorkOrder where enddate is not null)


--Let's give a random 200 a recent date to mimic prod data

update top (200) Production.WorkOrder

set duedate = getdate()-10

where enddate is null

--Base query
--This is query type that should be simple yet common
--Let's get open WO's that will be due soon

set statistics io on

select p.Name, wo.OrderQty, wo.DueDate

from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID

where wo.EndDate is null and wo.DueDate >= '2007-11-24'

 --CI Scan with loop join

--Table 'WorkOrder'. Scan count 1, logical reads 530, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Now let's create a covering index

create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)

--Run the base query

set statistics io on
select
p.Name, wo.OrderQty, wo.DueDate
from
Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID
where
wo.EndDate is null and wo.DueDate >= '2007-11-24'

--Does NCI seek

--Table 'WorkOrder'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

--Clean up

drop index [Production].[WorkOrder].[ix01]

 

--Create Filter index

create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty) where EndDate is null and DueDate >= '2007-11-24'

--Run base query

set statistics io on
select
p.Name, wo.OrderQty, wo.DueDate

from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID

where wo.EndDate is null and wo.DueDate >= '2007-11-24'

 

--It uses it. We have a real small sample but it performs better

--Table 'WorkOrder'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Now let's look at size
--Create unfiltered index for comparison

create index ix02 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)

--note the new syntax

declare @dbid int = db_id()

declare @objid int = object_id('[Production].[WorkOrder]')

select * from sys.dm_db_index_physical_stats(@dbid, @objid, null, null, 'detailed') ps join sys.indexes i

on ps.object_id=i.object_id and ps.index_id=i.index_id

and i.name in ('ix01', 'ix02') and i.type_desc='NONCLUSTERED'

--We are looking at 2 page vs. 301 pages

That is a huge difference in size. The major point is not the fact that we retrieve the same while taking up so much less space on disk but so much less space in memory as well. We are going to get into this much more!

 

 Technorati Tags: ,,

Calculated members vs assignments to real members

Saturday, February 23rd, 2008

One design technique I see used quite frequently on my travels, usually in financial applications and usually by people who have experience with OLAP tools other than AS, is the creation of members on a dimension which have no real data associated with them in the fact table and whose value is later overwritten with an assignment in the MDX Script. Imagine the following scenario, for example: you have a chart of accounts dimension which has a member Total Sales, which represents the total sales of all of the operating units in your company; you also want another member, Sales Plus Tax, which is the Total Sales value multiplied by a constant representing the rate of tax. To implement this you could either create a calculated member on the Account dimension which did the calculation, or in your Account dimension table create an extra real member and then in the MDX Script use a scoped assignment to write the value of the calculation to that member.

Creating the calculated member is the simplest option, so why not do that? Well, the 'real member' option has several advantages, such as:

  • You have absolute control over where the member appears in the hierarchy, unlike with a calculated member
  • Real members can have children, calculated members can't
  • Real members can have member properties associated with them, calculated members can't
  • It's easier to manage all of your dimension members in one place, ie in the dimension table itself
  • It's easier to write scoped assignments which cover only real members; writing scoped assignments that cover multiple calculated members can be a pain (see my blog entry here for example).
  • Some client tools (ok, Excel 2007) have problems with displaying and selecting calculated members on non-measures dimensions. For another practical example of the 'real member' technique, see Marco Russo's blog entry on using it to create a time utility dimension here to get around exactly this issue.

Hmm, so that's great, are there any disadvantages to using it? I'm going to have to be vague here, unfortunately, because I don't understand what's going on inside AS to be able to say for sure, but in many cases where I've seen the 'real member' technique used it's been associated with poor query performance. Even where the calculation involved has been trivial and there has been no further aggregation of the resulting value involved, I've seen situations where queries are slow and Profiler has gone nuts. It certainly doesn't happen for every calculation and every query, but when it has and I've created a calculated member that contains the same calculation performance has been much better. I wonder what's going on here? Maybe someone from Microsoft can comment? As you can probably guess, I'm writing this while onsite with a customer experiencing exactly this problem and I'm hoping to provoke some discussion on this topic...

In the meantime, all I can say is that be wary of using the 'real member' technique and test to see if a calculated member performs better. And hopefully in Katmai+1 we'll see some of the limitations of calculated members addressed so the provide a stronger alternative.

Calculated members vs assignments to real members

Saturday, February 23rd, 2008

One design technique I see used quite frequently on my travels, usually in financial applications and usually by people who have experience with OLAP tools other than AS, is the creation of members on a dimension which have no real data associated with them in the fact table and whose value is later overwritten with an assignment in the MDX Script. Imagine the following scenario, for example: you have a chart of accounts dimension which has a member Total Sales, which represents the total sales of all of the operating units in your company; you also want another member, Sales Plus Tax, which is the Total Sales value multiplied by a constant representing the rate of tax. To implement this you could either create a calculated member on the Account dimension which did the calculation, or in your Account dimension table create an extra real member and then in the MDX Script use a scoped assignment to write the value of the calculation to that member.

Creating the calculated member is the simplest option, so why not do that? Well, the 'real member' option has several advantages, such as:

  • You have absolute control over where the member appears in the hierarchy, unlike with a calculated member
  • Real members can have children, calculated members can't
  • Real members can have member properties associated with them, calculated members can't
  • It's easier to manage all of your dimension members in one place, ie in the dimension table itself
  • It's easier to write scoped assignments which cover only real members; writing scoped assignments that cover multiple calculated members can be a pain (see my blog entry here for example).
  • Some client tools (ok, Excel 2007) have problems with displaying and selecting calculated members on non-measures dimensions. For another practical example of the 'real member' technique, see Marco Russo's blog entry on using it to create a time utility dimension here to get around exactly this issue.

Hmm, so that's great, are there any disadvantages to using it? I'm going to have to be vague here, unfortunately, because I don't understand what's going on inside AS to be able to say for sure, but in many cases where I've seen the 'real member' technique used it's been associated with poor query performance. Even where the calculation involved has been trivial and there has been no further aggregation of the resulting value involved, I've seen situations where queries are slow and Profiler has gone nuts. It certainly doesn't happen for every calculation and every query, but when it has and I've created a calculated member that contains the same calculation performance has been much better. I wonder what's going on here? Maybe someone from Microsoft can comment? As you can probably guess, I'm writing this while onsite with a customer experiencing exactly this problem and I'm hoping to provoke some discussion on this topic...

In the meantime, all I can say is that be wary of using the 'real member' technique and test to see if a calculated member performs better. And hopefully in Katmai+1 we'll see some of the limitations of calculated members addressed so the provide a stronger alternative.


Fatal error: Call to undefined function SEO_pager() in /home/ssp/webapps/htdocs/blog/wp-content/themes/newblogcity/archive.php on line 47