Archive for May, 2008

TechEd! Let’s do this.

Friday, May 30th, 2008

I am pumped up about TechEd. Bill Gates's last technical keynote. He has to have a bomb to drop... The launch of Windows\VS\SQL Server 2008. All of the speakers are reaching deep into their bag of tricks and bringing new hardcore sessions. The universal party. The blogger's lounge. Twitter, FriendFeed, MSDN, and Technet are all buzzing with posts. 

I found out a couple of days ago that a speaker would not be able to make it due to a family emergency. I offered to present his session on DMV's so I will be presenting these sessions:

 

Using Dynamic Management Views to Improve Your Development

Dynamic Management Views were added to SQL Server 2005 and have been enhanced in SQL Server 2008. While they provide great functionality and usefulness, it appears they are not widely understood or implemented as yet. This session provides an overview of the Dynamic Management Objects available (both Views and Functions) and describes those considered the most useful. It shows how they can be utilized in monitoring, troubleshooting, and ongoing management of SQL Server systems and in gaining an understanding of SQL Server’s operation. The session also illustrates how they can be used to improve the database development experience and how they can be used in custom reports within SQL Server Management Studio.

Database Platform
300
BRK
Jason Massie

 

Windows Management Instrumentation (WMI) for the Command Line DBA

This is a demo filled session on how a command line DBA can unleash the power of WMI. WMI allows the DBA access to information that was previously hard to get or unavailable like the application event logs, OS perfmon counters, and OS events. This session covers WQL queries from SSIS, accessing OS performance counters from SQL, automated reactions to WMI events, WMI from Windows PowerShell and much more.

Database Platform
400
BRK
Jason Massie

 

Troubleshooting Query Plans Gone Wrong

Have you ever flushed the procedure cache to resolve a performance problem? Have you ever had to use a join or index hint when you should not have to? Have you ever updated statistics to fix a performance problem and wondered why it worked? In this session, we get deep into how the query optimizer decides how to execute a SQL statement. We look at common problems that cause the optimizer to choose the wrong access paths like underestimation. We go over a detailed demo on parameter sniffing, problems with local variables and statistics. We also look at solutions to these problems in several areas including good design at the application, schema, and query levels, proactive maintenance and reactive fixes. This session applies to Microsoft SQL Server 2000 through 2008 but we pay special attention to new features in SQL Server 2008 like plan freezing, new hints, filter indexes, and statistics among others.

Database Platform
400
BRK
Jason Massie

I'll be there Monday through Saturday. Hope to see you there!

IBM Cubing Services and MDX support

Friday, May 30th, 2008

Via Amyn Rajan of Simba Technologies, I see that IBM's Cubing Services OLAP tool now supports OLEDB for OLAP:
http://blogs.simba.com/simba_technologies_ceo_co/2008/05/ibm-cubing-serv.html

Another boost for the MDX language then! It's the lure of Excel compatibility that is driving all this of course, but it also opens the way for other MDX-friendly client tools as well.

Meanwhile MDX support is also proceeding in other, less direct ways. I mentioned before here that I hoped some of the new breed of data warehouse appliance vendors would start to support MDX as well and in a way it's starting to happen: Vertica and Pentaho announced a partnership a few months back and I've already heard of one instance of a company using Mondrian on top of Vertica. Hopefully other vendors will begin to realise that raw query performance is not much use unless matched with a language that allows you to easily express the queries and calculations you need.

IBM Cubing Services and MDX support

Friday, May 30th, 2008

Via Amyn Rajan of Simba Technologies, I see that IBM's Cubing Services OLAP tool now supports OLEDB for OLAP:
http://blogs.simba.com/simba_technologies_ceo_co/2008/05/ibm-cubing-serv.html

Another boost for the MDX language then! It's the lure of Excel compatibility that is driving all this of course, but it also opens the way for other MDX-friendly client tools as well.

Meanwhile MDX support is also proceeding in other, less direct ways. I mentioned before here that I hoped some of the new breed of data warehouse appliance vendors would start to support MDX as well and in a way it's starting to happen: Vertica and Pentaho announced a partnership a few months back and I've already heard of one instance of a company using Mondrian on top of Vertica. Hopefully other vendors will begin to realise that raw query performance is not much use unless matched with a language that allows you to easily express the queries and calculations you need.

Clustered index impacts on writes vs reads

Thursday, May 29th, 2008

After today’s webcast, “What a Cluster! Clustered Indexes” we got an email from David Holt at Kaye Scholer LLP. He writes:

I just attended the webcast “What A Cluster! Clustered Indexes”; thanks for the info!

Clustered indexes obviously impact writes as well as reads, and there can be a conflict between the needs of the two; i.e., if a lot of different people are writing to the same table then you don’t want them to block each other, yet you want simultaneous queries (selects) on that table to be efficient. That seems to argue that the clustered index should make writes as efficient as possible, even if it’s not the most effective clustered index for reads. Would you agree?

Uh oh – you know it’s dangerous when someone from a law firm asks if you would agree, and it’s even more dangerous when that answer is no. I’ll take my life into my own hands and answer anyway.

When designing an index strategy for a table, it helps to know the read/write mix and any time ranges for each. I can illustrate with two examples of opposite ends of the spectrum.

Let’s say our company has a web filtering database that logs all employee web activity to a database table, including the user info, date/time, and the URL they were surfing. Every now and then, they’ll run reports to see what kinds of web sites the users are hitting, and sometimes they’ll audit the web history of a particular individual.

Example #1: The Web History Table

The web activity table would be extremely write-intensive, since we have a lot of company employees who sit around surfing the web all day long. (Like you right now, come to think of it – get back to work!) A write performance reduction as small as 10% might have a significant impact on our ability to keep up with our surf-happy employees. Plus, we store the data on a SATA RAID5 array with slow writes to begin with.In a case like this, we could make an argument for a clustered key using an identity field or using a datestamp field. That would keep write speeds quick at the cost of making reports slower. However, at report time, these users don’t really care how long the report takes to run – they just want to see it sooner or later.

Example #2: The IP Subnet Table

Our web filtering application would store a list of IP address subnets – network TCP/IP address ranges that identify workstations on our network. We would not do very frequent inserts into this table, because our network isn’t expanding like crazy.

When we query this table, the queries use a lot of range queries because they want to group users together by location. The IP address subnets are in similar ranges. The fields on this table include a lot of things that we want to report on, like geographic location and company department.

This table could have a completely different primary key design because it’d have different design objectives: faster queries instead of faster inserts.

Data warehouses are another great example of this kind of design: nightly loads happen during one window (nighttime), and end users do queries during a different window (daytime). Depending on our loads, we might design our primary keys differently. If we have plenty of time at night to load our files, and our users want faster responses on their daytime queries, then our primary keys wouldn’t be designed for fast inserts.

There’s yet another consideration in this design: the number of other indexes on the table. If a table has ten wide covering indexes, then the arrangement of the primary key may not make as much of a difference as you’d think, because there’s so much overhead in maintaining the indexes.

Bottom Line: Measure Every Change

This is why I drove home the Measure, Change, Measure, Revert point in the presentation: the only way to find out for sure is to get a complete picture of your query loads, design a replayable test (like a series of queries) and keep measuring the results.  If possible, replicate the production system as closely as possible in development, because the storage systems will affect your results.

Quoted Identifiers

Thursday, May 29th, 2008

Recently we had some issues with stored procedures having quoted identifiers set differently.  We wanted to run a query to get a listing of any other procedures having quoted identifiers set on or off.

With some help from people at work we found two queries using the DMV's.

select

* from sys.sql_modules

where uses_quoted_identifier <> 1

and

Select

name

From
sys.all_objects

where ObjectPropertyEx ( Object_ID , 'ExecIsQuotedIdentOn' ) = 0 and type = 'p'

How can I load image data into a SQL Server image datatype?

Thursday, May 29th, 2008

There are a number of articles (like this one) explaining how to do this for SQL Server 2000, but it’s so much easier in SQL Server 2005 that I’ll post the 2005-compatible code here. You should read this Microsoft Research article to get the Microsoft best practices justification for storing various file sizes in the database versus on the filesystem. Note, the following code sample is extremely simplistic and various iterations of similar logic are available from Microsoft and others.

if exists(select * from sysobjects where type=N'U' and name=N'picTable')
drop table dbo.picTable;
go
create table dbo.picTable(FileName sysname, Document image)
go
if exists(select * from sysobjects where type=N'P' and name=N'spLoadImagesFromLocalFolder')
drop procedure dbo.spLoadImagesFromLocalFolder;
go
create procedure [dbo].[spLoadImagesFromLocalFolder](@directory sysname,@fileextention nvarchar(10))
as
begin
declare @cmd varchar(8000),@fname nvarchar(200),@sqlstmt nvarchar(500),@ParmDefinition nvarchar(500),@cnt int

if object_id(‘tempdb..#filestoload’) is not null
begin
drop table #filestoload
end
create table #filestoload ( RowNum int identity, fname varchar(8000) null)

SET @cmd = ‘dir ‘ + QUOTENAME(@directory,’”‘) + ‘ /T:W /A:-D /b’
insert into #filestoload ([fname])
exec [master].[dbo].[xp_cmdshell] @cmd
/* print @cmd */
delete from #filestoload where fname is null or lower(fname) not like N’%.’+lower(@fileextention)
select @fname = ”

select @cnt=count(*) from #filestoload
set @ParmDefinition = N’@fname nvarchar(200)’;
while @cnt>0
begin
select @fname = fname from #filestoload where RowNum > @cnt
set @sqlstmt=N’INSERT INTO picTable(FileName, Document) SELECT ”’+@fname+N”’ AS FileName,* FROM OPENROWSET(BULK N”’+@directory+@fname+N”’, SINGLE_BLOB) AS Document’
/* print @sqlstmt */
exec sp_executesql @sqlstmt,@ParmDefinition,@fname=@fname
select @cnt=@cnt-1;
end
drop table #filestoload;
end
go
exec [dbo].[spLoadImagesFromLocalFolder] N’C:\’,N’JPG’

Hope that helps!

Does SQL Server support PCI Compliance Standards features including periodic changing of keys, destruction of old keys, split knowledge and establishment of dual control of keys, and prevention of unauthorized substitution of keys?

Thursday, May 29th, 2008

SQL Server 2008 has the encryption key management feature’s you’re looking for…

SQL Server 2008 Extensible Key Management enables the encryption keys that protect the database files to be stored in an off-box device such as a smartcard, USB device, or EKM/HSM module. This also enables data protection from database administrators (except members of the sysadmin group). Data can be encrypted by using encryption keys that only the database user has access to on the external EKM/HSM module.

Extensible Key Management also provides the following benefits:

* Additional authorization check (enabling separation of duties).
* Higher performance for hardware-based encryption/decryption.
* External encryption key generation.
* External encryption key storage (physical separation of data and keys).
* Encryption key retrieval.
* External encryption key retention (enables encryption key rotation).
* Easier encryption key recovery.
* Manageable encryption key distribution.
* Secure encryption key disposal..

See the full article Understanding Extensible Key Management for details.

What happened to Notification Services in SQL Server 2008?

Thursday, May 29th, 2008

The quick answer is that it’s been dropped. Microsoft posted the following in the 2008 Read me documentation:

5.1 SQL Server Notification Services Removed from SQL Server 2008

Notification Services is not a component of SQL Server 2008, but will continue to be supported as part of the SQL Server 2005 product support cycle. In future releases, support for key notification scenarios will be incorporated into Reporting Services.

Clustered Indexes & SQL 2005’s Performance Dashboard

Thursday, May 29th, 2008

In today’s installment of the Quest Pain of the Week webcasts, Jason Hall and I will be discussing a reader’s question about finding missing clustered indexes, their impact on performance, and how to be proactive about monitoring performance to find those types of design issues.

MSSQLTips.com has a great post called “Finding primary keys and missing primary keys in SQL Server.”  That post includes two snippets of code, one for each of the popular SQL versions:

Finding missing primary keys in SQL 2005:

SELECT c.name, b.name

FROM sys.tables b

INNER JOIN sys.schemas c ON b.schema_id = c.schema_id

WHERE b.type = 'U'

AND NOT EXISTS

(SELECT a.name

FROM sys.key_constraints a

WHERE a.parent_object_id = b.OBJECT_ID

AND a.schema_id = c.schema_id

AND a.type = 'PK' )

Finding missing primary keys in SQL 2000:

SELECT c.name, a.name

FROM sysobjects a

INNER JOIN sysusers c ON a.uid = c.uid

WHERE xtype = 'U'

AND NOT EXISTS

(SELECT b.name

FROM sysindexes b

WHERE a.id = b.id

AND (b.status & 2048)<>0)

You can read their post for similar queries on finding tables WITH primary keys too.

Also in today’s webcast, I’ll be showing a few screenshots from Microsoft’s SQL Server Performance Dashboard Reports.  They’re a free set of SSRS reports that are integrated into SQL Server Management Studio.  The reports I’ll be showing in the webcast are somewhat limited in that they only show data that’s currently in the plan cache.  If your server is under memory pressure, it may not retain this data for too long, and you would have to revisit these reports pretty often to get a good picture of what’s going on under the hood.

For more information about the dashboard and how to use it, check out Kevin Kline’s Tool Time article “SQL Server 2005 Dashboard Reports” in SQL Server Magazine’s August 2007 issue.

LINQ to MDX

Tuesday, May 27th, 2008

Seeing that Marco Russo has released his book "Programming Microsoft LINQ" reminded me of a conversation I had with him a while ago about something I've heard various people ask about over the last year - will there be a LINQ to MDX?

Before we go on, I should state that it's my opinion that there isn't a big enough market out there for anyone (Microsoft or a third party) to justify spending time developing LINQ to MDX. That's not to say that I wouldn't want to see it - I would - just that I doubt anyone much would use it. MDX remains too much of a niche language, and off-the-shelf tools work well most of the time so there's less need to write custom MDX-generation code. As far as I know Microsoft isn't planning on developing LINQ to MDX and I'd be surprised if it ever did, so this will remain a theoretical discussion.

But for the sake of argument if you were to implement LINQ to MDX the main problem you'd have to tackle would be the same one you have with using MDX in Reporting Services and Integration Services: MDX can't guarantee fixed column names for any given query. However I had an idea on how to avoid this, and that is to think in terms of LINQ to MDX sets rather than LINQ to MDX queries. So for example if you take the following SQL query on a dimension table:

Select Year, Quarter, Month
From TimeDim
Where Year=2008 and (Month=March or Month=April)

that would then translate easily into the following MDX set expression:

{([TimeDim].[Year].[2008], [TimeDim].[Quarter].[Q3], [TimeDim].[Month].[March]), ([TimeDim].[Year].[2008], [TimeDim].[Quarter].[Q3], [TimeDim].[Month].[April])}

MDX sets have to be made up of tuples containing the same dimensionality, and if you think of a set's dimensionality in terms of columns in a SQL SELECT statement then you can see how that might map onto LINQ concepts. Instead of using LINQ to create an MDX SELECT statement directly, you'd use LINQ to create MDX sets and then pass all of these sets into another function which would then run the query using the sets created as axes.

Since I'm no LINQ expert this was the point where I dropped a mail to Marco to ask him his opinion; he thought it was feasible and even came up with an idea of what the code might look like in C#:

var timeSet = from period in cubeBudget.TimeDim.Members
            where period.Year == 2008
                    && (period.Level == TimeDim.Level.Year
                        || period.Month == "March"
                        || period.Month == "April" )
              select period;

Var measures = from measure in cubeBudget.Measures
               Where new string[] { "Sales", "Quantity", "Price" }.Contains( measure.Name )
               select measure;

Var query = from measure in measures
            from period in timeSet
            select new { period.Name, measure.Name, measure.Value };

var cellset = from cell in cubeBudget
              where cell.Columns( measures )
                    && cell.Rows( timeSet )
              select cell;

In the code above, timeset would define the set of members on the Period dimension you wanted to use and measures would return the set of measures. Then you could use them in two ways: query would return a flattened rowset and cellset would return a cellset. But this all seems very convoluted and probably just as confusing to the average developer as raw MDX.

Another alternative approach would ignore MDX altogether and query Analysis Services using SQL directly (see http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!751.entry for more details on this topic), although I'm sure you'd run into the limitations of the SQL that is supported very quickly and in any case you lose all of the flexibility and functionality of the MDX language when you do this.

Maybe if we're looking for a way to programmatically generate MDX then LINQ isn't the way to do it. It's something olap4j is working towards and they have taken an approach that is much more in tune with the multidimensional nature of MDX. One of the 'open issues' that caught my eye in the section of the olap4j spec that deals with this functionality is the question "Is this API at the right level, or is it too close to MDX?", meaning I guess that it would be all too easy to come up with an interface that is just as complex as MDX itself. Where would you draw the line between ease-of-use and functionality? Is it the MDX language that is confusing for people, or the multidimensional concepts (concepts that any interface would have to reflect) that underpin it? Can you abstract MDX to an interface to make it easier to use? I wish I knew more Java so I could test drive olap4j - is there anyone reading this who is using it? As always, I'd be interested to hear anyone's thoughts on this matter so please leave some comments...