Archive for February, 2010

How To Get a Job as a DBA

Sunday, February 28th, 2010

With my upcoming book due to be released this month, I have been getting more and more emails sent my way regarding a common question “how do I get a job as a DBA?” I started to build a new blog post and I was surprised to find that I have already written quite a bit about the topic. So, I decided to build a page that contains all of the stuff I have written previously that pertains to landing a job as a DBA.

How To Get a Job as a DBA

There you will find a handful of resources that I consider to be quite vital in your job search. I reference other blogs, job websites, as well as some helpful books. In time I hope to build it out a bit more, but for now it should suffice for most anyone that is looking to find out more about a job in database administration.

Why I blog

Sunday, February 28th, 2010

I enjoy blogging because it allows me to do the following:

  1. Research & Experiment- I find that I am oftentimes testing the limits of SQL Server, or an idea, or something that someone else says or writes about.
  2. Learn and then share ideas – I am always running into something, whether it’s a new way of doing something or finding a solution to an error. Learning and then publishing my findings is enjoyable for me.
  3. Take notes – I can’t remember the number of times where I actually went back to my blog to find the answer to something…I forget just like the other guy and need a reference myself.
  4. Have fun – If you have read any of my posts, you know that I keep it on the light-hearted side of things, and many of my blogs contain humor.

Blogging is for sure a lot of fun. I certainly enjoy learning, and then passing my findings along to others.  My mentor Don Bishop used to tell me to share everything and keep nothing to myself; back in 1998 I never knew why, but I do now. Thanks, Don.

From my adventures in blogging SQL Server topics, I'm finding that I do Research & Experimentation more than anything, whether it's a proof-of-concept or some wacked-out idea or premonition, and I make this conclusion because most of my posts seem to lean towards this category.  I should change my monker from "Sharing tips and tricks so that you might avoid the beatings that I've endured" to "Research & Experimentation, Sharing Ideas, and Fun".  There are many outlets for training these days - SQLServerpedia, MSDN, ASP.net, SQL Share , SQLServerCentral, and others - but few do the Research and Experimentation as I do on my site.

What is "Research and Experimentation" you ask?  For me it's simply doing something with our product that hasn't been done before, or taking an idea and moving it beyond a "Hello World" example. And I am also learning that this category many times leads to dead-ends, e.g. where nothing really comes out of idea.  I have probably a half-dozen topics right now that I am going to blog about that led to little or nothing, other than a conclusion, which might be along the lines of "don't do this", or "this was a bad idea", or simply "nothing came from this". Being creative is a refreshing outlet for me, but my plentiful ideas harvest little more often than not.

Going forward I'll probably throw in a little caveat that "hey this is an experiment" in some of my research blog postings, just to make sure that you know from what perspective the idea should be viewed.  I think of this now because I am going to put out a really cool post that perfectly fits this category, one that I spent quite a bit of time and effort on, that really went nowhere.  More importantly, though, I mention this now because a colleague of mine and good friend who assisted me on it - myself frustrated on the outcome - reminded me of the good that comes from these 'dead end' ideas.

As always, thanks for visiting.

Lee Everest

 

blogbanner     face

SSRS NOT LIKE expression

Saturday, February 27th, 2010
I had the recent request to provide the ability to filter data in a report that was not like specific pattern.  This requirement is not uncommon or difficult when placed on the dataset using a parameter:

SELECT FirstName
FROM Person.Person
WHERE FirstName NOT LIKE @filter
The difficulty was that the report was being presented from a snapshot so the filter needed to be placed in the form of an expression.  Again, this does not seem overly difficult except that there is no NOT LIKE comparison operator in SSRS, but fortunately we can embed VB.NET in reports. 

In this example I have a single report using my local instance of SQL 2008 and the AdventureWorks2008 as the data source and a dataset populated using the below query:

SELECT FirstName
FROM Person.Person

From the Report menu select Report Properties and enter the below VB.NET code in the Custom Code box:
   
Public Function NotLike(ByVal val As String, ByVal filter As String)As Boolean
If val.Contains(filter) Then
   Return False
Else
   Return True
End If
End Function

So you should end up with something like this

Now let's get an idea of how this works by displaying the results in a table side by side with the FirstName column from our dataset.

While in the Design tab of BIDS drag and drop a table from the toolbox on the design work surface.  In the first details field of the table drag and drop your FirstName column from your dataset from the Report Data tab.  Right click the second details field and choose Expression and enter the following expression:

=code.NotLike(Fields!FirstName.Value, "ld")


Now preview the report and take a look at what our embedded code is providing:


We can see that all first names that do not contain "ld" returns True, so they are NOT LIKE the filter text we entered, while names that do contain the filter, like Donald, returns False. 

We need to create a parameter that will hold the user provided provided  filter value.  Go back to the Design tab and from the Report Data tab right click the Parameters folder and choose Add New Parameter.  The parameter will be called Comparison and the prompt will be Comparison string and will be a text data type:


From the Report Data tab right click the dataset and select Dataset Properties:

Go to the Filters tab and add a filter.  In the Expression open the expression builder and enter the below:


=code.NotLike(Fields!FirstName.Value, Parameters!Comparison.Value)
In the Value text box enter =True:


Now click the preview tab again and in the Comparison string parameter prompt enter ld and click View Report.  The data set is now filtered on the expression utilizing our embedded code only where a value of True is returned representing values that are not like the provided string comparison.

This is a rather simplistic example of utilizing embedded code to filter a report, but should provide a good starting point.  If you are interested in reviewing more in depth information on embedded code within SSRS or creating and using custom code references in expressions then please take a look at the documentation on MSDN:



SQL SERVER – INSERT TOP (N) INTO Table – Using Top with INSERT

Friday, February 26th, 2010

During my recent training at one of the clients, I was asked regarding the enhancement in TOP clause. When I demonstrated my script regarding how TOP works along with INSERT, one of the attendees suggested that I should also write about this script on my blog. Let me share this with all of you and do let me know what you think about this.

Note that there are two different techniques to limit the insertion of rows into the table.

Method 1:

INSERT INTO TABLE …
SELECT TOP (N) Cols…
FROM Table1

Method 2:

INSERT TOP(N) INTO TABLE …
SELECT Cols…
FROM Table1

Today, we will go over the second method, which in fact is the enhancement in TOP clause along with INSERT. It is very interesting to also observe the difference between both the methods. Let us take one real example and understand what exactly happens in either case.

Method 1:

INSERT INTO TABLE …
SELECT TOP (N) Cols…
FROM Table1

Method 2:

INSERT TOP(N) INTO TABLE …
SELECT Cols…
FROM Table1

Today we will go over the second method which in fact is the enhancement in TOP along with INSERT. It is very interesting to also observe difference between both the methods. Let us play with one real example and we understand what exactly is happening in either of the case.

USE tempdb

GO

-- Create Table

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'TestValue') AND type IN (N'U'))

DROP TABLE TestValue

GO

CREATE TABLE TestValue(ID INT)

INSERT INTO TestValue (ID)

SELECT 1

UNION ALL

SELECT 2

UNION ALL

SELECT 3

UNION ALL

SELECT 4

UNION ALL

SELECT 5

GO

-- Select Data from Table

SELECT *

FROM TestValue

GO

-- Create Two Table where Data will be Inserted

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'InsertTestValue') AND type IN (N'U'))

DROP TABLE InsertTestValue

GO

CREATE TABLE InsertTestValue (ID INT)

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'InsertTestValue1') AND type IN (N'U'))

DROP TABLE InsertTestValue1

GO

CREATE TABLE InsertTestValue1 (ID INT)

GO

-- Option 1: Top with Select

INSERT INTO InsertTestValue (ID)

SELECT TOP (2) ID

FROM TestValue

ORDER BY ID DESC

GO

-- Option 2: Top with Insert

INSERT TOP (2) INTO InsertTestValue1 (ID)

SELECT ID

FROM TestValue

ORDER BY ID DESC

GO

-- Check the Data

SELECT *

FROM InsertTestValue

GO

SELECT *

FROM InsertTestValue1

GO

-- Clean up

DROP TABLE InsertTestValue

DROP TABLE InsertTestValue1

DROP TABLE TestValue

GO

Now let us check the result of above SELECT statements.

Insert alogn with Top (N)
Insert alogn with Top (N)

It is very interesting to see when Option 2 is used, ORDER BY is absolutely ignored and data is inserted in any order.

In future articles, we will talk about performance for these queries. What are your thoughts on this feature? Have you used INSERT TOP(N) in your application?

Reference: Pinal Dave (http://blog.SQLAuthority.com)


Filed under: Database, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

A Rhode Island Yankee In Queen Elizabeth’s Rink

Friday, February 26th, 2010
I just found out via Facebook (thanks Jason Strate and Tim Mitchell) that Aaron Bertrand (web|twitter) took a very expensive route to make it onto TV and into the 2010 Winter Olympics.  Here are the details:  http://lebertrand.wordpress.com/2010/02/25/312/ Congratulations Nicole and Aaron!

Help with Learning Powershell

Friday, February 26th, 2010

If you’re not reading Buck Woody’s blog, why not? Today he posted a helpful hint for getting performance counters directly out of PowerShell v2. I’ll add a little bit to the hint, don’t try running this on your XP boxes. It doesn’t hurt anything, but you get a helpful little message “Get-Counter : This cmdlet can only run on Vista and above.”


MacGyver Moments

Thursday, February 25th, 2010

Aaron Bertrand and Denny Cherry double-teamed me in another round of Web 2.0 Chainposting, and you know I never dare break the chain. I spent a few days thinking about what MacGyver moments I have had that I could share. And to be honest, very few came to mind. Apparently I am not very imaginative, or I have been fortunate enough to always have all the resources I need at my disposal.

Technically the rules for this chainpost do not call for me to recall a MacGyver moment only from my IT history. As such, many of my moments come from coaching. I would gladly recount for you the time we came from seven points down with two minutes to play, quite possibly my best coaching moment ever, but I doubt you want to read about that. I could tell you about the fantasy football website I built in ASP.NET on top of MS Access ‘97 and the late nights I put in trying to figure out a way to enforce the league rules by building various arrays, but that doesn’t seem too exciting either. And there are the handful of times I have coded a solution that helped streamline one thing or another, but it really doesn’t strike me as “MacGyver-esque”.

So, I decided to think about someone who is like MacGyver in real life. Someone who constantly makes something out of nothing and is able to get himself out of seemingly any situation. Watch this video and you’ll agree how nothing we do in IT can compare to what a real MacGyver would do in real life while thinking of England.

I’m tagging Kevin Kline, and only Kevin Kline, because I don’t want to hear him whine anymore.

Spotlight on UNPIVOT, Part 2

Thursday, February 25th, 2010
UNPIVOT Those Columns!In my last blog entry, we looked at the UNPIVOT command and how it works in unpivoting columns into rows.

We learned that T-SQL simply translates the UNPIVOT operator into an OUTER APPLY, coupled with a filter to get rid of NULL values. With that knowledge, we were able to unpivot data ourselves using an APPLY operator directly. This gave us more flexibility… We could include NULLs if we wanted to and we could very easily unpivot multiple sets of columns.

In this entry, we’ll go over some scenarios for when unpivoting data is helpful (more likely essential), and we’ll present many examples.

Before we begin, though, I have to say one thing. Most of these examples employ unpivoting because the tables are horribly designed and not normalized. I urge you, in designing your tables, to not emulate some of these table structures I am going to present.

Aggregations Over Columns

Let’s say you’re a school teacher and you have given 12 tests (exams) to your students over the course of the semester. Each test score is a percentage value from 0 to 100. And here’s the interesting part… you gave your students the option to skip any test of their choice during the semester, so one of the 12 tests might have a NULL value. Some students might take advantage of that… others (the teacher’s pets no doubt) will take every exam.

Here’s the (poorly-designed) table:

if object_id('tempdb..#Scores','U') is not null drop table #Scores
go
create table #Scores
(
   StudentID int primary key
  ,Test1 int
  ,Test2 int
  ,Test3 int
  ,Test4 int
  ,Test5 int
  ,Test6 int
  ,Test7 int
  ,Test8 int
  ,Test9 int
  ,Test10 int
  ,Test11 int
  ,Test12 int
)
go
insert #Scores select 1,  96,  87,null, 100,  99,  91,  73,  90,  90,  91,  93,  92
     union all select 2,  83,  84,  89,  79,  88,null,  93,  82,  85,  86,  88,  82
     union all select 3, 100, 100,  98,  91, 100,  95,  96,  96,  99, 100,  98, 100
 
select * from #Scores
/*
StudentID Test1 Test2 Test3 Test4 Test5 Test6 Test7 Test8 Test9 Test10 Test11 Test12
--------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ------ ------ ------
        1    96    87  NULL   100    99    91    73    90    90     91     93     92
        2    83    84    89    79    88  NULL    93    82    85     86     88     82
        3   100   100    98    91   100    95    96    96    99    100     98    100
*/    
 
Your job is to get an average test score for each student. Well, that seems easy enough. We just add the scores together and divide by 12, right?

Oh wait…

There’s that darn NULL value we have to deal with… some students might have it, and some may not. And we don’t know which of the 12 test scores have the NULL value.

So we have to come up with a really awful query like this:

select StudentID
      ,AverageScore=( coalesce(Test1,0)+coalesce(Test2,0)+coalesce(Test3,0)
                     +coalesce(Test4,0)+coalesce(Test5,0)+coalesce(Test6,0)
                     +coalesce(Test7,0)+coalesce(Test8,0)+coalesce(Test9,0)
                     +coalesce(Test10,0)+coalesce(Test11,0)+coalesce(Test12,0) )
                    * 1.0 /                                                           
                    ( case when Test1 is not null then 1 else 0 end
                     +case when Test2 is not null then 1 else 0 end
                     +case when Test3 is not null then 1 else 0 end
                     +case when Test4 is not null then 1 else 0 end
                     +case when Test5 is not null then 1 else 0 end
                     +case when Test6 is not null then 1 else 0 end
                     +case when Test7 is not null then 1 else 0 end
                     +case when Test8 is not null then 1 else 0 end
                     +case when Test9 is not null then 1 else 0 end
                     +case when Test10 is not null then 1 else 0 end
                     +case when Test11 is not null then 1 else 0 end
                     +case when Test12 is not null then 1 else 0 end )
from #Scores
/*
StudentID    AverageScore
--------- ---------------
        1 91.090909090909
        2 85.363636363636
        3 97.750000000000
*/
You have to admit… That is really ugly.

But it gets uglier…

Even worse, you have to find the minimum and maximum test score for each student. Take a look at this monster (try not to laugh… I dare you), which finds the maximum score:

select StudentID
      ,MaxScore=case 
                  when Test1>=coalesce(Test2,0) and Test1>=coalesce(Test3,0) 
                   and Test1>=coalesce(Test4,0) and Test1>=coalesce(Test5,0)
                   and Test1>=coalesce(Test6,0) and Test1>=coalesce(Test7,0) 
                   and Test1>=coalesce(Test8,0) and Test1>=coalesce(Test9,0) 
                   and Test1>=coalesce(Test10,0) and Test1>=coalesce(Test11,0) 
                   and Test1>=coalesce(Test12,0) then Test1
                  when Test2>=coalesce(Test1,0) and Test2>=coalesce(Test3,0) 
                   and Test2>=coalesce(Test4,0) and Test2>=coalesce(Test5,0)
                   and Test2>=coalesce(Test6,0) and Test2>=coalesce(Test7,0) 
                   and Test2>=coalesce(Test8,0) and Test2>=coalesce(Test9,0) 
                   and Test2>=coalesce(Test10,0) and Test2>=coalesce(Test11,0) 
                   and Test2>=coalesce(Test12,0) then Test2
                  when Test3>=coalesce(Test1,0) and Test3>=coalesce(Test2,0) 
                   and Test3>=coalesce(Test4,0) and Test3>=coalesce(Test5,0) 
                   and Test3>=coalesce(Test6,0) and Test3>=coalesce(Test7,0) 
                   and Test3>=coalesce(Test8,0) and Test3>=coalesce(Test9,0) 
                   and Test3>=coalesce(Test10,0) and Test3>=coalesce(Test11,0) 
                   and Test3>=coalesce(Test12,0) then Test3
                  when Test4>=coalesce(Test1,0) and Test4>=coalesce(Test2,0) 
                   and Test4>=coalesce(Test3,0) and Test4>=coalesce(Test5,0)
                   and Test4>=coalesce(Test6,0) and Test4>=coalesce(Test7,0) 
                   and Test4>=coalesce(Test8,0) and Test4>=coalesce(Test9,0) 
                   and Test4>=coalesce(Test10,0) and Test4>=coalesce(Test11,0) 
                   and Test4>=coalesce(Test12,0) then Test4
                  when Test5>=coalesce(Test1,0) and Test5>=coalesce(Test2,0) 
                   and Test5>=coalesce(Test3,0) and Test5>=coalesce(Test4,0) 
                   and Test5>=coalesce(Test6,0) and Test5>=coalesce(Test7,0) 
                   and Test5>=coalesce(Test8,0) and Test5>=coalesce(Test9,0) 
                   and Test5>=coalesce(Test10,0) and Test5>=coalesce(Test11,0) 
                   and Test5>=coalesce(Test12,0) then Test5
                  when Test6>=coalesce(Test1,0) and Test6>=coalesce(Test2,0) 
                   and Test6>=coalesce(Test3,0) and Test6>=coalesce(Test4,0) 
                   and Test6>=coalesce(Test5,0) and Test6>=coalesce(Test7,0) 
                   and Test6>=coalesce(Test8,0) and Test6>=coalesce(Test9,0) 
                   and Test6>=coalesce(Test10,0) and Test6>=coalesce(Test11,0) 
                   and Test6>=coalesce(Test12,0) then Test6
                  when Test7>=coalesce(Test1,0) and Test7>=coalesce(Test2,0) 
                   and Test7>=coalesce(Test3,0) and Test7>=coalesce(Test4,0) 
                   and Test7>=coalesce(Test5,0) and Test7>=coalesce(Test6,0) 
                   and Test7>=coalesce(Test8,0) and Test7>=coalesce(Test9,0) 
                   and Test7>=coalesce(Test10,0) and Test7>=coalesce(Test11,0) 
                   and Test7>=coalesce(Test12,0) then Test7
                  when Test8>=coalesce(Test1,0) and Test8>=coalesce(Test2,0) 
                   and Test8>=coalesce(Test3,0) and Test8>=coalesce(Test4,0) 
                   and Test8>=coalesce(Test5,0) and Test8>=coalesce(Test6,0) 
                   and Test8>=coalesce(Test7,0) and Test8>=coalesce(Test9,0) 
                   and Test8>=coalesce(Test10,0) and Test8>=coalesce(Test11,0) 
                   and Test8>=coalesce(Test12,0) then Test8
                  when Test9>=coalesce(Test1,0) and Test9>=coalesce(Test2,0) 
                   and Test9>=coalesce(Test3,0) and Test9>=coalesce(Test4,0) 
                   and Test9>=coalesce(Test5,0) and Test9>=coalesce(Test6,0) 
                   and Test9>=coalesce(Test7,0) and Test9>=coalesce(Test8,0) 
                   and Test9>=coalesce(Test10,0) and Test9>=coalesce(Test11,0) 
                   and Test9>=coalesce(Test12,0) then Test9
                  when Test10>=coalesce(Test1,0) and Test10>=coalesce(Test2,0) 
                   and Test10>=coalesce(Test3,0) and Test10>=coalesce(Test4,0) 
                   and Test10>=coalesce(Test5,0) and Test10>=coalesce(Test6,0) 
                   and Test10>=coalesce(Test7,0) and Test10>=coalesce(Test8,0)
                   and Test10>=coalesce(Test9,0) and Test10>=coalesce(Test11,0) 
                   and Test10>=coalesce(Test12,0) then Test10
                  when Test11>=coalesce(Test1,0) and Test11>=coalesce(Test2,0) 
                   and Test11>=coalesce(Test3,0) and Test11>=coalesce(Test4,0) 
                   and Test11>=coalesce(Test5,0) and Test11>=coalesce(Test6,0) 
                   and Test11>=coalesce(Test7,0) and Test11>=coalesce(Test8,0) 
                   and Test11>=coalesce(Test9,0) and Test11>=coalesce(Test10,0) 
                   and Test11>=coalesce(Test12,0) then Test11
                  when Test12>=coalesce(Test1,0) and Test12>=coalesce(Test2,0) 
                   and Test12>=coalesce(Test3,0) and Test12>=coalesce(Test4,0) 
                   and Test12>=coalesce(Test5,0) and Test12>=coalesce(Test6,0) 
                   and Test12>=coalesce(Test7,0) and Test12>=coalesce(Test8,0) 
                   and Test12>=coalesce(Test9,0) and Test12>=coalesce(Test10,0) 
                   and Test12>=coalesce(Test11,0) then Test12
                end
from #Scores
/*
StudentID MaxScore
--------- --------
        1      100
        2       93
        3      100
*/ 
 
(Sheesh… The things I do to demonstrate stupidity… I can’t believe I actually typed that whole thing out… I must be insane).

Well, the UNPIVOT clause can come to our rescue. It can unpivot the columns into rows and we can use T-SQL’s ability to do aggregations. So we can easily calculate the Average, Minimum, Maximum, and even the Standard Deviation of those test scores:

select StudentID
      ,AverageScore=avg(TestScore*1.0)
      ,MinScore=min(TestScore)
      ,MaxScore=max(TestScore)
      ,StdDev=stdev(TestScore)
from #Scores
unpivot (TestScore for Descript in (Test1,Test2,Test3,Test4,Test5,Test6
                                   ,Test7,Test8,Test9,Test10,Test11,Test12)) P
group by StudentID
/*
StudentID AverageScore MinScore MaxScore           StdDev
--------- ------------ -------- -------- ----------------
        1    91.090909       73      100 7.18963900977711
        2    85.363636       79       93 3.95658254741963
        3    97.750000       91      100 2.80016233295663
*/
Now isn’t that a lot easier by a factor of about a thousand?

Transpositions

This section is just for fun, really… however it might give you some ideas for other things.

Let’s take our table of test scores and turn the columns into rows and rows into columns… in other words, we’ll unpivot and then re-pivot.

;with UnpivotedScores as
(
  select StudentColumn='Student'+convert(varchar(10),StudentID)
        ,TestScore
        ,TestID=convert(int,substring(TestColumnName,5,len(TestColumnName)))
  from #Scores
  unpivot (TestScore for TestColumnName in (Test1,Test2,Test3,Test4,Test5,Test6
                                           ,Test7,Test8,Test9,Test10,Test11,Test12)) U
)
select * 
from UnpivotedScores
pivot (max(TestScore) for StudentColumn in (Student1,Student2,Student3)) P
/*
TestID Student1 Student2 Student3
------ -------- -------- --------
     1       96       83      100
     2       87       84      100
     3     NULL       89       98
     4      100       79       91
     5       99       88      100
     6       91     NULL       95
     7       73       93       96
     8       90       82       96
     9       90       85       99
    10       91       86      100
    11       93       88       98
    12       92       82      100
*/
So in this query, after unpivoting the data, I took the StudentID’s of 1 and 2 and 3 and created a column called StudentColumn containing the strings ‘Student1’, ‘Student2’, and ‘Student3’. I also created a column called TestID, extracting its value out of TestColumnName, a column created by the UNPIVOT. Finally, I PIVOTed the data across the three students.

Of course, I was only able to do this because I knew that I had only 3 studentsin my data. If I had more, then I would have to list more student columns in the PIVOT.

Here’s another fun and crazy exercise.

Create a table of 20,000 rows, with each row containing 8 random values from 1 to 999999.

if object_id('tempdb..#BunchesOfNums','U') is not null drop table #BunchesOfNums
go
create table #BunchesOfNums 
(
   Num1 int
  ,Num2 int
  ,Num3 int
  ,Num4 int
  ,Num5 int
  ,Num6 int
  ,Num7 int
  ,Num8 int
)
go
set nocount on
insert #BunchesOfNums 
select abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1
      ,abs(checksum(newid()))%999999+1 
go 20000
 
select * from #BunchesOfNums
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
430583 396264 325292 396579 294463 540881 827428 332591
819493 634199 705003 711098 387911 146887 931888 705560
967582  19768 391719 243203 716007 378208 471666 908523
206660 382012 753810 289947 725329 309939 268837 770277
651409 353534  54753 799079 317000 589131  77782 136318
  4796 481041 870133 282436 243489 890761 307494 510633
682449 686105 311957 626728 781477 902128 974246 256178
933380 499938 653796  33348 536858 359010 875253 414892
... And so on for 20,000 rows
*/
Let’s reorganize all of this data, sorting it across the columns (i.e. horizontally):

;with NumbersUnpivot as
(
  select Num
        ,NewRow=(row_number() over (order by Num)-1)/8+1
        ,NewCol=(row_number() over (order by Num)-1)%8+1
  from #BunchesOfNums
  unpivot (Num for NumColumn in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
)
select Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8
from (select Num
            ,NewRow
            ,ColName='Num'+convert(varchar(10),NewCol)
      from NumbersUnpivot) A
pivot (max(Num) for ColName in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
order by NewRow
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
    17     40     63     67     68     71     79     90
    94     97    103    109    120    122    134    140
   166    173    174    177    183    192    194    195
   204    206    207    208    216    220    222    229
   238    247    263    273    277    278    290    300
 . . .
999772 999772 999773 999780 999781 999783 999783 999787
999794 999804 999806 999808 999808 999822 999830 999831
999832 999840 999841 999855 999868 999875 999887 999905
999905 999910 999930 999938 999940 999949 999963 999966
999966 999972 999986 999988 999991 999991 999992 999996
*/
Again, this was an UNPIVOT followed by a PIVOT. The original table was unpivoted, creating a derived table of 1 column (called Num) containing all 160,000 of our numbers. I created new columns called NewRow and NewCol, which were calculated using the ROW_NUMBER() function based on the sorting of the Num column. NewRow was calculated by dividing by 8 and NewCol was calculated by using doing a modulo 8 (i.e. the remainder in dividing by 8). Then I re-PIVOTed the numbers based on these new Row/Column positions.

We could also reorganize the numbers down across the rows (i.e. vertically). This time, we have to use 20,000 (the number of rows) for our divisor. But let’s make it more general and assume we don’t know up front how many rows there are. Let’s use the COUNT() aggregate to figure that number out and use that as the divisor:

;with NumbersUnpivot as
(
  select Num
        ,NewRow=(row_number() over (order by Num)-1)%(count(Num) over () / 8)+1
        ,NewCol=(row_number() over (order by Num)-1)/(count(Num) over () / 8)+1
  from #BunchesOfNums
  unpivot (Num for NumColumn in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
)
select Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8
from (select Num
            ,NewRow
            ,ColName='Num'+convert(varchar(10),NewCol)
      from NumbersUnpivot) A
pivot (max(Num) for ColName in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
order by NewRow
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
    17 125498 250005 374416 499463 623675 749265 875013
    40 125501 250008 374421 499471 623680 749268 875025
    63 125512 250010 374429 499472 623687 749269 875029
    67 125517 250018 374431 499480 623691 749278 875031
    68 125533 250028 374433 499485 623691 749289 875033
 . . .
125479 249986 374402 499427 623659 749216 874991 999988
125480 249987 374409 499433 623661 749242 874994 999991
125484 249999 374411 499446 623662 749243 874997 999991
125488 250001 374413 499453 623663 749251 875003 999992
125493 250003 374416 499455 623675 749254 875004 999996
*/
This certainly works, but the query is not really that efficient. Because of the COUNT() aggregate and the way it’s used, the query employs a whopping 384159 reads and 738 writes, because it needs to spool information to tempdb to do its work.

The following is more efficient and only employs 650 reads and 0 writes… an enormous difference!

;with NumbersUnpivot as
(
  select Num
        ,NewRow=(row_number() over (order by Num)-1)%NumRows+1
        ,NewCol=(row_number() over (order by Num)-1)/NumRows+1
  from #BunchesOfNums
  cross join (select NumRows=count(*) from #BunchesOfNums) C
  unpivot (Num for NumColumn in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
)
select Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8
from (select Num
            ,NewRow
            ,ColName='Num'+convert(varchar(10),NewCol)
      from NumbersUnpivot) A
pivot (max(Num) for ColName in (Num1,Num2,Num3,Num4,Num5,Num6,Num7,Num8)) P
order by NewRow
/*
  Num1   Num2   Num3   Num4   Num5   Num6   Num7   Num8
------ ------ ------ ------ ------ ------ ------ ------
    17 125498 250005 374416 499463 623675 749265 875013
    40 125501 250008 374421 499471 623680 749268 875025
    63 125512 250010 374429 499472 623687 749269 875029
    67 125517 250018 374431 499480 623691 749278 875031
    68 125533 250028 374433 499485 623691 749289 875033
 . . .
125479 249986 374402 499427 623659 749216 874991 999988
125480 249987 374409 499433 623661 749242 874994 999991
125484 249999 374411 499446 623662 749243 874997 999991
125488 250001 374413 499453 623663 749251 875003 999992
125493 250003 374416 499455 623675 749254 875004 999996
*/
As you can see, I calculated the number of rows up front (into a column called NumRows) and I CROSS JOINed that with the table itself so that I could use it as the divisor in calculating NewRow and NewCol.

I encourage you to examine the plans of the two queries above to see the differences.

Column Data Repair

How often have you seen a scenario like this?: A table has 3 or 4 or 5 or more columns that contain the same kind of information. An address is a good example. A customer might have 1 or 2 or 3 lines of address and there are 3 columns for that information to be stored.

Another example is credit card information. A table has 5 columns to store up to 5 possible credit card numbers that we have on file for a person.

Again, this is very bad design… the credit card information should be put into a child table of its own, with one row for each credit card. Then many credit cards could be recorded and old ones can be deleted as much as desired.

But in the bad table design with 5 credit card columns, people will just erase outdated credit card numbers over time and not reorganize them and you end up with lots of possible “holes” as in the following example:

use TempDB
go
if object_id('CredCardInfo','U') is not null drop table CredCardInfo
go
create table CredCardInfo
(
   ID int primary key
  ,Card1 varchar(16)
  ,Card2 varchar(16)
  ,Card3 varchar(16)
  ,Card4 varchar(16)
  ,Card5 varchar(16)
)
go
 
insert CredCardInfo values
  (1,'','4898198371982987','5379110787298721','','')
 ,(2,'6798172988177792','','','5798116090379113','4798116444218977')
 ,(3,'','','','','4980926667212798')
 ,(4,'','','6877172983719871','','4079267155432178')
 ,(5,'4076372911736532','6888327238009837','5368200841237897','','')
 
select * from CredCardInfo
/*
ID Card1            Card2            Card3            Card4            Card5
-- ---------------- ---------------- ---------------- ---------------- ----------------
 1                  4898198371982987 5379110787298721                  
 2 6798172988177792                                   5798116090379113 4798116444218977
 3                                                                     4980926667212798
 4                                   6877172983719871                  4079267155432178
 5 4076372911736532 6888327238009837 5368200841237897                  
*/
You can see that the first 4 rows have empty columns where credit card numbers used to be. We want to repair this information so that all the credit card numbers are like the 5th row, where all the card numbers are in the 1st and 2nd and 3rd slots and so on with no “holes”.

So here is a stored procedure that does just that:

use TempDb
go
if object_id('RepairCardInfo','P') is not null drop procedure RepairCardInfo
go
create procedure RepairCardInfo
as
with CardsUnpivot as
(
  select ID
        ,Card
        ,OrigColName
        ,NewColName='Card'
                   +convert(char(1),row_number() over (partition by ID 
                                                       order by OrigColName))
  from CredCardInfo
  unpivot (Card for OrigColName in (Card1,Card2,Card3,Card4,Card5)) P
  where Card<>''
)
,IDsThatNeedChange as
(
  select ID
  from CardsUnpivot
  where OrigColName<>NewColName
)
,NewCardValues as
(
  select ID,Card1,Card2,Card3,Card4,Card5
  from (select ID,Card,NewColName 
        from CardsUnpivot
        where ID in (select ID from IDsThatNeedChange)) A
  pivot (max(Card) for NewColName in (Card1,Card2,Card3,Card4,Card5)) P
)
update C
set Card1=coalesce(n.Card1,'')
   ,Card2=coalesce(n.Card2,'')
   ,Card3=coalesce(n.Card3,'')
   ,Card4=coalesce(n.Card4,'')
   ,Card5=coalesce(n.Card5,'')
from CredCardInfo C
join NewCardValues N on C.ID=N.ID
 
Like other examples we’ve seen in this article, we UNPIVOT the data and then re-PIVOT it.

Take a closer look at the CardsUnpivot CTE… here’s what it produces:

select ID,Card,OrigColName
      ,NewColName='Card'
                 +convert(char(1),row_number() over (partition by ID 
                                                     order by OrigColName))
from CredCardInfo
unpivot (Card for OrigColName in (Card1,Card2,Card3,Card4,Card5)) P
where Card<>''
/*
ID Card             OrigColName NewColName
-- ---------------- ----------- ----------
 1 4898198371982987 Card2       Card1
 1 5379110787298721 Card3       Card2
 2 6798172988177792 Card1       Card1
 2 5798116090379113 Card4       Card2
 2 4798116444218977 Card5       Card3
 3 4980926667212798 Card5       Card1
 4 6877172983719871 Card3       Card1
 4 4079267155432178 Card5       Card2
 5 4076372911736532 Card1       Card1
 5 6888327238009837 Card2       Card2
 5 5368200841237897 Card3       Card3
*/
Notice that after the UNPIVOTing, we create a NewColName, based on the ROW_NUMBER() ordered by OrigColName (within each ID). And that NewColName is sequenced as ‘Card1’, ‘Card2’, etc for each ID.

The next CTE, IDdsThatNeedChange, finds all IDs where OrigColName and NewColName are different. Note that ID Number 5 has them all matching because its data is already in the format we want, with all the numbers in slot 1 and 2 and 3.

The final NewCardValues CTE rePIVOTs the data back into the 5 credit card columns… but only for those IDs that need the change.

And that CTE is used to UPDATE the original table… updating only those rows that need change.

If we execute this procedure, and then look at the data again, we see that all is repaired:

set nocount off
exec RepairCardInfo
/* (4 row(s) affected) */
  
select * from CredCardInfo
/*
ID Card1            Card2            Card3            Card4            Card5
-- ---------------- ---------------- ---------------- ---------------- ----------------
 1 4898198371982987 5379110787298721                                   
 2 6798172988177792 5798116090379113 4798116444218977                  
 3 4980926667212798                                                    
 4 6877172983719871 4079267155432178                                   
 5 4076372911736532 6888327238009837 5368200841237897                  
*/

Note that only 4 of 5 rows were actually UPDATEd. If we run the procedure again, nothing happens, because all of our data is already repaired:

set nocount off
exec RepairCardInfo
/* (0 row(s) affected) */
Vertical Presentations of Data

Sometimes it’s nice to see data presented in a vertical format rather than in columns.

In AdventureWorks, there’s a view called Sales.vStoreWithDemographics, which JOINs many tables to come up with Contact information, Name and Address information, and Demographic information for Stores in the database.

Let’s say we want to create an inline table-valued function which accepts a @CustomerID parameter to present data for the Main Office of a Customer Store. Since there can potentially be more than one contact for a store, we will use ROW_NUMBER() to only look at the first instance.

If we attempt to use the UNPIVOT operator to do this, we run into trouble:

if object_id('VerticalMainOfficeData','IF') is not null 
  drop function VerticalMainOfficeData
go
create function VerticalMainOfficeData
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
  from AdventureWorks.Sales.vStoreWithDemographics
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
select Property,Value
from BaseData
unpivot (Value for Property in (Name
                               ,AddressLine1
                               ,AddressLine2
                               ,City
                               ,StateProvinceName 
                               ,PostalCode 
                               ,CountryRegionName 
                               ,YearOpened
                               ,NumberEmployees
                               ,Specialty
                               ,SquareFeet
                               ,Brands
                               ,AnnualSales)) P
where SeqNo=1
/*
Msg 8167, Level 16, State 1, Procedure VerticalMainOfficeData, Line 22
The type of column "AddressLine1" conflicts with the type of other columns 
  specified in the UNPIVOT list.
*/
That’s yet another problem with the UNPIVOT operator. Every single column name specified in the list must be exactly the same datatype and have exactly the same length.

But, as we now know, we can just use an APPLY operator instead of UNPIVOT, and we get truckloads of flexibility, as you can see by this function definition, which returns our information in a nice fancy way:

if object_id('VerticalMainOfficeData','IF') is not null 
  drop function VerticalMainOfficeData
go
create function VerticalMainOfficeData
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
  from AdventureWorks.Sales.vStoreWithDemographics
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
select Property,Value
from BaseData
cross apply 
   (values ('NAME AND ADDRESS:','')
          ,('  Name',Name)
          ,('  Address',AddressLine1)
          ,('  ',AddressLine2)
          ,('  City',City)
          ,('  State/Province',StateProvinceName)
          ,('  Postal Code',PostalCode)
          ,('  Country/Region',CountryRegionName)
          ,('','')
          ,('DEMOGRAPHIC INFO:','')
          ,('  Year Opened',str(YearOpened,4))
          ,('  Number of Employees',convert(varchar(10),NumberEmployees))
          ,('  Specialty',Specialty)
          ,('  Square Feet',convert(varchar(10),SquareFeet))
          ,('  Brands',Brands)
          ,('  Annual Sales','$'+convert(varchar(20),AnnualSales,1))) P(Property,Value)
where SeqNo=1
  and Value is not null
Since we’re effectively UNIONing the data in the APPLY, we of course had to CONVERT any numerical data to a VARCHAR value, but we didn’t have to care about the various lengths of the various VARCHAR values. Here are a couple of examples of the output:

select * from dbo.VerticalMainOfficeData(34)
/*
Property              Value
--------------------- -----------------------------
NAME AND ADDRESS:     
  Name                Cycles Wholesaler & Mfg.
  Address             Science Park South, Birchwood
                      Stanford House
  City                Warrington
  State/Province      England
  Postal Code         WA3 7BH
  Country/Region      United Kingdom
                      
DEMOGRAPHIC INFO:     
  Year Opened         1999
  Number of Employees 15
  Specialty           Touring
  Square Feet         21000
  Brands              4+
  Annual Sales        $800,000.00
*/
 
select * from dbo.VerticalMainOfficeData(388)
/*
Property              Value
--------------------- ---------------------------
NAME AND ADDRESS:     
  Name                Road Way Sales and Rental
  Address             2900 - 25055 Dunning Street
  City                Vancouver
  State/Province      British Columbia
  Postal Code         V7L 4J4
  Country/Region      Canada
                      
DEMOGRAPHIC INFO:     
  Year Opened         1979
  Number of Employees 6
  Specialty           Mountain
  Square Feet         10000
  Brands              3
  Annual Sales        $300,000.00
*/  
 
Notice in the first example how the second address line is displayed… with a blank string in the Property column… so it looks like it’s just a second line of the address like it should be. The second example had no second line of address (it was NULL in the original table)… it was eliminated by the WHERE clause that eliminated any NULL values.

Finally, I decided to go wild and create a more extensive function that included contact information (up to 2 contacts… no Main Office of any Store in AdventureWorks has more than 2 contacts anyway).

Here is the function I created to do that.

if object_id('VerticalMainOfficeDataWithContacts','IF') is not null 
  drop function VerticalMainOfficeDataWithContacts
go
create function VerticalMainOfficeDataWithContacts
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
        ,ContactType,Title,FirstName,MiddleName
        ,LastName,Suffix,Phone,EmailAddress
  from AdventureWorks.Sales.vStoreWithDemographics
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
,ContactInfo as
(
  select SeqNo
        ,ContactType
        ,ContactName=coalesce(Title+' ','')
                    +FirstName+' '
                    +coalesce(MiddleName+' ','')
                    +LastName
                    +coalesce(' '+Suffix,'')
        ,Phone
        ,EmailAddress
  from BaseData
)
,ContactUnpivot as
(
  select Property=Property+str(SeqNo,1)
        ,Value
  from ContactInfo
  cross apply (values ('ContactType',ContactType)
                     ,('ContactName',ContactName)
                     ,('Phone',Phone)
                     ,('EmailAddress',EmailAddress)) C(Property,Value)
)
,ContactColumns as
(
  select *
  from ContactUnpivot 
  pivot (max(Value) for Property in (ContactType1,ContactName1,Phone1,EmailAddress1 
                                    ,ContactType2,ContactName2,Phone2,EmailAddress2)) P
)
,UnpivotInput as
(
  select *
  from BaseData
  cross join ContactColumns
  where SeqNo=1
)
select Property,Value
from UnpivotInput  
cross apply 
   (values ('NAME AND ADDRESS:','')
          ,('  Name',Name)
          ,('  Address',AddressLine1)
          ,('  ',AddressLine2)
          ,('  City',City)
          ,('  State/Province',StateProvinceName)
          ,('  Postal Code',PostalCode)
          ,('  Country/Region',CountryRegionName)
          ,('','')
          ,('DEMOGRAPHIC INFO:','')
          ,('  Year Opened',str(YearOpened,4))
          ,('  Number of Employees',convert(varchar(10),NumberEmployees))
          ,('  Specialty',Specialty)
          ,('  Square Feet',convert(varchar(10),SquareFeet))
          ,('  Brands',Brands)
          ,('  Annual Sales','$'+convert(varchar(20),AnnualSales,1))
          ,('','')
          ,('CONTACT'+case when ContactType2 is not null then ' #1' else '' end+':','')
          ,('  Type',ContactType1)
          ,('  Name',ContactName1)
          ,('  Phone',Phone1)
          ,('  Email',EmailAddress1)
          ,('',case when ContactType2 is not null then '' end)
          ,('CONTACT #2:',case when ContactType2 is not null then '' end)
          ,('  Type',ContactType2)
          ,('  Name',ContactName2)
          ,('  Phone',Phone2)
          ,('  Email',EmailAddress2)) P(Property,Value)
where Value is not null
I’ll leave it up to you to examine how it works as an exercise. In a nutshell, it unpivots and re-pivots the contact information into a single-row derived table, and then that’s JOINed back to the original base data and then the whole thing is unpivoted.

Here are a couple examples of the output. Note how the first one has only 1 contact (with the heading ‘CONTACT’) and the second one has 2 contacts (with the headings ‘CONTACT #1’and ‘CONTACT #2’:

select * from dbo.VerticalMainOfficeDataWithContacts(34)
/*
Property              Value
--------------------- -----------------------------
NAME AND ADDRESS:     
  Name                Cycles Wholesaler & Mfg.
  Address             Science Park South, Birchwood
                      Stanford House
  City                Warrington
  State/Province      England
  Postal Code         WA3 7BH
  Country/Region      United Kingdom
                      
DEMOGRAPHIC INFO:     
  Year Opened         1999
  Number of Employees 15
  Specialty           Touring
  Square Feet         21000
  Brands              4+
  Annual Sales        $800,000.00
                      
CONTACT:              
  Type                Owner
  Name                Ms. Barbara J. German
  Phone               1 (11) 500 555-0181
  Email               barbara4@adventure-works.com
*/  
 
select * from dbo.VerticalMainOfficeDataWithContacts(50)
/*
Property              Value
--------------------- ------------------------
NAME AND ADDRESS:     
  Name                Hometown Riding Supplies
  Address             Bundesallee 9571
                      Rechnungsstelle - C 035
  City                Berlin
  State/Province      Brandenburg
  Postal Code         14197
  Country/Region      Germany
                      
DEMOGRAPHIC INFO:     
  Year Opened         1991
  Number of Employees 46
  Specialty           Mountain
  Square Feet         38000
  Brands              4+
  Annual Sales        $1,500,000.00
                      
CONTACT #1:           
  Type                Owner
  Name                Mr. William Vong
  Phone               809-555-0100
  Email               william6@adventure-works.com
                      
CONTACT #2:           
  Type                Purchasing Manager
  Name                Mr. Greg Harrison
  Phone               1 (11) 500 555-0112
  Email               greg2@adventure-works.com
*/
I hope this article gave you some interesting examples of using UNPIVOT (and yes, PIVOT also). I also hope it stimulated your imagination on creative things you can do with T-SQL.

A SQLSaturday We Can Call Our Own

Thursday, February 25th, 2010

At last, we have just started the real work that it is involved in organizing a SQLSaturday event (see related post). Earlier today our group had a conference call for preliminary planning for the event. A lot about logistics, sponsorships, swag, funds and others were discussed.  We are now slowly seeing the fruit of this effort we all started back in October last year. SQLSaturday is now coming to the Southern California Region.

There have been plans by different groups here in SoCal for an event like this. All those came to reality by way of the SQLSaturday concept. There have been many roadblocks on the onset. The first site we talked to turned down our request. Then the next location didn’t materialize. We were finally  able to convince one and they are offering their rooms for free (no, not a Microsoft facility). It was not easy to find a location that was willing to host the event. But then that is not the hardest part.

We are now just beginning to do the actual work of putting this together. We’re seeing challenges ahead of us but we’re more interested in the positive results than in the tangible upsets. We’re making this happen.

(Note: Please stand by for more updates. We will soon work with the SQLSaturday people to setup our own website for this event)


Tagged: socal, sql saturday, SQL Server, SQLSat, Training

A SQLSaturday We Can Call Our Own

Thursday, February 25th, 2010

At last, we have just started the real work that it is involved in organizing a SQLSaturday event (see related post). Earlier today our group had a conference call for preliminary planning for the event. A lot about logistics, sponsorships, swag, funds and others were discussed.  We are now slowly seeing the fruit of this effort we all started back in October last year. SQLSaturday is now coming to the Southern California Region.

There have been plans by different groups here in SoCal for an event like this. All those came to reality by way of the SQLSaturday concept. There have been many roadblocks on the onset. The first site we talked to turned down our request. Then the next location didn’t materialize. We were finally  able to convince one and they are offering their rooms for free (no, not a Microsoft facility). It was not easy to find a location that was willing to host the event. But then that is not the hardest part.

We are now just beginning to do the actual work of putting this together. We’re seeing challenges ahead of us but we’re more interested in the positive results than in the tangible upsets. We’re making this happen.

(Note: Please stand by for more updates. We will soon work with the SQLSaturday people to setup our own website for this event)


Tagged: socal, sql saturday, SQL Server, SQLSat, Training