Jamie Thomson’s whistlestop tour of SSIS addins

November 21st, 2009 by Brent Ozar

At #SQLbits, Jamie Thomson (Blog – Twitter) demoed the following SSIS add-ins:

  • Kimball SDC
  • Trace File Source
  • XMLify
  • File Watcher
  • Dynamic Data Flow
  • Rank Transform
  • Normaliser
  • Twitter
  • Compression

XMLify

Jamie find this useful when parsing error output from other SSIS steps, which often have several result sets with different columns.  XMLify dumps all of the data into a single XML file for easier error storage.

It’s currently only available for SQL 2005 on Codeplex, but Jamie’s updated it for 2008.  He expects to upload it to Codeplex shortly.

Kimball SDC

SSIS includes a Slowly Changing Dimensions wizard, but Jamie says it has extremely poor performance due to the amount of lookups it does, plus you can’t modify your work.  You have to reuse the wizard.  One workaround is the KimballSCD tool on Codeplex, which handles Type 1 and Type 2 dimensions.  These dimensions track the history of our data – for example, as our customers change from Single to Married to Divorced, we may want to capture those changes so we can view their status over time.

The KimballSCD component takes your incoming stream of data and dumps out separate streams with new records, changed records, deleted records, invalid input, and so forth.  It has a staggering array of configuration choices on how to handle errors.

Twitter Task

Built by Andy Leonard and Jessica Moss, this SSIS component can fetch and retrieve tweets.  Of course, when he went to demo Twitter.com, the page took forever to load.  Ah, the curse of using Twitter and WiFi networks for demos!

He demoed how to send tweets, then how to receive them and pipe the results through the Term Extraction task.  It’s basically data mining for text – it finds the most popular phrases in the stream of tweets.  Jamie asked if attendees found it interesting, and they definitely did.  A couple of attendees wanted to know when Twitter search integration would be included.  Doh!

Normaliser

This component takes a flat source like a list of orders that includes both header and detail info in the same row, and then normalizes it.  You have to pass in the data in the right sort number, because it uses those sorts to determine duplicates.  You pick which fields determine a header versus a detail record using checkboxes.

Rank Transform

The ranking T-SQL functions in SQL 2005 will take a set of data and add a rank column, and this SSIS component does the same thing.  Jamie says it’s still a little buggy, but he’ll work out the bugs in the next few days.  (Honesty is the best policy!)  The bugs are problems with the UI – checkboxes don’t show up as checked when they’re supposed to be.  The data under the covers works though.

It has nice options for rank, dense rank, row number, and row number by partition.  Rank and dense rank are two different ways to handle ties.

Compression Task

Pass in a file – but only one file at a time – and this task will gzip it.  It works for SQL 2005, and there’s a newer version coming on Codeplex to work in SQL Server 2008’s SSIS.  There’s an identical task to unzip.  I can see this as being useful for passing big XML files over networks.

File Watcher

This task watches a directory and waits for files to arrive.  When a file is created, it goes on to the next task in the package.  The File Watcher passes the name of the newly created file to the next task as well, so you can process that file.

Trace Reader

This takes a SQL Server trace file as a source, and then splits it out into three data outputs: short queries that took under 300ms, queries that took 300-1,000ms, and queries that took over a second.  Useful for DBAs who want to automate performance tuning on a large scale.

Dynamic Data Flow

The only non-free tool in the demo, Dynamic Data Flow is by a company called CozyRoc.  You can use one destination in SSIS, but populate different tables based on the contents of your data.

I bailed out just before the last task was shown because I had to get to my own session.

End of the Line

Jamie did an awesome job of demoing a lot of stuff in a short period of time without being overwhelming.  I can really appreciate how much work he had to put into these demos – every single component needed its own set of data.  Fantastic work.

Jamie Thomson’s whistlestop tour of SSIS addins

November 21st, 2009 by Brent Ozar

At #SQLbits, Jamie Thomson (Blog – Twitter) demoed the following SSIS add-ins:

  • Kimball SDC
  • Trace File Source
  • XMLify
  • File Watcher
  • Dynamic Data Flow
  • Rank Transform
  • Normaliser
  • Twitter
  • Compression

XMLify

Jamie find this useful when parsing error output from other SSIS steps, which often have several result sets with different columns.  XMLify dumps all of the data into a single XML file for easier error storage.

It’s currently only available for SQL 2005 on Codeplex, but Jamie’s updated it for 2008.  He expects to upload it to Codeplex shortly.

Kimball SDC

SSIS includes a Slowly Changing Dimensions wizard, but Jamie says it has extremely poor performance due to the amount of lookups it does, plus you can’t modify your work.  You have to reuse the wizard.  One workaround is the KimballSCD tool on Codeplex, which handles Type 1 and Type 2 dimensions.  These dimensions track the history of our data – for example, as our customers change from Single to Married to Divorced, we may want to capture those changes so we can view their status over time.

The KimballSCD component takes your incoming stream of data and dumps out separate streams with new records, changed records, deleted records, invalid input, and so forth.  It has a staggering array of configuration choices on how to handle errors.

Twitter Task

Built by Andy Leonard and Jessica Moss, this SSIS component can fetch and retrieve tweets.  Of course, when he went to demo Twitter.com, the page took forever to load.  Ah, the curse of using Twitter and WiFi networks for demos!

He demoed how to send tweets, then how to receive them and pipe the results through the Term Extraction task.  It’s basically data mining for text – it finds the most popular phrases in the stream of tweets.  Jamie asked if attendees found it interesting, and they definitely did.  A couple of attendees wanted to know when Twitter search integration would be included.  Doh!

Normaliser

This component takes a flat source like a list of orders that includes both header and detail info in the same row, and then normalizes it.  You have to pass in the data in the right sort number, because it uses those sorts to determine duplicates.  You pick which fields determine a header versus a detail record using checkboxes.

Rank Transform

The ranking T-SQL functions in SQL 2005 will take a set of data and add a rank column, and this SSIS component does the same thing.  Jamie says it’s still a little buggy, but he’ll work out the bugs in the next few days.  (Honesty is the best policy!)  The bugs are problems with the UI – checkboxes don’t show up as checked when they’re supposed to be.  The data under the covers works though.

It has nice options for rank, dense rank, row number, and row number by partition.  Rank and dense rank are two different ways to handle ties.

Compression Task

Pass in a file – but only one file at a time – and this task will gzip it.  It works for SQL 2005, and there’s a newer version coming on Codeplex to work in SQL Server 2008’s SSIS.  There’s an identical task to unzip.  I can see this as being useful for passing big XML files over networks.

File Watcher

This task watches a directory and waits for files to arrive.  When a file is created, it goes on to the next task in the package.  The File Watcher passes the name of the newly created file to the next task as well, so you can process that file.

Trace Reader

This takes a SQL Server trace file as a source, and then splits it out into three data outputs: short queries that took under 300ms, queries that took 300-1,000ms, and queries that took over a second.  Useful for DBAs who want to automate performance tuning on a large scale.

Dynamic Data Flow

The only non-free tool in the demo, Dynamic Data Flow is by a company called CozyRoc.  You can use one destination in SSIS, but populate different tables based on the contents of your data.

I bailed out just before the last task was shown because I had to get to my own session.

End of the Line

Jamie did an awesome job of demoing a lot of stuff in a short period of time without being overwhelming.  I can really appreciate how much work he had to put into these demos – every single component needed its own set of data.  Fantastic work.

A Loan At Last!

November 20th, 2009 by Brad Schulz
Flashy New Car and HomeGot your eye on a flashy new car?

Or perhaps you’re considering a new home?

If you don’t have the cash to pay for either of these, you’ll need to get a loan.

And before you jump in head-first, you’ll want to know how much you’ll be paying each month, and perhaps you’ll want to know the obscene amount of how much interest you’ll end up paying over the life of the loan.

Sure, there are truckloads of loan calculators out there on the web… They’re a dime a dozen.

But who needs those when you have the raw power of SQL Server at your fingertips?

This blog entry will demonstrate how to create a loan payment schedule based on a fixed interest rate with compounded interest, and, as always, we’ll go a little bit beyond that to explore some other things.

The monthly payment amount, c, is expressed by the following formula:

Monthly Payment Calculation

where P is the loan amount, M is the number of months of the loan, and r is the monthly interest rate expressed as a decimal.

The balance of the loan after payment number N is expressed by the following formula:

Balance Calculation for Month N

With all this in mind, here is a stored procedure to produce a payment schedule:

use tempdb;
go
  
if object_id('usp_LoanSchedule') is not null drop procedure usp_LoanSchedule;
go
  
create procedure usp_LoanSchedule
   @LoanAmount     numeric(10,2)
  ,@AnnualRate     numeric(10,8)  /* Expressed as percent */
  ,@NumberOfMonths int
  ,@StartDate      datetime
as
with InputVariables as
(
  select P,M,R
        ,C=round((P*R)/(1-power(1+R,-M)),2)
  from (select P=@LoanAmount
              ,M=@NumberOfMonths
              ,R=@AnnualRate/12/100) InputData
)
,MonthlyPayments(PmtNo
                ,PmtDate
                ,Balance
                ,Principle
                ,Interest
                ,CumulPrinciple
                ,CumulInterest) as
(
  select N
        ,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
        ,cast(NewBalance as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
  from InputVariables
  cross apply (select N=1) CalcPmtNo
  cross apply (select NewBalance=round(P*power(1+R,N)
                                      -(power(1+R,N)-1)*C/R,2)) CalcNewBalance
  union all
  select N
        ,dateadd(month,1,mp.PmtDate)
        ,cast(NewBalance as numeric(10,2))
        ,cast(mp.Balance-NewBalance as numeric(10,2))
        ,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
        ,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
        ,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
  from MonthlyPayments mp
  cross join InputVariables
  cross apply (select N=mp.PmtNo+1) CalcPmtNo
  cross apply (select NewBalance=case
                                   when N=M
                                   then 0.00  /* Last Payment */
                                   else round(P*power(1+R,N)
                                             -(power(1+R,N)-1)*C/R,2)
                                 end) CalcNewBalance
  where N<=M
)
select PmtNo
      ,PmtMonth=datename(month,PmtDate)+str(year(PmtDate),5)
      ,PmtAmount=Principle+Interest
      ,Principle
      ,Interest
      ,CumulPrinciple
      ,CumulInterest
      ,Balance
from MonthlyPayments  
order by PmtNo
option (maxrecursion 1000)
;
If we want to buy a $20,000 car at 6.8% (annual) interest with a 5-year term (60 months) with our first payment in December 2009, then we can call it like so:

exec usp_LoanSchedule @LoanAmount     = 20000
                     ,@AnnualRate     = 6.8
                     ,@NumberOfMonths = 60
                     ,@StartDate      = '20091201'
/*
PmtNo PmtMonth       PmtAmount Principle Interest CumulPrinciple CumulInterest  Balance
----- -------------- --------- --------- -------- -------------- ------------- --------
    1 December 2009     394.14    280.81   113.33         280.81        113.33 19719.19
    2 January 2010      394.14    282.39   111.75         563.20        225.08 19436.80
    3 February 2010     394.14    284.00   110.14         847.20        335.22 19152.80
    4 March 2010        394.14    285.61   108.53        1132.81        443.75 18867.19
  ... ...                 ...       ...      ...            ...           ...      ...
   57 August 2014       394.14    385.33     8.81       18830.89       3635.09  1169.11
   58 September 2014    394.14    387.52     6.62       19218.41       3641.71   781.59
   59 October 2014      394.14    389.71     4.43       19608.12       3646.14   391.88
   60 November 2014     394.14    391.88     2.26       20000.00       3648.40     0.00
*/
Let’s go over how the logic works.

The procedure essentially consists of a single SELECT statement, which is made up of two CTE’s (Common Table Expressions).

The first CTE is called InputVariables, and it just produces a small one-row table of the variables that we need for the calculations… in other words, our friends P, M, r, and c. Note how r is calculated by dividing the @AnnualRate parameter by 12 (to get a monthly rate) and then by 100 (to represent the rate as a decimal). Also note how c (the monthly payment amount) was calculated based on the other 3 variables.

I could have just created local variables (@P, @M, @R, and @C) instead of setting them up in a CTE, but I had two reasons for not doing that. First of all, I didn’t want all those pesky at-signs (@) to be all throughout the code and second, I wanted to encapsulate the entire creation of the loan schedule table in a single SELECT statement (I’ll explain why later).

The second CTE, called MonthlyPayments, is a recursive one. The first part of the recursive CTE (before the UNION ALL) is the anchor. It produces the data for the very first payment (note how I also adjust the first payment date to the first of the month). The second part (the recursive part) of the CTE builds upon that, creating the second payment, then the third, and so on, up until the last payment. It will continue tacking on payments as long as its WHERE clause (WHERE N<=M) is true… in other words, as long as the Payment Number (N) is less than or equal to the number of months of the loan (M). Once the WHERE clause evaluates to false (when N>M), then it will stop.

Both the anchor and the recursive part of the CTE use a couple of CROSS APPLYs. The first CROSS APPLY establishes a new column N, the Payment Number. The only reason I do this is because I want to use N in the calculation of the Balance. You’ll recall that the formula for the Balance uses N as one of its variables.

The second CROSS APPLY is the calculation of that NewBalance, making use of P, r, N, and c. Note the CASE statement that checks if we are processing the final payment (CASE WHEN N=M) and, if so, it forces the final balance to be zero.

The Principle of the payment is calculated by subtracting the NewBalance from the Balance of the previous payment (or in the case of the very first payment, by subtracting NewBalance from the original loan amount P). The Interest part of the payment is calculated by simply subtracting the Principle from the Payment Amount (c).

Since the recursive CTE is building the schedule one payment at a time, we can take advantage of that and calculate cumulative totals for the Principle and the Interest (CumulPrinciple and CumulInterest) as we go along.

Finally, the main SELECT statement pulls all the rows from the MonthlyPayments recursive CTE, dressing up the date by spelling out the month’s name and ORDERing BY the PmtNo.

Note the OPTION clause. By default, the maximum number of recursion levels is 100. The number of levels of recursion we will reach depends on the value of @NumberOfMonths that is passed to the procedure. If we’re doing a mortgage payment that spans 30 years, then @NumberOfMonths will be 360 and we will need a level of recursion of at least 360 or else we will get this message:

/*
Msg 530: The statement terminated. 
The maximum recursion 100 has been exhausted before statement completion.
*/
So our general MAXRECURSION value of 1000 should certainly be reasonable enough, assuming that no loan is going to last more than 83 years.

Great! We have a stored procedure that produces a loan schedule! Terrific!

Now what?

Not much, that’s what. This stored procedure works just fine, but we can’t really do anything with it. All we can do is EXECUTE it. Unless we INSERT its result set into a temporary table, we can’t query any important information out of it.

However, we could query information out of this loan schedule data if we created an inline table-valued function (TVF) instead of the stored procedure, like so:

use tempdb;
go
  
if object_id('ufn_LoanSchedule') is not null drop function ufn_LoanSchedule;
go
  
create function ufn_LoanSchedule
(
   @LoanAmount     numeric(10,2)
  ,@AnnualRate     numeric(10,8)  /* Expressed as percent */
  ,@NumberOfMonths int
  ,@StartDate      datetime
)
returns table
as
return
with InputVariables as
(
  select P,M,R
        ,C=round((P*R)/(1-power(1+R,-M)),2)
  from (select P=@LoanAmount
              ,M=@NumberOfMonths
              ,R=@AnnualRate/12/100) InputData
)
,MonthlyPayments(PmtNo
                ,PmtDate
                ,Balance
                ,Principle
                ,Interest
                ,CumulPrinciple
                ,CumulInterest) as
(
  select N
        ,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
        ,cast(NewBalance as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
        ,cast(P-NewBalance as numeric(10,2))
        ,cast(C-(P-NewBalance) as numeric(10,2))
  from InputVariables
  cross apply (select N=1) CalcPmtNo
  cross apply (select NewBalance=round(P*power(1+R,N)
                                      -(power(1+R,N)-1)*C/R,2)) CalcNewBalance
  union all
  select N
        ,dateadd(month,1,mp.PmtDate)
        ,cast(NewBalance as numeric(10,2))
        ,cast(mp.Balance-NewBalance as numeric(10,2))
        ,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
        ,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
        ,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
  from MonthlyPayments mp
  cross join InputVariables
  cross apply (select N=mp.PmtNo+1) CalcPmtNo
  cross apply (select NewBalance=case
                                   when N=M
                                   then 0.00  /* Last Payment */
                                   else round(P*power(1+R,N)
                                             -(power(1+R,N)-1)*C/R,2)
                                 end) CalcNewBalance
  where N<=M
)
select PmtNo
      ,PmtDate
      ,PmtAmount=Principle+Interest
      ,Principle
      ,Interest
      ,CumulPrinciple
      ,CumulInterest
      ,Balance
from MonthlyPayments  
;
(This is why I originally created a single one-stop-shopping query in our original stored procedure without the use of local variables… so that I could easily translate it into an in-line TVF).

Now we can call the TVF just as if it were a table, pulling out any columns we wish:

select PmtNo
      ,PmtDate
      ,Principle
      ,Interest
      ,Balance
from dbo.ufn_LoanSchedule(20000,6.8,60,'20091201')
order by PmtNo
/*
PmtNo PmtDate                 Principle Interest  Balance
----- ----------------------- --------- -------- --------
    1 2009-12-01 00:00:00.000    280.81   113.33 19719.19
    2 2010-01-01 00:00:00.000    282.39   111.75 19436.80
    3 2010-02-01 00:00:00.000    284.00   110.14 19152.80
    4 2010-03-01 00:00:00.000    285.61   108.53 18867.19
  ...   ...                        ...      ...      ...
   57 2014-08-01 00:00:00.000    385.33     8.81  1169.11
   58 2014-09-01 00:00:00.000    387.52     6.62   781.59
   59 2014-10-01 00:00:00.000    389.71     4.43   391.88
   60 2014-11-01 00:00:00.000    391.88     2.26     0.00
*/
Look back at the code where we created the TVF. Did you notice something missing in the definition? Two things, actually. I deliberately left out the ORDER BY clause because we may want to order the data in a different way when we call the TVF. But the OPTION clause is also missing that specifies the MAXRECURSION value. T-SQL will not allow it in a query that’s part of a TVF definition. If we were to tack it on to the end of the query and try to CREATE the FUNCTION, we’d get the following error message:

/*
Msg 156, Level 15, State 1, Procedure ufn_LoanSchedule, Line 67
Incorrect syntax near the keyword 'option'.
*/
Since we cannot specify any MAXRECURSION in the TVF definition, it is (unfortunately) up to us to include the OPTION clause in any query that calls the TVF if we know that we are going to breach the minimum level of 100.

So, for example, let’s say we have just bought a quaint little cottage for $500,000. (Don’t scoff… that price is dirt cheap for a humble little abode here in the San Francisco Bay Area). We get a 30-year (360-month) fixed loan with a 5% interest rate, with our first payment due in January 2010. We need to add the OPTION clause to the following query to prevent an error:

select PmtNo
      ,PmtDate
      ,Principle
      ,Interest
      ,Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
option (maxrecursion 360)
/*
PmtNo PmtDate                 Principle Interest   Balance
----- ----------------------- --------- -------- ---------
    1 2010-01-01 00:00:00.000    600.78  2083.33 499399.22
    2 2010-02-01 00:00:00.000    603.28  2080.83 498795.94
    3 2010-03-01 00:00:00.000    605.79  2078.32 498190.15
    4 2010-04-01 00:00:00.000    608.32  2075.79 497581.83
  ...  ...                         ...      ...       ...
  357 2039-09-01 00:00:00.000   2639.84    44.27   7984.14
  358 2039-10-01 00:00:00.000   2650.84    33.27   5333.30
  359 2039-11-01 00:00:00.000   2661.89    22.22   2671.41
  360 2039-12-01 00:00:00.000   2671.41    12.70      0.00
*/
Anyway, now that we have a TVF, we can get some interesting information from the loan schedule data.

For example, using our mortgage scenario, how much can we write off in mortgage interest on our income taxes in the year 2012?:

select TotalInterest=sum(Interest)
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where year(PmtDate)=2012
option (maxrecursion 360)
/*
TotalInterest
-------------
     24058.33
*/
What will our loan balance be at the end of 2015?:

select Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where PmtDate='20151201'
order by PmtNo
option (maxrecursion 360)
/*
  Balance
---------
449676.39
*/
How long will it take for us to pay off half of the loan?:

select top 1 PmtNo,PmtDate
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where CumulPrinciple>=Balance
order by PmtNo
option (maxrecursion 360)
/*
PmtNo PmtDate
----- -----------------------
  242 2030-02-01 00:00:00.000
*/
(Yikes… That’s the problem with how these loans work. It’ll take over 20 years of the 30-year loan term to pay off half of the principle.)

Finally, we can test out different rates and see how the monthly payment changes and how much total cumulative interest we’ll end up paying for each rate:

select Rate
      ,PmtAmount
      ,CumulInterest
from (select 5.1 union all
      select 5.3 union all
      select 5.5 union all
      select 5.7 union all
      select 5.9) Rates(Rate)
cross apply dbo.ufn_LoanSchedule(500000,Rate,360,'20100101')
where PmtNo=360
option (maxrecursion 360)
/*
Rate PmtAmount CumulInterest
---- --------- -------------
 5.1   2714.75     477310.00
 5.3   2776.52     499547.20
 5.5   2838.95     522022.00
 5.7   2902.00     544720.00
 5.9   2965.68     567644.80
*/
(Don’t you find it disgusting interesting how you pay more in interest over the life of the loan than you do in principle if your rate is a little over 5.3%?).

I hope this article has been helpful not just in showing how to calculate loan schedule data, but also in demonstrating recursive CTE’s and TVF’s and other techniques. I certainly learned something I didn’t expect (the inability to specify a MAXRECURSION value in a TVF) in putting the article together.

I’ll bet that your borrowing future will be brighter now that you have the tools to analyze potential loans. From now on, you’ll be able to make those payments with a smile on your face.

Yeah, right.

PASS Summit 2009 Final Thoughts…or What I’ve Learned From The 2009 Summit

November 20th, 2009 by Kendal Van Dyke

Welcome to PASS Summit Unite 2009 Last week I wrote a day by day recount of my PASS Summit experience (Monday, Tuesday, Wednesday, Thursday, Friday). Since then I've been tagged by The Midnight DBAs™  to talk about what I learned from the Summit…so I'm going to use the opportunity to share a few thoughts that didn't fit into my previous posts.

What worked

  • I rode the Light Link Rail from the airport into downtown for a whopping $2.50. That sure beat spending $40 on a cab!
  • To cut costs I split a room at the Sheraton. That put me right next to the convention center and lowered the amount to a rate acceptable for my company to cover, plus it gave me the chance to build a professional relationship beyond just chatting in the hallway between sessions. Win-win-win.
  • Twitter was a hot thing at the Summit this year. Because of the relationships that I formed on Twitter in the months leading up to the Summit I showed up feeling like I already knew 75 people there. It's a lot easier going to a party with people you know than going by yourself. Twitter also proved itself useful at the Summit in two ways:
    1. Entertainment: Reading the comments tagged with #sqlpass during the keynotes (e.g. Tuesday when the rack of servers were about to lift off and Thursday when Dell was…well, I have no idea what Dell was doing on Thursday. I don't think they did either).
    2. Value: Figuring out where people were at or what the nighttime entertainment of choice was.
  • I knew I wanted to meet as many people as I could so I pre-printed a lot of personal contact cards with URLs for Twitter, LinkedIn, and my personal email address. Exchanging information after a greeting was as easy as handing them my card. Surprisingly there were people who came to the Summit without any business cards to hand out.
  • Flying the redeye home. It gave me Friday to do some sightseeing around Seattle and sleeping made the long flight home easier.

 What I'll do differently next time

  • Bring a travel umbrella. Although I had a waterproof jacket, that only covered the top half. I was fortunate that it didn't rain more.
  • Arrive a day earlier. I flew on Monday which made for a loooong day after all the opening activities were done. It would have been nice to arrive on Sunday and adjust to the time zone difference.
  • I'm not 100% sold on staying at the Sheraton. Sure it's nice to be close by, but it wouldn't have been a big deal to walk an extra couple of blocks for $50 less per night.
  • Bring a real camera. I used the camera on my iPhone and it was OK, but I would have appreciated having zoom and higher quality pictures (especially at the keynotes). Fortunately other people have been kind enough to post their pictures on Flickr.
  • Get better contact cards. I printed mine at Staples at the last minute. They worked, but they could have looked much better if I had pre-ordered them from moo.com instead of leaving it to the day before I arrived.
  • Leave the extension cord at home. I brought a 16' cord with me so I didn't have to camp out for a plug along the outside wall at each session. I didn't even use it once. I either ran off battery power or just put my laptop to sleep and listened to the speaker.

imageOther random thoughts
Initially I complained about the Summit being in Seattle. "It's prohibitive for people on the east coast to come!" I said. However, after going to the Summit I realize that Seattle is a great location for it. Why? Public transportation into downtown and having everything within walking distance. I'd love for the conference to be in my hometown of Orlando, and we certainly have the facilities for it, but if you want to get anywhere here you need to drive. I don't think that having everyone drive away at the end of the day to get to their hotels, go to dinner, etc. facilities networking. Meeting people and getting the chance to build relationships with them is one of the biggest things I got out of the week. I'm not saying there aren't other cities that could host the Summit, just that I can understand why Seattle works well.

So what now?
Now it's back to the real world and time to put all those great things I learned into action! First off, I'm getting the Summit DVDs and sharing them with my coworkers. I got to go to the Summit, now they get to benefit from it. I'll hold lunch and learns every week and let them pick which sessions to watch. Second, seeing the emphasis that Microsoft is placing on BI reinforced to me that I need to learn SSAS and BI, and fast! I've already started identifying ways that my company can benefit from it; now it's time to put my head down and figure this stuff out. Finally, I'm already looking forward to the next Summit in 2010. You can bet that I'm going to do everything I can to make it back next year and make it even better!

Jason Massie, Michelle Ufford, and Aaron Nelson - tag, you're it!

Christian Bolton on SQL Server memory

November 20th, 2009 by Brent Ozar

At #SQLBits, I was quite excited to attend this session because of the presenter.  Christian Bolton of COEO (BlogTwitter) was the head author on our upcoming book SQL Server 2008 Internals and Troubleshooting, and he’s one of the only 8 Microsoft Certified Masters on SQL Server that work outside of Microsoft.

Physical vs Virtual Memory

Physical memory means the fast, volatile storage of the chips themselves.  We’re finding 64GB is very common these days.  If every app tried to access this memory directly, we’d run into performance problems quickly.  That’s why virtual address space came about.  On 32-bit OSs, the default settings leave 2GB for the kernel and 2GB for applications.  On 64-bit OSs, that breaks out to up to 8TB of kernel memory and 8TB of application (user mode) space.

All applications just work with virtual address space first – they don’t know whether they’re working with physical memory or the page file.  The allocation of memory to either chips or the page file is the job of the Virtual Memory Manager.

What Uses SQL’s Virtual Memory

The Buffer Pool is SQL’s main memory consumer, and that’s managed by the min and max memory settings in SQL Server.

Another consumer is VirtualAlloc – which just means anything that doesn’t use the buffer pool.  The heap manager, thread stacks, and multi-page allocations (things that need more than 1 page to store data).  For example, execution plans are normally quite small, but they can exceed one page, and we need to store them in memory.  Linked servers and extended stored procs also use VirtualAlloc to grab memory.

The buffer pool will never be bigger than physical memory.  If you’ve got 32GB of memory and a 64GB page file, your buffer pool won’t be bigger than 32GB.  On an x86 box, this can be problematic since you won’t have more than 2GB of user mode space by default.  Because there’s not much memory, SQL Server has to reserve some memory before the buffer pool grabs hold of the rest, and that’s called memtoleave.  It’s calculated with MemReserved + (NumThreads * StackSize).  By default, MemReserved is 256mb.  NumThreads is the max number of worker threads configured.  StackSize is .5mb on x86, 2mb on x64.  Christian had an excellent visual demo showing how the memory gets used up on different CPUs and memory amounts – I can’t begin to convey that here, but hey, that’s where his great memory chapter in the book comes in.

Christian demoed the RMLUtils to stress test SQL Server and affect the memtoleave space.  He asked how many attendees were using wait stats for performance analysis, and the number was about the same as my presentation’s attendees – around 10%.  Folks just aren’t using wait stats yet.  But if you turn around and ask how many of those 10% use it as their primary troubleshooting tool, it’s usually 100%.  Folks who use them, love them, as I’ve blogged before about wait stats.

SQL Server’s Memory Model

Memory nodes are the lowest level allocator, and you can view info about them in sys.dm_os_memory_nodes.

Memory clerks are the next level, and they’re used to access nodes.  When something in SQL wants memory, they ask the clerks, and the clerks allocate nodes.  There’s a general MEMORYCLERK_SQLGENERAL, and then heavy memory users get their own clerks, like MEMORY_CLERK_SQLBUFFERPOOL and MEMORYCLERK_SQLQUERYPLAN. You can check what they’re using with the DMV sys.dm_os_memory_clerks.

He talked about the different kinds of caches, and you can query them in sys.dm_os_memory_cache_counters.  He touched on how the plan cache has hard-coded maximum sizes based on your memory size – you don’t want plans running you out of buffer cache.  When you throw in the buffer pool and query memory too (for joins/hashes), it’s easy to see how SQL Server really needs all the memory it can possibly get.

Christian tied it all together by showing a slick DMV query that breaks out cache space usage by database.  I glanced through his blog but I couldn’t find it – I’ll hit him up for that later.

Best Practice: Lock Pages in Memory

Locking pages in memory makes sure SQL Server’s memory is not paged out to disk.  Otherwise, Windows may push it out to disk – often with bad drivers.  The first thing to do when running into SQL’s memory getting swapped out is to check for updated drivers, then tweak down SQL’s max memory setting, then use AWE/lock pages in memory, then consider upgrading to Windows 2008.  Even if you have bad drivers, Win 2008 won’t be as aggressive with trimming SQL Server’s address space.  Memory allocated using the AWE mechanisms can’t be trimmed.  Should you lock pages in memory on 64-bit systems?  Yes, because it locks pages in the working set.  Only the data cache is locked, though – other memory like the plan cache can still be trimmed.

You can lock pages in memory on Standard Edition, but you need to get on the right patch levels – see this knowledge base article for details.

Christian asked how many DBAs set the max server memory, and I was surprised that the vast majority of DBAs haven’t.  I’m a big cheerleader for setting that.  Glenn Berry recently wrote a set of guidelines for max server memory.  Christian recommended leaving 2GB for the OS, plus the memory for the worker threads we calculated earlier, plus 1GB for the multi-page allocations/linked servers/etc, plus 1-3GB for other applications (if necessary.)  This is a worst-case-scenario guideline because we just want to make sure our server isn’t paging to disk.

Bottom line – I learned a lot from this session, and this one alone was worth the price of admission!

Christian Bolton on SQL Server memory

November 20th, 2009 by Brent Ozar

At #SQLBits, I was quite excited to attend this session because of the presenter.  Christian Bolton of COEO (BlogTwitter) was the head author on our upcoming book SQL Server 2008 Internals and Troubleshooting, and he’s one of the only 8 Microsoft Certified Masters on SQL Server that work outside of Microsoft.

Physical vs Virtual Memory

Physical memory means the fast, volatile storage of the chips themselves.  We’re finding 64GB is very common these days.  If every app tried to access this memory directly, we’d run into performance problems quickly.  That’s why virtual address space came about.  On 32-bit OSs, the default settings leave 2GB for the kernel and 2GB for applications.  On 64-bit OSs, that breaks out to up to 8TB of kernel memory and 8TB of application (user mode) space.

All applications just work with virtual address space first – they don’t know whether they’re working with physical memory or the page file.  The allocation of memory to either chips or the page file is the job of the Virtual Memory Manager.

What Uses SQL’s Virtual Memory

The Buffer Pool is SQL’s main memory consumer, and that’s managed by the min and max memory settings in SQL Server.

Another consumer is VirtualAlloc – which just means anything that doesn’t use the buffer pool.  The heap manager, thread stacks, and multi-page allocations (things that need more than 1 page to store data).  For example, execution plans are normally quite small, but they can exceed one page, and we need to store them in memory.  Linked servers and extended stored procs also use VirtualAlloc to grab memory.

The buffer pool will never be bigger than physical memory.  If you’ve got 32GB of memory and a 64GB page file, your buffer pool won’t be bigger than 32GB.  On an x86 box, this can be problematic since you won’t have more than 2GB of user mode space by default.  Because there’s not much memory, SQL Server has to reserve some memory before the buffer pool grabs hold of the rest, and that’s called memtoleave.  It’s calculated with MemReserved + (NumThreads * StackSize).  By default, MemReserved is 256mb.  NumThreads is the max number of worker threads configured.  StackSize is .5mb on x86, 2mb on x64.  Christian had an excellent visual demo showing how the memory gets used up on different CPUs and memory amounts – I can’t begin to convey that here, but hey, that’s where his great memory chapter in the book comes in.

Christian demoed the RMLUtils to stress test SQL Server and affect the memtoleave space.  He asked how many attendees were using wait stats for performance analysis, and the number was about the same as my presentation’s attendees – around 10%.  Folks just aren’t using wait stats yet.  But if you turn around and ask how many of those 10% use it as their primary troubleshooting tool, it’s usually 100%.  Folks who use them, love them, as I’ve blogged before about wait stats.

SQL Server’s Memory Model

Memory nodes are the lowest level allocator, and you can view info about them in sys.dm_os_memory_nodes.

Memory clerks are the next level, and they’re used to access nodes.  When something in SQL wants memory, they ask the clerks, and the clerks allocate nodes.  There’s a general MEMORYCLERK_SQLGENERAL, and then heavy memory users get their own clerks, like MEMORY_CLERK_SQLBUFFERPOOL and MEMORYCLERK_SQLQUERYPLAN. You can check what they’re using with the DMV sys.dm_os_memory_clerks.

He talked about the different kinds of caches, and you can query them in sys.dm_os_memory_cache_counters.  He touched on how the plan cache has hard-coded maximum sizes based on your memory size – you don’t want plans running you out of buffer cache.  When you throw in the buffer pool and query memory too (for joins/hashes), it’s easy to see how SQL Server really needs all the memory it can possibly get.

Christian tied it all together by showing a slick DMV query that breaks out cache space usage by database.  I glanced through his blog but I couldn’t find it – I’ll hit him up for that later.

Best Practice: Lock Pages in Memory

Locking pages in memory makes sure SQL Server’s memory is not paged out to disk.  Otherwise, Windows may push it out to disk – often with bad drivers.  The first thing to do when running into SQL’s memory getting swapped out is to check for updated drivers, then tweak down SQL’s max memory setting, then use AWE/lock pages in memory, then consider upgrading to Windows 2008.  Even if you have bad drivers, Win 2008 won’t be as aggressive with trimming SQL Server’s address space.  Memory allocated using the AWE mechanisms can’t be trimmed.  Should you lock pages in memory on 64-bit systems?  Yes, because it locks pages in the working set.  Only the data cache is locked, though – other memory like the plan cache can still be trimmed.

You can lock pages in memory on Standard Edition, but you need to get on the right patch levels – see this knowledge base article for details.

Christian asked how many DBAs set the max server memory, and I was surprised that the vast majority of DBAs haven’t.  I’m a big cheerleader for setting that.  Glenn Berry recently wrote a set of guidelines for max server memory.  Christian recommended leaving 2GB for the OS, plus the memory for the worker threads we calculated earlier, plus 1GB for the multi-page allocations/linked servers/etc, plus 1-3GB for other applications (if necessary.)  This is a worst-case-scenario guideline because we just want to make sure our server isn’t paging to disk.

Bottom line – I learned a lot from this session, and this one alone was worth the price of admission!

Donald Farmer #SQLBits keynote on PowerPivot

November 20th, 2009 by Brent Ozar

At SQLbits, Donald Farmer (@DonaldDotFarmer) talked about IT in terms of data, and did it with an interesting picture story from the early 20th century.  Data today isn’t all that different from the messy desktops and clean library card catalogs of the last century.

Donald Farmer onstage at SQLbits

Donald Farmer onstage at SQLbits

IT professionals still break into the same 3 roles that existed a century ago:

  • Acquiring and storing data
  • Validating the data quality
  • Managing calculations on the data

Our roles, though, conflict with those of analysts – power users who want to do this work themselves.

Letting People Build Something Cool

Analysts do their slicing and dicing in our “approved” tools, but when they bang up against the ceilings of our capabilities, what do they do?  Universally, they export to Excel, and then start doing crazy things in this radically familiar environment.

To build the future of BI, Microsoft is delivering PowerPivot as an optional add-in for Excel 2010.  Excel 2010 installs side-by-side with older versions, so users can install it without violating corporate policies on using older Excels.  Farmer demoed PowerPivot with a >100 million row table using his laptop, sorting & dicing it in real time.  This demo still hasn’t gotten old for me yet – I love seeing that.  Since PowerPivot is a column-oriented data storage mechanism, it performs really well in memory.  It still takes a long time to get 100 million rows into PowerPivot (over the network, for example) but once it’s there, it flies.

PowerPivot data is read-only, though – once it’s been pulled in and compressed into column-based storage, it’s pretty much stuck.  If you need to make modifications, the easiest way is to use linked tables, and modify the linked ones.  Farmer demoed how to create relationships with those tables, and how PowerPivot helps detect which relationships don’t violate integrity rules.

To get that data into PowerPivot, one easy way is to use RSS. Business users can click the RSS icon on new SSRS reports, and Excel will open the feed to create a PowerPivot document.  From there, users can easily do their own slicing & dicing.

When we look at Excel spreadsheets from analysts, often the front couple of sheets are nicely formatted reports.  The rest of the sheets are various intermediate pivot tables and aggregates hitting other sheets of source tabs.  Excel’s new formulas in PowerPivot help you get those well-formatted aggregates faster with less intermediate steps.

I Made Something Cool! Now What?

Once users have built something cool in Excel, they didn’t have a solid way to share that with multiple users.  Farmer says SharePoint 2010 is like the ultimate file share for Excel because:

  • IT knows the files are backed up
  • IT knows who’s building the reports
  • IT knows who’s accessing the files (which means we can tell which reports we may need to take over)
  • People can consume the data without actually having PowerPivot or even Excel – it just works in the web

PowerPivot spreadsheets with slicers look fantastic in SharePoint.  I can envision people building their own report portals without BI team help.  There’s still a question of where the data’s coming from and how accurate it is, but that’s the case no matter how reports are delivered anyway.

Farmer says that this new self-service BI doesn’t interfere with real BI teams – you still need real BI in order to get analytics.  The self-service BI delivers the answers to the “background noise” of analytics requests – small, quick-hit requests that never end.  Farmer says self-service BI isn’t for the big projects, and the big tools aren’t right for self-servie needs either.

To try out PowerPivot, hit:

PHOTOS From MVP Deep Dives Book Launch!!

November 20th, 2009 by Jen McCown

At long, long, long last, we have the pix up from the MVP Deep Dives book lauch at SQLPASS!!

For those who haven't heard:

"Responding to BillG’s MVP Summit challenge to “Do philanthropy where you are,” The SQL Server MVP Deep Dives book is a collaboration of 53 MVPs sharing their expertise and passion for SQL Server. This is an all-volunteer book. All author proceeds are going to WarChild.org – an organization that helps children traumatized by war. Because this is a book for charity, Manning Publications wanted to also donate and gave us a higher than normal royalty. In addition, if you purchase the book through this link: www.SQLServerMVPDeepDives.com then the purchase will also count toward Warchild's Manning affiliate account and Warchild will receive an extra 10% of the purchase." -Paul Nielsen

The book launch was held at the PASS Community Summit this month, and I got to be the unofficial-official photographer for the event.  Paul Nielsen and Kahlen DeLaney gave short intro speeches, then all the MVPs at the Summit who'd contributed to the book lined up and signed Deep Dives copies for attendees.  There were also copies for industry notables, including one for BillG himself (too bad the man didn't show & stand in line, or it could've been an even bigger event :)

I just got my copy in this week, and I can't wait for some time to dive in (so to speak).  Huge thanks and big love to Paul, Kahlen, and the other 51 MVPs who took time for this worthy project. 

I'll also put the link up on the MidnightDBA.com SQLPASS09 events page this weekend, for your reference and enjoyment. Here's the tinyurl so you can tweet and retweet away:  http://tinyurl.com/ybtwesp

-Jen McCown, http://www.MidnightDBA.com

SQLAuthority News – Book Review – Expert SQL Server 2008 Encryption by Michael Coles

November 19th, 2009 by Pinal Dave

Expert SQL Server 2008 Encryption (Paperback)
Michael Coles (Author), Rodney Landrum (Author)


Link to Amazon

“What is your opinion on encryption? What I mean is: In a world filled with data, how do you see encryption?” This is the precise question Michael Coles posed to me on March 3rd of this year, while we were heading to Starbucks in Seattle. We were both attending the Microsoft MVP Summit there.

In the information era, security has become one of the most vital aspects of life. Although the topic may seem a little mundane, its importance cannot be overemphasized. It is the pillar of the information age and I shudder to think where we would be without it. We don’t leave our houses unlocked and risk thieves or opportunists taking off with our valuables. We also often take precautions, not only to preserve the precious, but also to avoid the sheer hassle of replacement and misuse. So too it should be with our information.

Encryption’s roots are extremely old and it has resolved numerous security problems over the years. In days gone by, couriers were entrusted with letters sealed with a royal wax stamp. If on delivery, the seal was broken, it was obvious that there had been a security breach. This very concept evolved as CRC checksum and developed into a complex algorithm.  While CRC checksum alerted the end user to the fact that content had been modified, its limitation was that it allowed manipulation of the content to occur in the first place. With encryption, only the authenticated owner can access and modify content.

In response to Michael’s question, I began to tell him what I knew about public and private keys. He looked at me doubtfully and asked me directly if I had ever used encryption in my career. My reluctant answer to this was “No”. He strongly suggested that I not underestimate its capabilities and explore its possibilities. I took his advice and have since implemented encryption for many of my clients, who are now far safer from unauthorized access to data.

To be very honest, in my experience, not many people know much about the subject beyond a little about public and private keys.  You do not often find experts discussing symmetric and asymmetric keys, which are just the tip of the iceberg. SQL Server has come a long way with regard to security. Encryption has taken on a whole new meaning in SQL Server 2008. There are many new features such as Extensible Key Management, Transparent Data Encryption, not to mention the pre-2008 ones such as cryptographic hashing, SQL CLR and many more. In performance terms, these are great enhancements.

The one exception is Transparent Data Encryption, where the whole database is encrypted. This can considerably reduce performance if the SQL Server box is not sufficiently powerful. It this is the case, it is a good time to offload all the encryption and decryption to third-party hardware. SQL Server allows third-party management of encryption and decryption through Extensible Key Management.

Extending the earlier courier analogy, consider the fact that even if our letter is secure and safe in our hands, as soon as we hand it to the courier it is exposed to risk and can be compromised. SQL Server 2008 has many new features, which secure data while it is being communicated between applications. A number of features were introduced that check whether data is manipulated during transmission.

Data is everywhere and taking in terms of Terra Bytes (TB) is the current reality. When a large amount of data needs to be handled, there are two major challenges. The first challenge is the actual encryption process and the resources needed to perform it. The second challenge is how to use the data once it has been encrypted.  In a regular database searching through TBs of data can take a very long time. Imagine how long this could take in and encrypted database?

In Seattle, Michael and I discussed these challenges and a few more subjects. The discussion lasted more than four hours. I have always known Michael to be an excellent author. He is renowned in the industry for his expertise of XML and Full Text Search. To my mind, he is an expert who has the extraordinary ability of relating complex concepts in simple terms. No matter how long, boring or complicated the topic, his delivery is always sweet, like chocolate that melts in the mouth.

Michael always addresses uncommon subjects. Perhaps his experience as a Sergeant in the army has given him the spirit to explore the unexplored.  I have never before encountered a single book on the subject of encryption for SQL Server and Michael’s will a “first”.

I recently had the pleasure of reading it and especially like the manner in which he and his co-author, Rodney, explain the significance of encryption. While many of the concepts covered are domain-specific, quite a few topics are common to all and the appendix is a “must read” for anyone planning a security strategy. One thing that really makes this book special is the fact that each module is written independently and you can find solutions by simply reading the relevant one.

I am a hands-on developer and only like books that have a lot of workable examples. With the exception of the first chapter, the book is filled with examples and hands-on experiments. The first chapter in itself is quite unique, as it not only provides a introduction to encryption, but also the very interesting history of encryption.  Even non-technical readers will enjoy this.

Summary:
It is my great pleasure to welcome this one-of-a- kind book to the SQL Server world. There is no doubt that this book is exceptional and will inspire anyone one who is ready to take their current security mechanism to the next level using encryption.

Stars : 5 Stars

Reference: Pinal Dave (http://blog.sqlauthority.com)

Posted in Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Book Review, SQLAuthority News, SQLServer, T SQL, Technology Tagged: SQL Server Encryption

EDMPASS Meeting Next Week

November 19th, 2009 by Thomas LaRock

Next Wednesday night, November 25th, I will be taking part in a panel discussion hosted by the Edmonton PASS Chapter. Yeah, Edmonton. Alberta. Canada. America’s Hat. Just slightly West and North of Moose Jaw. But still south of Nunavut. Bonus points if you have even heard of that place.

All that is my way of saying I will be participating remotely. So, if you were making travel arrangements to see me in Edmonton you will be disappointed to only hear my voice. I will, however, show up in Edmonton as soon as Colin Stasiuk (blog | @BenchmarkIT) comes through with the Bruins tickets he has been promising me for years.

So, if you would like to hear the sound of my voice, regardless of your travel plans, then you just need to click on this link: http://www.eventbrite.com/event/485596432

If you are craving more details, you can find them at http://benchmarkitconsulting.com/colin-stasiuk/2009/11/18/edmpass-meeting-one-week-from-today/

This event marks the first time I will be presenting as a remote speaker. Not sure if that qualifies as something worthy of padding the resume, but I’ll do that anyway. I am looking forward to the discussion which will be focusing on PASS and how we are able to stuff ten pounds of liquid awesome into a five pound sack every year. If one person leaves this event and does not immediately register for the 2010 Summit then I will have failed. Let’s not let that happen, shall we?

Colin says that pizza will be provided, but I am guessing that I will need to bring my own. That’s fine, but I don’t have any details of the LiveMeeting yet so I don’t know if people will be able to see me eat. I can always UStream that, no problem. You’re welcome.