A PASSionate Community

August 31st, 2010 by Joe Webb

Much has been voiced about the Professional Association for SQL Server (PASS) Nominating Committee’s decision to not put Steve Jones (blog, twitter) on the ballot for this year’s Board of Directors election. I’ve watched this controversy unfold with great interest, biting my tongue to keep from making rash or reactionary comments one way or the other.

I’m currently working on a guest blog post for a series that Kevin Kline (blogtwitter) is starting on the election process. In the coming days my guest post will appear on my site and on Kevin’s blog. I’m hopeful that Kevin’s series will help to provide some good, creative, and perhaps even actionable discussion around the PASS election process.

In the meantime, there has been one aspect of the brouhaha that I’d like to call attention to that may not be immediately obvious – the passion we all have for the community.

Aw, Come On Man!

It’s been said that the opposite of love is not hate; it’s apathy. Love and hate are strong emotions. If you love someone or something, you do it with a passion. Likewise if you hate something, you have a certain fervor about it. Whether there is love or hate, strong emotions abound and you care deeply about it.

Where there’s apathy, though, there is a lack of caring, a lack of passion or fervor. There’s an emptiness and the once loved/hated object ceases to have relevance in your life. It’s a sad state, apathy.

If nothing else, the latest PASS controversy has proven that people in the community have deep-seated emotions about PASS. It’s shown that PASS is a relevant and important player in the SQL community.

So in that respect, I’m glad that this election debate has stirred emotions and passion in most people in the community. If it hadn’t, I’d have been really worried about PASS and the community as a whole.

Where Do We Go From Here?

Now let’s see if we can turn this into a positive experience that, while difficult to go through, makes PASS and the community a stronger and more vibrant place.

Look for a post with my opinion on the election process soon. After that, I’ll get back to the “So I Got Promoted, Now What?” series.


Filed under: Community, SQLServerPedia Syndication

Hartford SQL Server User Group Meeting Tomorrow

August 31st, 2010 by Thomas LaRock

Sorry for the late post on this, but better late than never, right?

Tomorrow at noon (ET) will be the second meeting for the Hartford SQL Server User Group (HSSUG). That’s right, we meet during the day because we prefer to be home at night instead of, well, at a user group meeting. Weird, I know. And I bet some people would rather we met at night because they can’t get the time off from work but we know we can’t please everyone. So we decided to please ourselves first and worry about others later.

Anyway, the pizza starts flowing at noon and the first speaker takes the stage at about 1PM. And who do we have lined up for that slot?

Installing SQL Server 2008 on a Windows Server 2008 R2 Failover Cluster

Abstract: As DBAs we get to work on a lot of the features of the SQL Server product, but it is not uncommon for us to spend our entire careers without having a chance to work closely with clustering.  There are a lot of DBAs who are otherwise perfectly qualified in all respects for attractive job openings that absolutely require clustering experience. Chuck Boyce will walk through the basics of installing SQL Server 2008 on a Windows 2008 R2 failover cluster and show you some ways that you can obtain this skill on your own using a tool called VMLite.

Bio: Chuck Boyce is a Database Administrator who has worked for several multi billion dollar companies on the US East Coast. He began working with database technologies in 1998, working with Sybase on Unix platforms. Chuck began working with SQL Server in 2000.   Chuck has been awarded the Microsoft MVP award three times – twice for SQL Server. He is the former Executive Chair of the Professional Association for SQL Server (PASS) Chapter Program. He founded the Philadelphia PASS chapter and has served as both Track Lead and Co-Chair of the Philly Code Camp. He participates regularly in the SQL Server community on twitter and can be found there most days at http://twitter.com/chuckboycejr

After Chuck finishes up we will have a brief intermission to let people stretch their legs and their minds. We will get started up again after we all sit down and listen to our sponsor talk for about ten minutes. Well, everyone has to sit down except for me, because I’ll be the one speaking. I’ll also be handing out extra tickets for people that (1) attend my talk and (2) are active participants. What are the tickets for? Great question.

The door prize for tomorrow is my one remaining MSDN Ultimate subscription. It is a one year subscription that has a potential value of just under $12k. That’s right, one lucky attendee tomorrow is going to win an MSDN subscription.

After my talk is done we will have the second session of the day:

PowerPivot Demonstration and Discussion

Abstract: PowerPivot is a data analysis add-in for Microsoft Excel.  In this session we will discuss and demo the new SQL Server 2008 R2 PowerPivot feature.

Bio: Jon Gore is a BI Technical specialist from Microsoft Corporation.

After Jon is done we will hold the drawing for the prize and yes, you will need to be there to win.

Interested in attending? Go to the website to register otherwise bring your own lunch because we won’t be ordering for you unless we know you are coming. And by “register” I mean “click to join HSSUG”, so we can send you an invite to the meeting. We aren’t fancy technical here folks, just two guys that like to meet twice a year in a secluded spot for a few hours at a time.

SELECT, Deconstructed

August 31st, 2010 by Jen McCown

Today let’s expand on the logical processing order of SELECT that I mentioned in last week’s N Things Worth Knowing About SELECT blog.

We’re looking at the SELECT statement clauses in the order that the SQL engine logically processes them…we’ll even write it that way – it’ll look weird, but we’ll be reading it like the SQL engine does.

You can find more on SELECT’s logical processing order in BOL, and in Itzik Ben-Gan’s T-SQL Fundamentals and T-SQL Programming books.

Here’s our basic T-SQL query (using AdventureWorks):

SELECT P.Name ,
  P.ProductNumber ,
  P.Color ,
  P.StandardCost ,
  SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL

If we get all pseudo-mathematical on this, we can say that there’s some theoretical set of rows that we’ll get back out of this query; each logical step – in order – further refines that set of rows until we get back the actual rowset we want.

FROM

The SQL engine starts with the FROM clause first, to see where the data’s coming from. In this case we’re pulling from the Product table…easy. So, the theoretical rowset right now is everything in the Product table. (You can think about each step passing on that theoretical rowset to the next step for further refinement.)

-- Logical order:

FROM Production.Product P

ON/JOIN

Next we look at the ON and JOIN clauses: JOIN tells us what data structure (remember, it doesn’t have to be a table) we’re hooking up with, and ON give us the criteria for hooking our result sets together. We’re still building our theoretical resultset here; a JOIN can either expand the resultset, or limit it.

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC

Notice that the ON clause is evaluated before the WHERE clause; so we have to understand that where we put a limiting clause (like Color=’black’) can make a huge difference in our resultset. In this case, including the search condition “color=’black’” in the ON clause makes no difference, because it’s a LEFT OUTER JOIN…it’s going to return all the rows from the left side – black, red, blue, etc. – regardless of the join condition. Go on, try it:

SELECT P.Name ,
  P.ProductNumber ,
  P.Color ,
  P.StandardCost ,
  SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
AND P.color='black'
WHERE P.ProductSubCategoryID IS NOT NULL

Toldya. In the ON clause, we’re comparing resultsets to make a match in a specific way. In the WHERE clause, we’re limiting the resultset that we got. Now, if this were an inner join, we could include the color=’black’ in the ON clause; it would limit the resultset just as if we’d said it IN WHERE. But I’m still more likely to put that condition in the WHERE clause; it’s more “proper”, in the sense that if that query ever needs to change to an outer join, the condition belongs in WHERE.

WHERE

Speaking of where…this is clearly the place where we’d want to whittle our resultset down to just the rows that we really want.

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'

In this statement, it’s the last refinement we make to our resultset, so next the SQL engine would evaluate the SELECT clause to see what columns to pull, and our logically-ordered query would look like this:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
SELECT P.Name ,
  P.ProductNumber ,
  P.Color ,
  P.StandardCost ,
  SC.Name [Category]

But, there’s a lot more we can do in a SELECT statement, so let’s continue with a more complex query that includes GROUP BY, HAVING, and ORDER BY:

SELECT P.Color ,
  COUNT(*) ItemCount,
  MIN(P.StandardCost) MinCost ,
  MAX(P.StandardCost) MaxCost ,
  AVG(P.StandardCost) AvgCost ,
  SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL
AND P.color='black'
AND AVG(P.StandardCost) > 100
GROUP BY SC.Name, P.color
HAVING AvgCost > 100

GROUP BY

After SQL gets the base rowset down (by evaluating FROM, JOIN/ON, and WHERE), it looks at the GROUP clause to see if we’re going to lump some rows together. Notice, by the way, that we STILL haven’t looked at the actual SELECT clause:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color

Today we want some aggregate data about each category, so we’ll GROUP BY the category name and color – when grouping, anything that’s not an aggregate (SUM, MIN, MAX, COUNT, etc) in the select list, MUST to be in the GROUP BY clause.

HAVING

A quick bit of trivia: the HAVING clause doesn’t require a GROUP BY clause; in that case, it just behaves like a WHERE. Oh, and speaking of WHERE…why couldn’t we just put the “AND AVG(P.StandardCost) > 100″ in the WHERE clause? Isn’t it a limiting factor, just like Color and subcategoryID?

Well, yeah it is, but remember: SQL evaluated WHERE, then GROUP BY (the aggregation clause), and THEN the HAVING clause. The aggregate isn’t allowed in the WHERE clause, because at that point, SQL hadn’t heard anything about grouping data…all the rows were still autonomous. So, aggregates go in HAVING:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color
HAVING AvgCost > 100

SELECT

After HAVING, SQL finally takes a look at the SELECT clause itself, and our theoretical resultset becomes an actual resultset.

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color
HAVING AvgCost > 100
SELECT P.Color ,
COUNT(*) ItemCount,
MIN(P.StandardCost) MinCost ,
MAX(P.StandardCost) MaxCost ,
AVG(P.StandardCost) AvgCost ,
SC.Name [Category]

DISTINCT, ORDER BY, TOP

From here, SQL will process DISTINCT (if we included it), then ORDER BY (if we included it), then TOP (ditto). Let’s go ahead and include an ORDER BY:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color
HAVING AvgCost > 100
SELECT P.Color ,
  COUNT(*) ItemCount,
  MIN(P.StandardCost) MinCost ,
  MAX(P.StandardCost) MaxCost ,
  AVG(P.StandardCost) AvgCost ,
  SC.Name [Category]
ORDER BY AvgCost

Here’s a big mystery revealed: THIS is why the ORDER BY clause is the ONLY clause that can refer to column aliases: ORDER BY is the only full clause that’s evaluated after the aliases are! Put it this way: we read the query top down, while SQL saves the top for nearly last. When you shout an alias name at it in your WHERE, SQL has NO IDEA what on earth you’re talking about.

And….there you go. Happy days,

Jen

http://www.MidnightDBA.com/Jen

5 things I want to change while on the PASS Board

August 31st, 2010 by Allen Kinsel

I've been doing a lot of thinking over the last 3 or 4 months about «why» I wanted to run for the PASS Board.  I think I have finally come to the point where I can distill it down into a list of 5 specific things that I will work to make happen while on the PASS Board

5. Pass Summit location.  I believe that we can reach more of our membership by moving the location of the annual summit.  I like the idea of a 3 year rotation in Seattle.

4. Transparency.  I plan to continue the push for as much release of information as the community wants.  Particular things I'd like to see: full disclosure of individual BOD votes, more exposure of the Executive Committee «election process»

3. Better Chapter Support.  We need to do a better job at giving our chapter leaders what they need, some of the most frequently requested things on the chapter «needs» list could be completed more easily than many think if I can succeed with item #1 on this list.  I don't necessarily want to «own» chapter support but, I think I can provide help to this group, especially where it lends itself to my strength in PASS : speaker management.

2. Professionalism.  I want our association to become more professional.  That is to say, to begin offering more than education as the primary benefit to membership.  I think there should be more benefit to being a PASS member than education & networking .

1. Community Involvement.  I want to get our volunteers engaged in as many processes and tasks as possible.  I've done what I consider to be a pretty good job at getting as many people involved in the program committee processes as reasonably possible, Id like to  expand that same involvement into all areas of PASS.  I wrote a lengthy reply on the PASS election forum about this subject, let me know what you think.

 

If you believe in these changes and you like how I communicate them, consider voting for me. I'm running for the PASS Board of Directors, and I need your help to make a difference.  Click here to read about why I'm running.

Related posts:
What I'm doing to evolve the Program Committee next
The long evolution of the PASS Program Committee
More on PASS Summit Community Choice sessions & a general PASS update
Developing Better SQL Speakers
I received my diagnosis

Replication Gotcha – Including An Article In Multiple Publications

August 31st, 2010 by Kendal Van Dyke

When administering replication topologies it's common to group articles into publications based on roles that subscribers fulfill. Often you'll have multiple subscriber roles and therefore multiple publications, and in some cases a subset of articles are common between them. There's nothing to prevent you from adding the same article to more than one publication but I wanted to point out how this can potentially lead to major performance problems with replication.

Let's start with a sample table:

CREATE TABLE [dbo].[ReplDemo]
   
(
     
[ReplDemoID] [int] IDENTITY(1, 1) NOT FOR REPLICATION
                        
NOT NULL ,
     
[SomeValue] [varchar](20) NOT NULL ,
     
CONSTRAINT [PK_ReplDemo] PRIMARY KEY CLUSTERED ( [ReplDemoID] ASC )
       
ON [PRIMARY]
   
)
ON  [PRIMARY]
GO

Now let's pretend that we need this table replicated to two subscribers which have different roles. We'll create one publication for each role and add the table to both publications:

-- Adding the transactional publication
EXEC sp_addpublication @publication = N'ReplDemo Publication A',
   
@description = N'Publication to demonstrate behavior when same article is in multiple publications',
   
@sync_method = N'concurrent', @retention = 0, @allow_push = N'true',
   
@allow_pull = N'true', @allow_anonymous = N'false',
   
@enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true',
   
@compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous',
   
@allow_subscription_copy = N'false', @add_to_active_directory = N'false',
   
@repl_freq = N'continuous', @status = N'active',
   
@independent_agent = N'true', @immediate_sync = N'false',
   
@allow_sync_tran = N'false', @autogen_sync_procs = N'false',
   
@allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1,
   
@allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false',
   
@enabled_for_het_sub = N'false'
GO
EXEC sp_addpublication_snapshot @publication = N'ReplDemo Publication A',
   
@frequency_type = 1, @frequency_interval = 0,
   
@frequency_relative_interval = 0, @frequency_recurrence_factor = 0,
   
@frequency_subday = 0, @frequency_subday_interval = 0,
   
@active_start_time_of_day = 0, @active_end_time_of_day = 235959,
   
@active_start_date = 0, @active_end_date = 0, @job_login = NULL,
   
@job_password = NULL, @publisher_security_mode = 1
GO
-- Adding the transactional articles
EXEC sp_addarticle @publication = N'ReplDemo Publication A',
   
@article = N'ReplDemo', @source_owner = N'dbo',
   
@source_object = N'ReplDemo', @type = N'logbased', @description = N'',
   
@creation_script = N'', @pre_creation_cmd = N'drop',
   
@schema_option = 0x00000000080350DF,
   
@identityrangemanagementoption = N'manual',
   
@destination_table = N'ReplDemo', @destination_owner = N'dbo', @status = 8,
   
@vertical_partition = N'false',
   
@ins_cmd = N'CALL [dbo].[sp_MSins_dboReplDemo]',
   
@del_cmd = N'CALL [dbo].[sp_MSdel_dboReplDemo]',
   
@upd_cmd = N'SCALL [dbo].[sp_MSupd_dboReplDemo]'
GO

-- Adding the transactional publication
EXEC sp_addpublication @publication = N'ReplDemo Publication B',
   
@description = N'Publication to demonstrate behavior when same article is in multiple publications',
   
@sync_method = N'concurrent', @retention = 0, @allow_push = N'true',
   
@allow_pull = N'true', @allow_anonymous = N'false',
   
@enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true',
   
@compress_snapshot = N'false', @ftp_port = 21, @ftp_login = N'anonymous',
   
@allow_subscription_copy = N'false', @add_to_active_directory = N'false',
   
@repl_freq = N'continuous', @status = N'active',
   
@independent_agent = N'true', @immediate_sync = N'false',
   
@allow_sync_tran = N'false', @autogen_sync_procs = N'false',
   
@allow_queued_tran = N'false', @allow_dts = N'false', @replicate_ddl = 1,
   
@allow_initialize_from_backup = N'false', @enabled_for_p2p = N'false',
   
@enabled_for_het_sub = N'false'
GO
EXEC sp_addpublication_snapshot @publication = N'ReplDemo Publication B',
   
@frequency_type = 1, @frequency_interval = 0,
   
@frequency_relative_interval = 0, @frequency_recurrence_factor = 0,
   
@frequency_subday = 0, @frequency_subday_interval = 0,
   
@active_start_time_of_day = 0, @active_end_time_of_day = 235959,
   
@active_start_date = 0, @active_end_date = 0, @job_login = NULL,
   
@job_password = NULL, @publisher_security_mode = 1
GO
-- Adding the transactional articles
EXEC sp_addarticle @publication = N'ReplDemo Publication B',
   
@article = N'ReplDemo', @source_owner = N'dbo',
   
@source_object = N'ReplDemo', @type = N'logbased', @description = N'',
   
@creation_script = N'', @pre_creation_cmd = N'drop',
   
@schema_option = 0x00000000080350DF,
   
@identityrangemanagementoption = N'manual',
   
@destination_table = N'ReplDemo', @destination_owner = N'dbo', @status = 8,
   
@vertical_partition = N'false',
   
@ins_cmd = N'CALL [dbo].[sp_MSins_dboReplDemo]',
   
@del_cmd = N'CALL [dbo].[sp_MSdel_dboReplDemo]',
   
@upd_cmd = N'SCALL [dbo].[sp_MSupd_dboReplDemo]'
GO

After creating the publications we create our subscriptions, take & apply the snapshot, and we're ready to start making changes so we execute this simple insert statement:

INSERT  INTO dbo.ReplDemo
       
( SomeValue )
VALUES  ( 'Test' )

Here's the million dollar question: How many times does this insert statement get added to the distribution database? To find out we'll run the following statement on the distributor (after the log reader agent has done it's work, of course):

SELECT  MSrepl_commands.xact_seqno ,
       
MSrepl_commands.article_id ,
       
MSrepl_commands.command_id ,
       
MSsubscriptions.subscriber_id
FROM    distribution.dbo.MSrepl_commands AS [MSrepl_commands]
       
INNER JOIN distribution.dbo.MSsubscriptions AS [MSsubscriptions] ON MSrepl_commands.publisher_database_id = MSsubscriptions.publisher_database_id
                                                             
AND MSrepl_commands.article_id = MSsubscriptions.article_id
       
INNER JOIN distribution.dbo.MSarticles AS [MSarticles] ON MSsubscriptions.publisher_id = MSarticles.publisher_id
                                                             
AND MSsubscriptions.publication_id = MSarticles.publication_id
                                                             
AND MSsubscriptions.article_id = MSarticles.article_id
WHERE   MSarticles.article = 'ReplDemo'
ORDER BY MSrepl_commands.xact_seqno ,
       
MSrepl_commands.article_id ,
       
MSrepl_commands.command_id

Here's the output of the statement:

Query Results

That's one row for each publication the table article is included in. Now imagine that an update statement affects 100,000 rows in the table. In this example that would turn into 200,000 rows that will be inserted into the distribution database and need to be cleaned up at a later date. It's not hard to see how this could lead to performance problems for tables that see a high volume of insert\update\delete activity.

Workarounds
Two workarounds for this behavior come to mind:

  1. Modify data using stored procedures, then replicate both their schema and execution. This won't help for insert statements and is useless if you're only updating\deleting a single row each time the procedure executes. This also assumes that all dependencies necessary for the stored procedure(s) to execute exist at the subscriber
  2. Limit table articles to one publication per article. If you're creating publications from scratch then place table articles that would otherwise be included in multiple publications into their own distinct publication. If you're working with existing publications that already include the table article then subscribe only to the article(s) that you need rather than adding the article to another publication. (Subscribing to individual articles within a publication can get tricky - I'll demonstrate how to do this in a future post)

Change ‘Edit Top 200 rows’ setting in SSMS 2008

August 31st, 2010 by Pradeep Adiga

SQL Server Management Studio (SSMS) is a good feature rich tool. While many choose to do data manipulation using T-SQL queries, there are a lot of users who make use of SSMS for manipulating data.

Until SQL Server 2008, right clicking on the table would provide options to SELECT or EDIT “all” rows in the table.

But in SQL Server 2008, SSMS allows only Top 1000 rows to be selected and Top 200 rows to be edited.

This restriction may not be useful especially when one wants to see/edit more number of rows in SSMS. Is there a way to change this behavior? Yes, SSMS has that option under Tools –> Options.

The other feature is that if a value of 0 is entered in any of the above fields, SSMS will list all the rows from the table (Not a good idea to do a SELECT * on a big table)

Related posts:

  1. Compress database backups in SQL Server 2000 and 2005
  2. Change authentication mode from Windows Registry
  3. Cannot edit the Maintenance Job step

Come and gone

August 31st, 2010 by Gail Shaw

Or “Plan cache monitoring – insert and remove

Previously I took a look at the CacheHit and CacheMiss events to see how they behave and how to identify what’s been searched for in the cache. in this follow up, I want to take a similar look at the CacheInsert and CacheRemove events, see when they fire and how to identify the objects that they relate to.

Again, a word of caution, these can be frequently occurring events on busy servers and so traces should be kept short and to a minimum of events and columns. That said, these should occur a lot less often than the CacheHit and CacheMiss events. If they are occurring very frequently it may indicate that the SQL Server is not reusing plans efficiently.

CacheInsert

The CacheInsert event fires after a CacheMiss. The search for a matching plan in the cache failed, firing a CacheMiss event. Since there’s no plan, the optimiser is invoked to generate one and then that plan is inserted into the plan cache before the Query Execution engine begins execution.

The event is fairly simple, though of course there are a few surprises (what in SQL doesn’t have?).

Exec FireCacheEvents
GO

SELECT ID, SomeDate, Status
FROM TestingCacheEvents
WHERE Status =  'C'

CacheInsert

Three CacheInsert events for two batches. The first is simple enough, it’s the insert of the plan for the stored procedure. The procedure name is in the TextData column and the procedure’s ID is in the ObjectID column. The ObjectName column is not populated for this event.

The second and third Cacheinsert events are the interesting ones. The second one shows a parameterised version of the ad-hoc SQL statement, while the third shows an unparameterised version of the same ad-hoc SQL statement. Clearly this query was simple enough to qualify for auto-parameterisation. So, are there two plans for this been inserted into cache? To answer that one, I need to switch over to SSMS and query the plan cache.

select usecounts, size_in_bytes, cacheobjtype, objtype, st.text,  qp.query_plan
 from sys.dm_exec_cached_plans cp
 cross apply  sys.dm_exec_sql_text(cp.plan_handle) st
 cross apply  sys.dm_exec_query_plan(cp.plan_handle) qp
 where st.text not like  '%sys.dm_exec%'

ExecCachedPlans

Well, there are three entries and the Cache Object Type is listed as compiled plan for all three, but there’s something different between them. Take a look at the size in bytes for the ad-hoc and prepared statements. The ad-hoc is less than half the size of the prepared. That’s not the only thing different. If I open up the execution plans, the exec plan for the prepared statement looks normal. There’s a clustered index scan (there are no nonclustered indexes on this table yet) and a select operator. The plan for the adhoc statement however…

NotaPlan

There’s something missing here, like the rest of the plan. Ok, so the entry with the ad-hoc type is not a complete plan, so what is it?

There doesn’t seem to be anything on MSDN on this (at least nothing I could find), but two of Kalen’s books1. (Inside Microsoft SQL Server 2005: Query Tuning and Optimisation, page 283; Microsoft SQL Server 2008 Internals, page 533) mention this. These ad-hoc entries are shell queries, cached just to make the parameterised form of the query easier to find. All the execution plan contains for these shell queries is a pointer to the plan for the parameterised version of the query.

For the ad-hoc query that gets inserted as a shell plan and the parameterised query, the objectID is, as with the CacheHit and CacheMiss events, a hash of the query text and does not match to any object in the database. The full text of the batch is then given in the TextData column.

I think that’s enough for CacheInsert. On to it’s opposite.

CacheRemove

The CacheRemove event fires, as one would expect, when a plan is removed from cache. The interesting question is, when does that occur?

Is it every event that could possibly result in an inefficient or invalid plan, or just memory pressure and explicit cache flushes that trigger the CacheRemove event?

Let’s see…

As an aside, I’m going to add a nonclustered index to the table that I’m using, so that the update of statistics will have an effect. Otherwise the only execution path possible is a table scan, and changing statistics won’t affect that.

CREATE NONCLUSTERED INDEX idx_TestingCacheEvents_Status
 ON  TestingCacheEvents (Status)
GO

Right, onto the testing.

Exec FireCacheEvents
GO
SELECT ID, SomeDate, Status
FROM  TestingCacheEvents
WHERE Status = 'C'
GO
DBCC FREEPROCCACHE -- Gone.  All gone
WAITFOR DELAY '00:00:05'
GO

Exec FireCacheEvents
GO
SELECT ID, SomeDate, Status
FROM  TestingCacheEvents
WHERE Status = 'C'
GO
DBCC  FREEPROCCACHE(0x0500060063A9355540614285000000000000000000000000) -- Just one removed, the proc
WAITFOR DELAY '00:00:05'
GO

Exec FireCacheEvents
GO
SELECT ID, SomeDate, Status
FROM  TestingCacheEvents
WHERE Status = 'C'
GO
exec sp_recompile  'TestingCacheEvents' -- recompile on the table, both proc and ah-hoc plans  invalidated.
WAITFOR DELAY '00:00:05'
GO

Exec FireCacheEvents
GO
SELECT ID, SomeDate, Status
FROM  TestingCacheEvents
WHERE Status = 'C'
GO
UPDATE TOP(1)  TestingCacheEvents
 SET Status = 'C'
 WHERE Status = 'B'
UPDATE STATISTICS TestingCacheEvents WITH FULLSCAN
WAITFOR DELAY  '00:00:05'
GO

Exec FireCacheEvents
GO
SELECT ID, SomeDate, Status
FROM  TestingCacheEvents
WHERE Status = 'C'
GO
ALTER TABLE  TestingCacheEvents ALTER COLUMN FILLER CHAR(325)
WAITFOR DELAY '00:00:05'
GO
Exec FireCacheEvents
GO
SELECT ID, SomeDate, Status
FROM  TestingCacheEvents
WHERE Status = 'C'
GO

Well, this is interesting…

CacheRemove

The only things that caused the CacheRemove event to fire were the two DBCC FreeProcCache, the first that, as can be seen from the EventSubClass, cleared the entire plan cache, and the second that cleared a single plan.

If I go back and look at the whitepaper Plan Caching in SQL Server 2008, it differentiates between removing a plan from cache (which memory pressure and cache flushes do) and invalidating the plan (which recompiles, schema changes, stats changes and the like do). Based on that whitepaper and these results, I would conclude that the CacheRemove event only fires when the plan is actually removed from cache, not when it’s just invalidated. The invalidation of a plan simply results in a recompile next time the plan is needed.

Now that the question of when it appears is answered, let’s finish with what it looks like.

There are two possible formats for this event, and it’s the EventSubClass that shows which of the two the particular event is.

The first possibility is when all of the plan cache has been flushed, identified by a EventSubClass of “2 – ProcCacheFlush”. This will be a result of a DBCC FREEPROCCACHE and some server reconfiguration events. In this case, there will be no ObjectID and the text data will simply state that the cache has been flushed.

The second possibility is more interesting, especially if monitoring plans being aged out of memory. An EventSubClass of “1 – Compplan Remove” identifies a single plan being removed from cache. These are fired when either a single plan or part of the plan cache is cleared. Examples here are DBCC FREEPROCCACHE with a plan or SQL handle passed to it, DBCC FLUSHPROCINDB, database reconfigurations (restore, detach, offline)

With the one, the ObjectID does have a value and it’s the ObjectID of the pbject whose plan is being thrown out of cache. As with all the other cache events, for ad-hoc queries the ObjectID is just a hash of the query text and only really useful for matching to CacheMiss, CacheHit and CacheInsert events and the object name or ad-hoc batch appears in the TextData

That’s more than I planned to write, but CacheRemove turned out to be a little more complex than I expected. Next up, the last part of this sort series – the recompile events.

They See Things Differently

August 31st, 2010 by Jeremiah Peschka

Every once in a while, I’ll be talking with a friend about presentations and they mention that they really like the visual flow or the artwork or something about the presentation. After I get over being flattered (trust me, it’s easy to do), we’ll start talking about how I came up with the material in the presentation. I’d love to say that the photographs are mine, but they aren’t. There are a few techniques I use to help me make presentations that work well.

Tell Me a Story

When I’m giving a presentation, I try to be acutely aware about how the material flows. I don’t want to take the audience on a wild ride across a bunch of topics. It’s difficult enough to sit still for 60+ minutes and listen to a single topic. But sitting still for over 60 minutes and listening to a variety of topics? That’s nearly impossible.

Instead of trying to cover a bunch of topics, I cover one topic. I break the topic down and I find a logical beginning, middle, and end. Sometimes this part is easy, sometimes it isn’t. The point of the exercise, though, it to find a good way to teach a topic. Kevin Kline (blog | twitter) has a great presentation about the SQL Server internals where he visualizes the presentation from the perspective of a query traveling through SQL Server.

Telling a story makes the presentation more than a series of facts. Telling a story takes a series of facts and gives them a personal connection. Rather than list a bunch of facts, tell your audience how you got somewhere. In a presentation I give on dynamic SQL, I share with the audience how I learned to write good dynamic SQL by showing them examples of bad dynamic SQL. It’s a technique that works well because we’re sharing our embarrassment at the bad code we’ve written and then we learn how to get better. Buck Woody’s (blog | twitter) presentations are so popular because he peppers them with anecdotes. (It probably helps that Buck knows what he’s doing, but let me stick to only one point, okay?) The anecdotes do the same thing – they break make the material relatable.

Let’s Play Word Association

I do a lot of word association when I’m working on presentations. This isn’t some kind of goofy improv theater troupe exercise; it’s how I find great images for my presentations. I’ve been known to spend a lot of time agonizing over a single image to get the message just right.

Symbols are the instruments which convert raw intelligence into culture. Without them, explained Lewis Mumfor, ‘man’s life would be one of immediate appetites, immediate sensations, limited to a past shorter than his own lifetime, at the mercy of a future he could never anticipate, never prepare for. In such a world, out of hearing would be out of reach and out of sight would be out of mind.’

Alan Fletcher – The Art of Looking Sideways

Finding the right symbol to trigger a memory is difficult. There are many different ways to convey an idea, but only one will bring the idea to life. How do you find the image that brings an idea to life? Searching.

In my presentation on SQL Server internals, I use this image to help describe row and index operations. The linear form of the building kind of looks like a table with rows and columns. It doesn’t look exactly like a table might look (you could call that Excel), but it is an image that we’re all familiar with.

Finding Inspiration

The obvious question is “How do you come up with this stuff?”

The unfortunate answer is “I don’t know.”

I spend a decent amount of time looking at art – be it photos, drawings, paintings, whatever. I really enjoy visual communication in all of its forms, so I try to partake every chance I get. I think that has influence the way I find images to use in presentations.

When I’m looking for images, I go through a process of searching through flickr. I don’t necessarily search for funny pictures or pictures with any specific word association. Sometimes I just watch the flickr panda and hope that something interesting and creative commons licensed pops up. This can be a good way to find interesting pictures that set a mood or a tone. Or maybe the image will just continue on whatever theme I’m using through the slide deck. It’s not always important that I pick the right image for any single slide, but that the image fits the presentation as a whole.

Sometimes I’ll even watch the panda when I’m not creating a presentation. I’ll have it up in the background and I’ll save off interesting pictures that I’ve found. The key when I’m doing that is to make sure I have some way to capture the photo’s metadata so I can give credit later. Sometimes it’s easiest to save a text file with the same name as the photo so you know exactly what metadata belongs to each photo. It doesn’t matter how you do it, just make sure you can give credit where credit is due.

Another way I find inspiration is to read a lot of magazines. I don’t necessarily subscribe to them, but I look through them. Advertising is something that we normally bypass when we’re searching for an article we want to read. Advertisers, on the other hand, are trying to get your attention and convey a message with a single image and as few words as possible. Take a look at how advertisers are trying to get your attention. There are a lot of techniques that you can pick up from successful ads without even knowing what you’re doing – composition, layout, the amount of text to use.

There’s inspiration everywhere. Find it and use it.

When does SQL Server decimal NOT convert to .Net decimal?

August 31st, 2010 by Mladen Prajdić

Having the SSMS Tools Pack out in the wild enables me to get much “joy” from different bug reports people send me. And let me tell you, people send me back some seriously weird errors.

But the most unexpected error message I’ve seen so far was the OverflowException when calling System.Data.SqlClient.SqlDataReader.GetDecimal(Int32 i).

It turns out that SQL Server decimal data type is not mapped to the .Net decimal in it’s whole range. .Net decimal type only maps to SQL Server one in the decimal(29, X) range.

That means that if you have a decimal column that has the precision higher than 29 and no matter the scale, you won’t be able to use the native .Net data type.

So what to do? Let’s take a look with an example. The comments provide additional info.

SQL Code:

-- create a test table in tempdb with one valid and one invalid decimal mapping column.
CREATE TABLE TestTable
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[ValidDecVal] [DECIMAL](29, 2) NOT NULL,
[InvalidDecVal] [DECIMAL](30, 30) NOT NULL
)
GO

-- insert some data
INSERT INTO TestTable(ValidDecVal, InvalidDecVal)
-- both values are in correct range
SELECT 123456789012345678901234567.56, 0.123456789012345678901234567890

C# Code:

private void GetData()
{
using (SqlConnection conn = new SqlConnection(@"server=TestServer; database=tempdb; Integrated Security=SSPI;"))
{
using (SqlCommand cmd = new SqlCommand("SELECT ID, ValidDecVal, InvalidDecVal FROM TestTable", conn))
{
conn.Open();
SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection);

rdr.Read();
// get the id
int id = rdr.GetInt32(0);
// get the 29 precision value just fine
decimal validDecimal = rdr.GetDecimal(1);
// this errors out ...
decimal invalidDecimal = rdr.GetDecimal(2);
// so does this ...
decimal invalidDecimal = rdr.GetSqlDecimal(2).Value;

// the only thing to do is to either pass around SqlDecimal class
SqlDecimal invalidDecimalAsSqlDecimal = rdr.GetSqlDecimal(2);
// or to cast it to string
string invalidDecimalAsString = invalidDecimalAsSqlDecimal.ToString();

// ... do something with upper values
Console.WriteLine(invalidDecimalAsSqlDecimal);
Console.WriteLine(invalidDecimalAsString);
}
}
}

If you’re working with large precision types I’d like to hear how you overcome this problem in .Net.

The only way I’ve found to deal with this is to either pass around the raw SqlDecimal data type or its string value.

A quick search revealed this Connect item that showed the problem with Linq2SQL. I don’t know why this isn’t fixed yet but I hope it will be soon. If you encounter this problem, vote it up.


Defensive Db Programming Chapter 06

August 30th, 2010 by Jason Brimhall

We are here again after another week and ready for another episode in this series.  Today we get to talk about chapter 6 in the book by Alex Kuznetsov (Blog).  You can find more on this series by looking here.

Common Problems with Data Integrity

Alex explains to us that SQL server has built-in tools to help maintain the integrity of the data.  He quickly lists out three areas that can be useful in maintaining Data Integrity, as well.

  1. Applications
  2. Constraints
  3. Triggers

Application

Data integrity can be enforced from the application.  It should be noted that this could be an inefficient method of ensuring the integrity of the data.  If you try to do it from the application, be prepared for a little pain.

Constraints

It is well known that using constraints is the most reliable way to enforce data integrity rules.

Using constraints in the database is helpful and quick.  If data does not meet the constraint criteria, we can prevent it from being added.  There are different kinds of constraints and one should become familiar with the various constraints.  In this section, Alex also demonstrates the differences for the three states of constraints.  It is important to understand the state of the constraint to ensure the integrity of the data is at the level your expect.

There is a section devoted to UDFs in this chapter as well.  The section is within the Constraints section and should be read.

Triggers

The chapter is concluded with the discussion on using Triggers for enforcing data integrity.  Triggers are suitable in some areas where a constraint would not be suitable for enforcing data integrity.  Triggers do have their own problems that should be considered too though.  Some of the problems are with how the trigger was coded and some are inherent with the use of triggers.  It is well worth learning about these limitations by reading this section as well.  Test the samples provided and implement some of the suggestions in your triggers.

Conclusion

Alex concludes this chapter with the following statement:

Data integrity logic in the application layer is too easily bypassed. The only completely
robust way to ensure data integrity is to use a trusted constraint. Even then, we need to
test our constraints, and make sure that they handle nullable columns properly.

We also have a nice set of reminders of what to include in our assumptions to be tested.  The short of it, check your constraints and test as many scenarios as necessary based on your assumptions.