Archive for the ‘Database Design’ Category

Slides From Reno SQL Server User Group Available

Tuesday, March 19th, 2013

Well, I finally joined Slideshare.

Here are the slides from my most recent presentation (Database Design, Size DOES Matter!) to the SQL Server User Group in Reno, Nevada last week:



I’m going to see about pushing many of my presentations to Slideshare in the coming months. I like the idea of Slideshare, just not sure how valuable it will prove itself over time.

But I’m always willing to try something new.

Enjoy!

Slides From Reno SQL Server User Group Available is a post from: SQLRockstar - Thomas LaRock

The post Slides From Reno SQL Server User Group Available appeared first on SQLRockstar - Thomas LaRock.

The Accidental Architect

Monday, February 25th, 2013

datacenter1Most readers of this blog are likely to be familiar with the term accidental DBA. It is meant to describe someone that ended up in the role of database administration by pure chance. They simply happened to be in the right place at the right time (or right place at the wrong time). I even wrote a book about my experience and others have written books as well to help those who find themselves falling into the role of a DBA.

In my case the conversation went something like this: “Hey, you’ve done a database restore once before, how would you like to be the new DBA since everyone else just quit?” I answered “Of course!” and have not looked back.

Over time as the accidental DBAs skills grow sharper they get asked to do a little bit more in other areas. Sometimes this is a simple question similar to hey-what-do-you-know-about [something]? With Microsoft SQL Server you get a bundle of products that all have similar names: SQL Server, SQL Server Reporting Services, SQL Server Analysis Services, SQL Server Integration Services, etc. Over time the questions and answers went something like this:

“Hey, what do you know about SSRS? We need help with some reports.”
“I don’t know much about them, to be honest.”
“You don’t? Aren’t you the DBA?”

Well, yeah, I was the DBA. Since I kept getting those types of questions I started to think that I really did need to know everything. But of course you cannot know everything about everything; only a fool would pretend to be an expert in everything. Eventually I was able to focus on learning just the core engine. I learned enough to earn my MCM in SQL Server as well. I like to tell people that my MCM means I know a fair amount of detail for about 20% of the entire product line. I wish I were joking, but I’m not.

There is a lot more to learn. Each and every day new products come to market that allow someone like me the opportunity to learn something new. I won’t get to the level of depth for much of them, but I will learn about them. The reason why is simple: because I’m asked. The question goes something like hey-you-know-about-data-what-do-you-think-we-should-do-about [something]?

As a data professional I see this as a natural progression in our careers. I fell into the role of a DBA because I was someone that could adapt and learn new things fairly quickly. I would get new things tossed my way for the same reason. These days the questions I get have to do with architecture decisions. Should we use Azure, or some other host? Do I need a “big data” solution? How best can I mash these three different data sets together and perform some analysis quickly enough to make a business decision?

If that describes what is happening to you then you are what I call the “Accidental Architect”.

You didn’t apply for the role, but you get asked the questions. Your experience and insight is valued. People want to know what your vision is for the next 18 months. They ask you about roadmaps, they ask about insights, they ask about anything that relates to data in some way.

It’s a wonderful time to be in technology.

It’s a wonderful time to get together to connect, share, and learn about data.

The Accidental Architect is a post from: SQLRockstar - Thomas LaRock

The post The Accidental Architect appeared first on SQLRockstar - Thomas LaRock.

Designing a Database: 7 Things You Don’t Want To Do

Wednesday, January 16th, 2013

Painting ProblemsYour database design is awful.

The reason nobody has told you this yet is for one of two reasons: ignorance or apathy. They either don’t know it’s bad, or they don’t care.

Well *I* care about bad designs, as I typically bear the burden of having to make queries run fast and overcome the limits of the bad design. As a data professional for the past 15 years I have seen (and built) my share of database designs. Some are good, some are OK, but most make me want to stab someone with a paper clip shiv.

When I come across a design that is sub-optimal it makes me ask myself “what did this data do to deserve to be treated so poorly?” Data lasts longer than code and it should be treated accordingly.

In my continuing quest to help you respect your databases, I’d like to start today with pointing out where you are doing it wrong. You’ll thank me later.

Here are the seven things you don’t want to do when designing a database.

1. Do It Yourself

Like dentistry, database design is something that is best left to a professional and not something you should do for yourself. I don’t care if you are able to get one of those probes with a fancy mirror on the end, you should stop shoving sharp things in your mouth.

Just because you can do something doesn’t mean you should. If you haven’t designed a database before then don’t make a mission-critical system your first project. Go out and hire an expert to help guide you.

I think this Dilbert sums it up rather nicely:

21168.strip

 

2. Have No Performance Expectations (or *no* expectations)

I’ve been involved in more than one project where there were no performance expectations at all. Well, not until we went live in production and it was “too slow”. Without having ever defined an acceptable level of performance it was rather difficult to unwind a few months worth of work in order to get performance to an acceptable level. The end result was that we got a system deployed but nobody was happy with the process.

If you have not set any performance expectations then you should expect some headaches during the early stages of deployment. Likewise, if you have wild expectations for performance you should expect some disappointments especially if you haven’t done any stress testing. Chances are the test system with ten rows of data is not a good indication of how the millions of rows in production will behave.

3. Going Big, Just In Case

I often see data types being chosen as if they don’t matter. But the truth is (despite everything you were told in college) size matters. If you know that the only possible values for a certain column are between 0 and 100,000 then you don’t need to slap a BIGINT data type for that column when a INT would do just fine. Why does this matter? The BIGINT data type requires 8 bytes of storage, and the INT requires only 4 bytes of storage. That means for each row of data you could be wasting 4 bytes. Doesn’t sound like much, right?

OK then, let’s consider that your table has two million rows. Multiply those rows by 4 bytes and you have 8 million bytes, or roughly 7.8MB of wasted space. I know it doesn’t sound like a lot, does it? Well, it adds up, and quickly. I’ve only shown you one example for just one column, but how about your date columns? if you don’t have a need for calendar dates prior to the year 1900 or after the year 2079 then SMALLDATETIME is likely to work just fine for you. Oh, and let’s not forget that these columns can be indexed, and those indexes will also be unnecessarily wider as well.

Choosing the right data type matters, for all sorts of reasons. Take the time and make an effort to get it right at the start. I’ve even put together a script to help you right-size the pages currently residing in your buffer pool.

And if you are looking for a great use case for right-sizing your data types, look no further than Michelle Ufford (blog | @SQLFool) of GoDaddy: http://inside.godaddy.com/scaling-database-data-types/

4. Not Examining Foreign Keys As Part Of Your Indexing Strategy

I am assuming, of course, that you even have foreign keys defined. I’ve seen *many* databases that have little to no primary keys, foreign keys, or even any indexes defined. No, I don’t know who would do such a thing either. But they’re out there, and sooner or later you will find them, too.

Assuming you have FKs defined then you should be evaluating to see if it would make sense to add indexes to match those FK definitions. In some case, it will. In other cases, it won’t. But you should make certain that this type of review is part of your overall design process.

In fact, that reminds me of another thing you don’t want to be doing when designing a database…

5. Indexing Every Column, or Indexing No Columns

Assuming you have set some realistic performance benchmarks then you are likely going to want to consider building some indexes. If you don’t have any indexes defined then you are likely not concerned about performance at all anyway.

What I see most of the time are databases with too many indexes defined. This is usually the result of someone using an index tuning advisor tool but it can often be the case where it is due to someone reading a blog post that says “indexes are what you need” and they go about creating a dozen indexes in an effort to get one query to run faster.

While an index is wonderful to help you read data faster it adds overhead for every DUI statement (Delete, Update, Insert). Adding an index to every column in a table is likely to be a nightmare for any process that has data coming in to that table.

I have a script that I have used in the past for locating duplicate indexes in SQL Server and you can find similar scripts for detecting duplicate indexes in Oracle also.

6. Forgetting About Data Quality

As a DBA I understood my role to be focused on recovery. If the system went down I needed to be able to recover the data, and fast. That was my primary focus. Database designers don’t have to worry about the recovery of data (because that’s my job) and instead they focus on the integrity of the data.

If you are designing a database then you need to make certain you have accounted for data quality. You simply cannot expect someone else to do that for you. Imagine if the DBA was expecting someone else to take care of recovering the data? Unfortunately I have worked with many systems that have had outages due to what was lovingly called “garbage in, garbage out”. If you have built a system that relies on perfect data I am here to tell you that your system is going to fail one day, likely very soon.

There are many ways for you to enforce some type of data integrity. Normalization is one way. Another way is to deploy a service such as Data Quality Service. This allows for you to enforce rules and constraints that help guarantee a certain level of data quality.

7. No Data Retention Or Archiving Strategy

I’m willing to bet that you have data older than seven years sitting on your disks right now. Seven years seems to be that mythical line in the sand that everyone says they need, no matter what the system. If you ask someone how long they need to keep records for any system the answer almost always comes back “seven years”, even if the real answer is closer to seven weeks.

As a result systems get built with only one thing in mind: storing and preserving it in tables for all time. It is rare for someone to stand up and say “hey, maybe we could agree that data older than a year can be archived.” Inevitably someone will respond with “that’s fine, but if I need to run a report for the previous year you had better be able to get my data back within the hour.”

If you are designing a database you need to spend the time finding out exactly how much data will be retained. Knowing that information is going to help you project performance expectations as you store more and more data.

That’s my list of how I see good database ideas become bad database designs. If you find yourself doing any one of those seven things it is likely that over time your database design will become further and further away from ideal. Simply avoiding these seven things would keep your database from performance degradation over time.

Designing a Database: 7 Things You Don’t Want To Do is a post from: SQLRockstar - Thomas LaRock

It’s Christmas: 3 Ways To Be Wise About Your Rowsize

Thursday, December 20th, 2012

codeEmpty space.

That’s what you have inside your database, I’m certain of it.

I am also certain that you can take steps to avoid having large amounts of empty space inside of your database. The first step, as always, is admitting you have a problem. In this case that means you need to know how to look for the problem.

That’s what I’m here to do for you today. Let’s go look for some empty space together, shall we?

The Problem

Here’s the problem: SQL Server stores data in 8k pages. OK, so that’s not the problem, really.

The problem is when you have row sizes that are slightly more than 4k in length, but far less than 8k. That’s the wasted space, and it’s what I’m here to help you with today. Every time you insert a row that is 5k in size, you take up a full data page on disk. Imagine, if you will, having 100,000 rows in a table, each one at 5k in length. How much space will those 100,000 rows consume on disk? Ideally you would like for it to be close to 500Mb, right? Well it will be closer to 800Mb. That extra 300Mb? Yeah, just empty space. It’s allocated, but not used. And every time you run a query against that table you may need to pull back roughly 100,000 logical I/O (if you needed each row, and each row was on its own data page).

Imagine if you could get two rows of data onto a page. Then the same 100,000 rows would only need 50,000 pages, which is less than 400Mb in space, and more importantly it would only be (at most) 50,000 logical I/O. You may have just cut your query time in half!

We have a handful of ways to find out the size of a row, but here’s the method I prefer: the simplest.

The simplest way for me to find out if the rows are larger than 4k is to take the size of a table and divide by the number of rows. I said “simplest”, but that doesn’t necessarily mean “best”. But for those of you still running SQL2000 then that is probably the easiest thing to do. Good luck with that.

For those of us here in the 21st century we can use the dm_db_index_physical_stats system function in order to get details about the average record size. For example, this script would give us some details for the HumanResources.JobCandidate table in AdventureWorks2012:

SELECT avg_record_size_in_bytes
, avg_page_space_used_in_percent
, OBJECT_NAME(object_id) AS [ObjectName]
FROM sys.dm_db_index_physical_stats(db_id(), object_id('HumanResources.JobCandidate'), 1, null, 'detailed')

It would be rather tedious to run this for each and every table manually, so let’s put together something that returns details for every table in a database. We want to add some filters as well, to reduce the amount of noise. We don’t need to worry about the small stuff, like a row with an average record size of 11 bytes.

We are not concerned with non-clustered indexes in this example, we will want to focus on heaps and clustered indexes. We also want to focus on tables that have a row with a maximum size greater than 4k (because we want to get at least two rows onto a page as often as possible), and we are not going to be concerned with very small tables that have few rows or few pages.

With all the being said, here’s the script to get it done for us:

SELECT avg_record_size_in_bytes
, avg_page_space_used_in_percent
, OBJECT_NAME(object_id) AS [ObjectName]
FROM sys.dm_db_index_physical_stats(db_id(), NULL, NULL, null, 'detailed')
WHERE index_level = 0 --leaf level
AND index_id <= 1 --heap or clustered
AND page_count > 10 --we want to focus on tables of certain size 
AND record_count > 100 --we want to focus on tables with certain rowcounts
AND alloc_unit_type_desc = 'IN_ROW_DATA' --only in row, not LOB
AND max_record_size_in_bytes > 4000 --max size of rows greater than 4k
AND avg_page_space_used_in_percent < 80.0 --focus on pages less than 80% full

Feel free to adjust the filters as you feel necessary. You may only want to focus on tables with more than 1,000 pages, for example. I wanted to give you a script that would return a few rows from AdventureWorks2012.

OK, so now we have identified some tables to investigate, how do we fix them? Great question, and I have some suggestions for you.

1. Review your design

Take a look at that table. No, I mean LOOK AT IT!

Do you see any columns with redundant data? Is the table really wide because the data was denormalized in an effort to make reporting faster? Or is your “database” really just a former excel spreadsheet that got migrated to MS Access a dozen years ago and has since been promoted through versions of SQL Server like a star athlete through high school?

Perhaps there are two columns for email addresses, possibly leftover from a refactoring project that never got cleaned up. Yeah, go ahead and pretend like you’ve worked on a refactoring project that actually cleaned up the old data. I’ve yet to meet anyone that has seen such project through to completion, many projects don’t ever get around to removing that old data.

It might be time to review the design for this database however any changes you make are likely going to be disruptive to the business and require testing, change control, etc. It may cost you more to fix than to simply leave it alone! You will need to document what the costs are in terms of disk storage and query performance. If you can show that you can save money in the short term by making some changes then you are likely to get traction for a design review.

It is also possible you are working with vendor products that cannot be changed. Let’s assume that you can make changes and therefore you need to go and talk to your development team and discuss more about the business requirements. Get up off your arse and do it, go and ask the questions. You may be surprised to find people willing to help make things better.

If you suspect that you have some column datatypes that could use some additional investigation then might I suggest you read the next section…

2. Examine columns for right-sizing

What do you see for column datatypes. Do you see a bunch of CHAR(), NCHAR(), DATETIME, and BIGINTs defined? Do you need those types, or would smaller (or variable) ones suffice? Do you have ANY idea? Probably not, and that’s why I’ve blogged about this before, more than once actually. It isn’t hard to avoid datatype mismatches. If you suspect that you have some tables in a database that could use some right-sizing then you can run my scripts in order to examine those columns a bit more closely.

This is why I have my script focus on the max size of a row as opposed to the average size of a row. I want to know if the row has a chance to be greater than 4k as that is the row I need to focus on altering first, in order to get that row to have a max of less than 4k, if possible. Does this guarantee page fullness? Absolutely not! It just happens to be where I am drawing a line in the sand for this particular example.

Finding columns that need adjusting then gives you a new problem to solve: trying to convince your business users that a change is needed.  Many times I have heard managers simply say “let’s wait for a problem to happen before we try to fix anything”. If you still can’t make a compelling case for change you may want to consider a third alternative…

3. Compression

You can enable row and/or page compression on the table in order to help reduce the I/O. Enabling compression will not require any of those pesky code changes that people don’t want to make. It doesn’t mean that you shouldn’t test the effects of enabling compression, it just means that it is less intrusive than changing the datatypes for columns.

Compression can give you a boost in performance by allowing for more rows to fit onto a sibgle 8k page. However there is additional overhead with compression, and if you table has a high volume of DUI statements (that’s Delete, Update, and Inserts) you may not want to think about compression at all.

In addition to the increase in performance you also get a benefit in reduced disk space. You can get an estimate of the space saved by using the sp_estimate_data_compression_savings system stored procedure. This example will give me the estimate for enabling row compression on the Sales.SalesOrderDetail table:

EXEC sp_estimate_data_compression_savings 'Sales', 'SalesOrderDetail', NULL, NULL, 'ROW' ;
GO

I’ve given you a way to identify if you are seeing a lot of wasted space and also three ways to help fix the issue once found.

The choice to act is yours, of course.

It’s Christmas: 3 Ways To Be Wise About Your Rowsize is a post from: SQLRockstar | Thomas LaRock

Data Modeling Is Dead; Long Live Data Modeling!

Monday, December 17th, 2012

erd_diagramData modeling is dead. It is a product of an era that has passed; that of corporate silos that created their own versions of software to suit their own needs.

That is no longer the world in which we live. That era was one that had high costs associated with building and maintaining a database of customers.

Today’s era is one where you can subscribe to Salesforce.com for just a few dollars a day. You can decide for yourself to run a new report. How much did that same report cost in the old era? How long would it take for IT to deliver that report? That’s why businesses today are using such services, because it reduces time and costs.

Recently Karen López (blog | @datachick) wrote an article gives a good background on the differences between conceptual, logical, and physical data modeling. The TL;DR version is here: Conceptual modeling is when the business helps to map out their needs at a strategic level. Logical modeling is when a data architect gets involved and describes the business data requirements independently of the DBMS, uses, or organizational constraints of the data. Physical modeling is when a data architect and a DBA ensure that the logical model will meet the business needs for things like performance and recovery. A physical model is designed for a specific version of a DBMS or other data store.

On paper and in classrooms these mythological ideals sound great. In the non-academic world database modeling is not done as much as you might think. With the advent of cloud services such as Azure I can build and deploy applications without ever needing to have one database design or modeling session.

Here are the reasons why modeling doesn’t matter for most of us anymore.

Third Party Software Packages

We’ve all been there: someone on the business side gets a phone call from a salesman for an invite to a three-martini lunch. By the time lunch is over the business has decided that the salesman’s product is *THE* product that your business must have in order to do…well, it doesn’t matter what it promises to do, really. The point here is that your business decides to purchase some software that has a database backend. If you are lucky it is a database platform you can support, but that is not always the case. But hey, a database is a database, right? [Despite our shop being a dedicated Microsoft shop I once had a manager demand to know "how much longer" it would take for my team to support Oracle, because we were holding the business back, apparently.]

This software come in house, get’s installed, and guess what? All of your best practices and policies get thrown out the window. You make exceptions for every rule because this software requires ‘sa’ access, it needs a dedicated server, and you aren’t allowed to make any schema changes (not even additional indexes) in order to improve performance (I’m looking at you, Sharepoint).

While it is a recommended best practice to do logical and physical modeling together those tasks are not possible when purchasing a third party product. More than three-quarters of the apps supported by my customers right now are from third party vendors. Very little in house development is being done. That means they get the conceptual, logical, and physical model that was built for them by somebody else. As a DBA, you won’t even be able to touch anything, either.

You’re stuck with a pickup truck and your business expected a Ferrari.

Software-as-a-service (SaaS)

This is the same thing as above except that you don’t host the system. The DBA doesn’t have any input into…anything! Of course the DBA will get blamed for anything that goes wrong, but that’s a topic for a different blog post.

With SaaS you get someone else’s conceptual, logical, and physical design, and you simply hope it will be good enough for your needs most of the time. Salesforce is the prime example here. Your business does need not purchase and host their own customer relationship management system these days, they can just pay for the service. They save money, and time because this also means you don’t need to spend time in those pesky database design or modeling meetings.

Data-as-a-service (DaaS)

This is just like above but you don’t do anything except pull in data feeds. Now, once you pull in the feeds you might be thinking “hey, we’ll need to store that data”, which would mean the need for some actual modeling. But the reality is that it won’t get stored inside of a traditional database. It is much more likely to be stored, inside a spreadsheet, inside of another application such as Sharepoint. Don’t pretend as if you’ve never seen this before. If you haven’t, then just wait your turn, you will see it soon enough.

Here’s the crazy part: as much as the need for data modeling seems to be diminishing these days I also see that it is more important now than ever before. All of those companies that are providing SaaS and DaaS need to have data modelers on staff that can ensure their services can be consumed and shared easily. That stuff cannot just happen by magic. As easily as I can deploy an application to Azure without needing to know anything about modeling my application will not scale beyond a certain point without a proper database model and design. User needs must be anticipated, logical models are needed to ensure data quality, and physical designs are needed for performance and recovery. Companies run a huge risk by not having these.

Every time I meet with a customer that says “we can’t touch the app” I feel their pain. I know that our careers as data professionals are heading into the Cloud, and that often times our hands are tied. But as long as people are still building applications, as long as data needs to get shared between two endpoints, then there is always going to be the need for someone to understand how best to keep that data organized.

That’s where the data modeler, or architect, is needed most.

Even today.

Data Modeling Is Dead; Long Live Data Modeling! is a post from: SQLRockstar | Thomas LaRock

The Importance Of Trust

Monday, November 26th, 2012

Recently, I was going through all my servers and performing some basic health checks. One of these checks was to look for foreign keys and constraints that are not trusted. I figured this would be something of a rare occurrence and was completely surprised when I found out that roughly 75% of my servers had at least one database where foreign keys and / or constraints were not trusted.

I'm telling the truth. Trust me.

Why is this important? When these items are trusted, SQL can make some assumptions about the data in the tables and can use those assumptions to create more efficient query plans. If, however, the constraints are not trusted, SQL can't make any assumptions and must construct a query plan that may be more computationally intensive.

Before I get started, let me first explain what an untrusted foreign key or constraint is. (From here on, I'll use the term constraint to include both constraints and foreign keys.) When you define a constraint on a table, you are telling SQL Server to only allow certain data in certain columns. In the case of a foreign key, you are telling SQL Server that the value in a column in Table A must exist in Table B as a primary key. If you try to enter a value that is not in Table B, the insert will fail.

However, you can tell SQL Server to cheat and allow you to insert the value anyway. You can do this in a couple of ways. The most obvious is to disable the constraint and insert the data. You can then re-enable the constraint. Another way is to perform a bulk insert operation without specifying the CHECK_CONSTRAINTS option. This is often done to speed imports of large amounts of data.

Unfortunately, once you do this, SQL Server marks the constraint as "not trusted". Simply re-enabling the constraint will not change this. The constraint remains untrusted, even after being re-enabled. Re-enabling will prevent bad data from being inserted into the table again, but it does not validate the data that was inserted while the constraint was disabled. In order to make the constraint trusted, you need to tell SQL to validate the constraint against all the data that is currently in the table. I'll show how to do this later.

So who cares? If I know the data I am importing is valid, why not go ahead and disable the constraint, load the data, then re-enable the constraint? The problem is you know the data is valid, but SQL Server doesn't. And that can lead to sub-optimal performance.

Let me give a very simplified example. The following script will create two tables: Orders and Customers. There is a foreign key constraint on the Orders table that requires the value in Orders.CustomerNumber to be in the Customers table.


CREATE TABLE [dbo].[Orders]
       (
        [OrderNumber] [int] IDENTITY(1, 1)
                            NOT NULL
       ,[CustomerNumber] [int] NOT NULL
       ,[ProductNumber] [varchar](100) NOT NULL
       ,[Qty] [int] NOT NULL
       ,CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ([OrderNumber] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
                  ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
       )
ON     [PRIMARY]

GO

CREATE TABLE [dbo].[Customers]
       (
        [CustomerNumber] [int] IDENTITY(1, 1)
                               NOT NULL
       ,[CustomerName] [varchar](100) NOT NULL
       ,[Address] [varchar](100) NOT NULL
       ,[City] [varchar](50) NOT NULL
       ,[State] [char](2) NOT NULL
       ,[ZipCode] [varchar](10) NOT NULL
       ,CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED ([CustomerNumber] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
                  ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
       )
ON     [PRIMARY]

GO

ALTER TABLE [dbo].[Orders]  WITH CHECK ADD  CONSTRAINT [FK_Orders_Customers]
FOREIGN KEY([CustomerNumber])
REFERENCES [dbo].[Customers] ([CustomerNumber])
GO

ALTER TABLE [dbo].[Orders] CHECK CONSTRAINT [FK_Orders_Customers]
GO

Now, let's insert a few rows of data. First, we'll create a customer record, then two order records that link to that customer.


INSERT  INTO Customers
        (CustomerName
        ,Address
        ,City
        ,State
        ,ZipCode)
VALUES  ('Big Spender'
        ,'123 Main Street'
        ,'Gotham'
        ,'NY'
        ,'10111')

INSERT  INTO Orders
        (CustomerNumber
        ,ProductNumber
        ,Qty)
VALUES  (1
        ,'ABC123'
        ,'10')

INSERT  INTO Orders
        (CustomerNumber
        ,ProductNumber
        ,Qty)
VALUES  (1
        ,'ABC123'
        ,'11')

Now, let's say we want to execute the following query:


SELECT  *
FROM    Orders
WHERE   orders.CustomerNumber IN (SELECT    CustomerNumber
                                  FROM      Customers)

Not that great of a query, but this is just an example. Let's now run the query with the Include Actual Execution Plan option and see what query plan SQL came up with:

Notice that SQL doesn't even touch the Customers table. This is because the constraint we defined guarantees that every value in Orders.CustomerNumber exists in the Customers table.

Now, let's disable the constraint and re-run the same query:

ALTER TABLE Orders NOCHECK CONSTRAINT FK_Orders_Customers

Here's is the query plan now:

Because the constraint is not trusted, SQL must construct a query that accesses the Customer table. This will obviously require SQL Server to do more work than the plan we got when the constraint was trusted.

Now let's re-enable the constraint and see what happens:

ALTER TABLE Orders CHECK CONSTRAINT FK_Orders_Customers

When we run the query again, here's the plan SQL generates:

The plan is exactly the same as the one we got when the constraint was disabled! This is because the constraint is still untrusted. Even though we did not add any data to the tables while the constraint was untrusted, the query engine does not know that and SQL leaves the constraint marked as untrusted. Therefore, the query optimizer cannot use the additional information the constraint provides when it optimizes the query.

So how to we get the constraint trusted again? By running the ALTER TABLE command to tell SQL to verify the constraint:

ALTER TABLE Orders WITH CHECK CHECK CONSTRAINT FK_Orders_Customers

Note the double CHECK. This is required. Now when we run our query, we get our initial execution plan again:

What happens if, while the constraint was disabled, someone did insert invalid data? In that case, the above statement would fail with an error message. If this happens, you need to fix the problem before the constraint can be re-trusted.

Now this example was a bit contrived. We're dealing with two very simple tables with a total of three rows of data. Performance will not be an issue no matter what which query plan we end up with. But imagine you have a large data warehouse with millions of records. Each week, there is a new bulk load of data and someone forgot to code the import process to use the CHECK_CONSTRAINTS option. Queries against that data warehouse could end up taking much longer than they should.

How can you tell if you have any tables in your databases that have untrusted constraints? The sys.foreign_keys table contains a column named is_trusted. If the value in that column is 1, the foreign key is not trusted. For constraints, the sys.check_constraints table contains a column with the same name and functionality.

Below is some code that will search through all the foreign keys in a database and attempt to make them trusted.  Note that this will only look at foreign keys that are enabled but not trusted. If any are disabled, this will not try to enable them.

DECLARE @CorrectedCount INT
DECLARE @FailedCount INT
DECLARE UntrustedForeignKeysCursor CURSOR
FOR
        SELECT  '[' + s.name + '].' + '[' + o.name + ']' AS TableName
               ,i.name AS FKName
        FROM    sys.foreign_keys i
                INNER JOIN sys.objects o ON i.parent_object_id = o.OBJECT_ID
                INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
        WHERE   i.is_not_trusted = 1
                AND i.is_not_for_replication = 0
                AND i.is_disabled = 0
        ORDER BY o.name

DECLARE @TableName AS VARCHAR(200)
DECLARE @FKName AS VARCHAR(200)

SET @CorrectedCount = 0
SET @FailedCount = 0

OPEN UntrustedForeignKeysCursor
FETCH NEXT FROM UntrustedForeignKeysCursor INTO @TableName, @FKName
WHILE @@FETCH_STATUS = 0
      BEGIN
			/* SELECT 'ALTER TABLE ' + @TableName + ' WITH CHECK CHECK CONSTRAINT [' + @FKName + ']' */

            BEGIN TRY
				/*
					This try-catch will allow the process to continue when a constaint fails to get re-trusted
				*/
                  EXECUTE('ALTER TABLE ' + @TableName + ' WITH CHECK CHECK CONSTRAINT [' + @FKName + ']')
                  SET @CorrectedCount = @CorrectedCount + 1
            END TRY
            BEGIN CATCH
                  SET @FailedCount = @FailedCount + 1
            END CATCH

            FETCH NEXT FROM UntrustedForeignKeysCursor INTO @TableName,
                  @FKName
      END

CLOSE UntrustedForeignKeysCursor
DEALLOCATE UntrustedForeignKeysCursor
SELECT  CAST(@CorrectedCount AS VARCHAR(10)) + ' constraints re-trusted.'
SELECT  CAST(@FailedCount AS VARCHAR(10))
        + ' constraints unable to be re-trusted.'

This code will only look for foreign keys that are untrusted. If you want to also check for untrusted constraints, change the table in the cursor definition from sys.foreign_keys to sys.check_constraints. Everything else can stay the same. The code will report a count of constraints it has fixed and was unable to fix.

As I said before, I was completely surprised by the number of databases I had that contained untrusted foreign keys and constraints. I recommend taking a look at your systems to see how many there are in your environment.

(Standard code disclaimers apply - do not run unless you understand what the code is doing. This code has been tested against SQL 2005 and SQL 2008 R2 servers.)

Share

Use What Works: Prefixing Database Tables With ‘tbl’

Wednesday, October 10th, 2012

“If you can’t tell a table from a view you need to try harder”.

That quote is from Karen López (blog | @datachick), and she often makes the comment in reference to my advocating the use of prefixes in object names.  My reason for wanting to use prefixes is simple enough: I want to know if I am looking at a table or a view when reviewing code.

Karen doesn’t believe that anyone should be using object names as a place to store meta data about the objects themselves. I would like to agree with her but then we’d both be wrong.

I am not a fan of her position or her quote. Can you imagine your child comes home from school having trouble understanding how division works and your advice is “you need to try harder”?

Our Tools Are Failing Us

Karen’s quote would have you believe that she is putting the onus on people to understand what database object they are looking at without any sort of hint or help. To me that is no different than calling the matter a “training issue”, an excuse for why things are the way they are but not a solution for how to avoid the issue from happening in the first place.

What Karen and I agree on here is that our tools are failing us. Karen reminds me often about how our tools have a greater impact on database design decisions than what we realize. When those tools fail to provide the details I need as a DBA, such as distinguishing a table from a view, we end up with prefixes for our table names.

I believe that I am in the minority these days when it comes to using table prefixes. I laugh when someone tells me they would never use a prefix to identify a table, but they would use one for views. It’s as if they don’t understand that by NOT using a prefix they are still designating the difference between a table and a view anyway!

I know many people that designate a variety of database objects. Indexes are common example, as folks want to know if the index is clustered, non-clustered, or perhaps filtered. Triggers, functions, stored procedures…all objects that get affixed with a prefix most of the time. Even data warehouses will use “Dim” and “Fact”. But for some reason the idea of using a prefix on a table is frowned upon by the same establishment that would advocate the use of prefixes elsewhere!

How SSMS 2012 Helps

Check out what I found in SSMS 2012 the other day. If you hover over the name of the object in a T-SQL statement you get a tooltip that explains what you are looking at:

How cool is that?

I love finding new things like this when using updated versions of tools that I have been using for years. I have no idea if this is new to SSMS 2012 or not, but it is new to me and I am enjoying it.

So, looks like I don’t need to try harder, as Karen would suggest. It would seem that Microsoft was the one that went out of their way to help me. They took a tool that had been failing me as a DBA and made it better.

I am still going to use prefixes though. If anyone can prove to me that the use of a prefix in a table name causes a performance issue then I will reverse my position on this matter. Until then, I will continue to use them, and to advocate for their use, because there are folks like me for which this method works best. Also, not everyone uses SSMS, so I want to be mindful for that.

A wise man named Buck Woody (blog | @buckwoody) once told me “Use what works”.

Using prefixes still works for me. It also works for others. It may not be your choice, but that does not make it the wrong choice for others.

Use What Works: Prefixing Database Tables With ‘tbl’ is a post from: SQLRockstar | Thomas LaRock

I Bet Firefighters Would Make Great DBAs

Tuesday, October 2nd, 2012

I bet he’d make an excellent DBA.

If your house were on fire who would you call?

Would you call the architect first? Or would you call the fire department? Of course you would.

Yet when your application performance is so poor as to be unusable, who do you call first? The architect? No! You call the database administrator. We’re the ones to put out the fires and restore performance to normal.

Much like a fireman, a good DBA is able to put out those corporate fires quickly while the architects are still busy deciding things like what bottle of wine to have with lunch.

Many people agree that good database performance starts with good database design. If that is true then why don’t the data architects get blamed for bad performance? I’m guessing it’s because they aren’t on-call like a DBA, probably from dining at restaurants that don’t allow cell phones to be used.

That being said, if and when you want your house built you don’t contract with the fire department, you call a builder. But the fire department does have a say into the guidelines that must be followed for home construction. The fire codes are there to protect not only the homeowner but the firefighters that will respond to fight the fire.

Your shop should have similar guidelines from your DBA regarding production application code. You let your architects design and build the system, but they should do so while keeping in mind the guidelines that the DBA team has laid out.

So who get’s to decide the final design choices in your shop? The architect? The DBA? Or someone else?

I suppose it depends on who you ask.

Or you could decide for yourself after listening to a DBA and an architect discuss why they feel they should be the one to decide. Come listen to Karen Lopez (blog | @datachick) and me present “Database Design Throwdown: The Blunder Games” at ER World on Tuesday October 23rd. It will be the last tune-up we have before we lay it all on the line at the PASS Summit in November where the winner gets a meat gift basket and the loser has to walk home.

I Bet Firefighters Would Make Great DBAs is a post from: SQLRockstar | Thomas LaRock

I Bet Firefighters Would Make Great DBAs

Tuesday, October 2nd, 2012

I bet he’d make an excellent DBA.

If your house were on fire who would you call?

Would you call the architect first? Or would you call the fire department? Of course you would.

Yet when your application performance is so poor as to be unusable, who do you call first? The architect? No! You call the database administrator. We’re the ones to put out the fires and restore performance to normal.

Much like a fireman, a good DBA is able to put out those corporate fires quickly while the architects are still busy deciding things like what bottle of wine to have with lunch.

Many people agree that good database performance starts with good database design. If that is true then why don’t the data architects get blamed for bad performance? I’m guessing it’s because they aren’t on-call like a DBA, probably from dining at restaurants that don’t allow cell phones to be used.

That being said, if and when you want your house built you don’t contract with the fire department, you call a builder. But the fire department does have a say into the guidelines that must be followed for home construction. The fire codes are there to protect not only the homeowner but the firefighters that will respond to fight the fire.

Your shop should have similar guidelines from your DBA regarding production application code. You let your architects design and build the system, but they should do so while keeping in mind the guidelines that the DBA team has laid out.

So who get’s to decide the final design choices in your shop? The architect? The DBA? Or someone else?

I suppose it depends on who you ask.

Or you could decide for yourself after listening to a DBA and an architect discuss why they feel they should be the one to decide. Come listen to Karen Lopez (blog | @datachick) and me present “Database Design Throwdown: The Blunder Games” at ER World on Tuesday October 23rd. It will be the last tune-up we have before we lay it all on the line at the PASS Summit in November where the winner gets a meat gift basket and the loser has to walk home.

I Bet Firefighters Would Make Great DBAs is a post from: SQLRockstar | Thomas LaRock

10 Ways to Avoid Datatype Mismatches

Monday, September 24th, 2012

How well do you know your data?

I’m not talking about user names, passwords, or credit card numbers. I don’t expect you to know the values stored inside the rows and tables of your database (although I *was* once called out for not knowing such details so apparently there are some people out there that expect the DBA to know every piece of information).

What I am asking you here has to do with datatypes. You know what those are, right? They are how you define your data when creating a table. They have fancy names like NVARCHAR, TIMESTAMP, and VARBINARY.

And most folks have no idea which ones they should be using. How confident are you that your systems have been designed by someone that took the time and effort to choose the datatypes wisely?

I’m willing to wager that you have systems in your care, right now, that have a less than optimal selection of datatypes for the actual data being stored. As such I have put together a list of ten ways for you to right-size your datatypes and your data.

The items below apply to relational database platforms as a whole, but many of the scripts will be focused on SQL Server. I’ve tried to make note of that where appropriate. If you have a script you’d like me to link to just let me know and I can include them in an update to this post. Just leave a comment at the end.

1. Understand the storage requirements for every datatype you consider

You need to know and understand the implications of choosing one datatype versus another. The storage implication between an INTEGER and BIGINT is 4 bytes, which may not seem like a lot but those 4 bytes can add up fast given that you are likely to have more than one column with the wrong datatype and millions of rows in the table.

Here’s a list to some database platforms that you may find useful:

2. Review all design decisions based on the shape of the data – where it is now and where it is likely to be later.

When designing a database you really need to understand how the data will move through your system; how it gets inserted, how it gets stored, and how it gets used. Once you have those details it will be easier for you to know you have made the right decision.

3. Set datatypes based on business requirements, not tool defaults

Different tools will use different defaults. If you create a new table in a database using SQL Server Management Studio (SSMS) you may very well end up with columns defined as NCHAR(10) on that table. If you then go and create a stored procedure using SSMS you will find that the default datatype choices are INTEGER, which will result in a likely mismatch of datatypes for the table you just created. Don’t rely on the defaults to build your system.

4. Measure and monitor fit of the data to its datatypes regularly

You can use my script to look inside a SQL Server database for integer values that may need to have their datatypes adjusted. Or you can run this script to check the datatypes currently residing in memory, as those datatypes are likely to be the ones you should focus on adjusting first. In either case you are being proactive in the measuring and monitoring of the data matching the defined datatype.

5. Review each index creation request to see if it is duplicate

You want to avoid adding unnecessary indexes to your database, as each one adds additional overhead for any delete, update, or insert statement. One of the easiest ways to achieve this is to review the index creation scripts before you execute them against the database. In other words, don’t just take scripts from your developers and deploy them to production without taking the time to review them.

Most people think of data governance as just something for tables and columns, but the right datatype and index choice will impact user satisfaction for both data quality and performance.

6. Measure for unused and duplicate indexes regularly

Reviewing the scripts before the indexes are created is great, but there’s a good chance you will not always be able to catch everything. You will need to monitor for duplicate indexes at regular intervals. I have a script that I have used in the past for finding duplicate indexes inside of SQL Server. Running that script helps me to find and eliminate wasted space inside of databases.

7. Review new stored procedures to verify parameters are matching

This could be a tad time consuming, but the benefits should be worth the effort over time. Datatype mismatches will often result in inefficient query plans to be chosen, that means you get poorer performance. If you can help avoid poor performance by simply reviewing and verifying the parameters defined inside of a stored procedure before it is deployed, why wouldn’t you take the time? I would also want to remind you that not all queries are part of a stored procedure, so you should be mindful of the parameter declarations for ad-hoc statements as well.

8. Find longest running and most often used queries

You can find which queries are being executed most frequently inside your server instance as well as the queries that have the highest duration. I like to use the diagnostic queries provided by Glenn Berry (blog | @GlennAlanBerry) in order to get the details from the DMVs. You can get a full copy of the SQL2012 version here. By finding the queries that are executed most often or have the highest duration you can prioritize the right-sizing of datatypes that are being actively used.

9. Look for implicit conversions in your plan cache

You can scrub your plan cache to find the queries that had implicit conversion warnings in the query plan. Here is a great SQL Server script from Jonathan Kehayias (blog | @SQLPoolboy) that will help you do just that.

10. Remember that size matters

Despite everything you may have been told while in college, size does matter.

That’s my list of ten ways to help you right-size your data and datatypes. They have served me well over the years, helping me to keep my environments running with as little waste as possible.

10 Ways to Avoid Datatype Mismatches is a post from: SQLRockstar | Thomas LaRock