New BI Webcast with Kevin Kline and Douglas McDowell on Sept 18th

Filed under: Analysis Services — KKline at 2:50 pm on Thursday, September 11, 2008

New to Business Intelligence (BI) on the SQL Server platform?  Learn about the value of adding BI to your skillset. 

From the registration page - “Join SQL Server expert Kevin Kline and Managing Director of Solid Quality Mentors, Douglas McDowell, in this webcast to learn what BI functionality is available on SQL Server.”

Find all the details and register at:

http://www.quest.com/events/listdetails.aspx?contentid=7880

SQL Server Data Mining in the Cloud

Filed under: Analysis Services — Brent Ozar at 8:06 am on Wednesday, August 27, 2008

The SQL Server 2008 Data Mining team is showing off some slick new technology at the KDD 2008 show in Las Vegas.  You didn’t get your trip to Vegas approved, you say?  Well, it pays to read Jamie’s Junk, also known as “The official blog of the SQL Server Analysis Services Development Manager(tm).”  He wrote a blog entry called KDD 2008 and Incredibly Awesome SQL 2008 Data Mining Demos, and I spent some time working with it to find out whether he was blowing things out of proportion.

At http://www.sqlserverdatamining.com/cloud/ you can play with a browser-based SQL Server Analysis Server.  Upload a CSV file or use one of their samples, and analyze away.  You’ll get a quick feel of some amazing technology, but for me, the real winner is their Excel 2007 Table Analysis Tools for the Cloud plugin, and I’ll show you how to use it.  (Go download that and install it fast before they figure out how popular it’s going to get.)

Cleaning Up the Source Data

Step 1: The Perfmon CSV File

Step 1: The Perfmon CSV File

Let’s start with a simple file that I have to analyze all the time: results from Performance Monitor.  We’ve written a lot here about how to analyze Performance Monitor results to see where your SQL Server bottlenecks are, but one of the tough steps is to figure out when your server is undergoing peak loads, and what’s causing those loads.

At right, check out the raw results of Perfmon: a plain CSV file.  We’re going to clean it up a little, but I want to show you the “before” to understand what we need to clean up.  We’re not going to do any heavy lifting, just a couple of tweaks.

We want the headers to be readable by human beings, because we’re going to use these as labels.  In cell A1, change the label to “Date”, and in the rest of the cells in row 1, do a search & replace to strip out the \\MYSERVERNAME\ label.  (If you have multiple servers in your Perfmon data, leave the server names in, of course.)

Step 2 - Added Calculated Date Fields

Step 2 - Added Calculated Date Fields

Format the date column, A, to show the date and time instead of the number format.  That’s the datestamp when the performance data was captured.  To enhance our analysis, we want to add a couple of date calculations.  SQL Server performance often depends on the day of the week and the time of day - weekends might have lower performance needs than weekdays, and we might have daily jobs at a certain time that stress the system.  Add two new columns for Day of Week (using the WeekDay function) and Hour of Day (using the Hour function).

Finally, we need to tell Excel that this huge amount of numbers is actually a table.  Click on any cell that has data (not an empty cell off to the side), click on the Home tab in the Office Ribbon at the top of the screen, and click on the Format as Table button in the ribbon.  Click on any color style that pops up - I’m partial to Table Style Medium 2 myself.  Excel will highlight the entire data range to confirm, and make sure the “My table has headers” box is checked.  Click OK.

Now would be a good time to save our work - save the file as an Excel file, not a CSV file.

Connecting To The Cloud

Step 3 - The Ribbon Tools

Step 3 - The Ribbon Tools

Assuming the “Table Analysis Tools for the Cloud” add-in has been installed successfully, the ribbon will change.  Click on the “Analyze (in the Cloud)” tab, and SQL Server Analysis Services options will appear.

Time to make the magic happen.

Let’s find out what kinds of load we have on this server by clicking on the Detect Categories button in the ribbon.  Accept the defaults, and click Run, and a warning will pop up:

“Table Analysis Tools for the Cloud needs to connect to the Internet to process your request. Your spreadsheet data will be uploaded for analysis using a secure protocol, then destroyed as the analysis completes. The operation cannot complete without your permission.  To learn more about the privacy of your data read out Privacy Statement”

Time to take a quick break and talk privacy.  This Excel add-in is going to connect to Microsoft’s servers, upload your data, run some analysis, and then send your data back to you via the internet.

Your data.  Someone else’s servers.  The internet.  Stop to think about that for a second, especially if you’re:

  • A public company or subject to SOX compliance
  • Dealing with health care data
  • Dealing with financial data
  • Analyzing your passwords spreadsheet
  • Slicing and dicing your little black book

This service may be secure in the future, but right now we’re talking about a demo, not an officially sanctioned Microsoft product with a EULA.  Be smart - only use this for dummy demo data for now.

Enough Warnings - Show Me The Categories!

Step 4 - Performance Categories

Step 4 - Performance Categories

Click Run, and the magic happens: the data is uploaded and analyzed.  It may take a few minutes depending on the size of your data source.  (I have no idea what kind of server horsepower is on the other end.)

The result is a list of categories.  In the screenshot at right, I have 10 categories of load.  About 60% of the performance monitoring times fell into the first two categories.  Let’s go through the categories individually to see what they have in common.  The category details are on the bottom half of the spreadsheet, and you can switch from one category to the next by clicking on the filter icons in the Category header of the spreadsheet.

My interpretation of the results are:

  • Category 1 - Low Activity with <8 Connections - during these times, we have less than 8 SQL connections, a lot of available memory, and low disk activity.  What’s interesting to me is that there’s not a consistent Hour of Day finding for this one.
  • Category 2 - Low Activity 8-13 Connections - during these times, we have a few active users, and we’re seeing some very minimal storage activity, but we’re still in great shape on memory.
  • Category 3 - Low Activity with C Drive Load - for some reason, about 11% of the time our server has low SQL Server activity, but the operating system drive is seeing some activity.  Maybe another application is running on our SQL Server, or maybe someone is remote desktopping into the system to do things.
  • Category 4 - Sundays 2pm-6pm - some kind of job must be running during this time frame, because SQL Server connections stay low but I/O goes up significantly.
  • Category 5 - High Connections, Heavy Load - about 5% of the time, this server has a high number of connections, heavy TempDB drive access, and high CPU loads.

Below category 5, each category represents 3% or less of the server’s time, so those wouldn’t concern me right away.

Bottom line: this server has very peaky demand cycles, only lasting for very short bursts and it doesn’t appear to be focused on a single time of the workday.  My priorities as a DBA would be:

  • Find out what’s happening on Sundays from 2pm-6pm, and ask the application owners if they want help to make that code to run faster
  • Look at the types of queries that are running in Category 5, the 5% of the time when the server is under heavy load.  My guess is that this is a new server with a new application, and not a lot of users are hitting it yet.  When the app catches on and gains users, this server won’t be able to handle the load due to the individual queries being tough on TempDB.  We need to find out why those queries are hammering TempDB, and either optimize them or speed up the TempDB drive.

Using SQL Server data mining in the cloud, I was able to figure this out in a matter of minutes - instead of manually slicing and dicing the numbers in Excel or pushing this data into my own SQL Server Analysis Server.  This would save me time and money, making me more productive and a better DBA.

But wait - there’s more.  That’s just one button in the ribbon.

Analyze Key Influencers

Let’s find out what else seems to be happening in the system at the same time our TempDB usage is high.  Back on the source data tab of the spreadsheet, click the Analyze Key Influencers button in the Analyze (in the Cloud) ribbon tab.  Choose “Physical Disk 2_T Disk Time” as the data to analyze, because in this system, TempDB is isolated on the T drive.

Step 5 - TempDB Influencers

Step 5 - TempDB Influencers

Click on the “Choose columns to be used for analysis” link and uncheck the other counters for Physical Disk 2_T” because we know they’ll be high at the same time Disk Time is high.  Those aren’t influencers.  Then click Run.

When the results come back, scroll over to the far right.  You’ll notice several sets of colored bar graph results.  Scroll down, because we want to find what influences TempDB activity to be high - indicated by the “Favors” column being a high number.  We want to find what favors high TempDB activity.

In our particular example, we can see that Category 8 is the highest TempDB load.  Innnnteresting - earlier, I’d discarded categories over 5 because they represented a low amount of time, but it looks like we’ll have to go back and revisit category 8. Category 8 was Friday mornings from midnight to 4am, so we might have something going on then.

Other factors include high activity on our log drive (L), medium CPU load (11-34%), and so on.  I bet TempDB’s logs are also on the L drive, shared with the rest of the databases, and I would have to verify that before further investigation.  If TempDB’s logs are on that L drive, then it would make sense that they’d have high activity when the TempDB data is being written.

In Two Words: Incredibly Awesome

I’ve barely scratched the surface of what this thing can do, and already I can see that Jamie was right - this thing is incredibly awesome.  This demo just showed Perfmon data - something interesting to me personally, but not to most of humanity.  Imagine what it can do in the hands of an accounting analyst or a sales manager with actual financial data, people who could raise sales or cut costs.  Very impressive.

I want to loudly congratulate the SQL Server Data Mining team for building something so cool and really pushing the edge of cloud analytics.  I get genuinely excited about where cloud-based databases are going, and I’m thrilled that Microsoft might bring this to market.  The faster we can analyze data with less setup time and less fixed infrastructure costs, the more powerful the DBA will become.

Performance Monitor Counters for SQL Server Part II…well actually Part III

Filed under: Analysis Services, Tuning and Optimization — KKline at 3:39 pm on Thursday, April 3, 2008

Hi everybody, I just wanted to go ahead and post the third part of the four part “screencast” series that I did through TechTarget.  Again, if you’ve ever struggled with knowing which PerfMon counters to track for SQL Server performance, then check out this series and get answers to questions including:

  • How effective is PerfMon with storage area networks (SANs)?
  • What’s the I/O difference between non-cached writes and cached writes?
  • How can I utilize PerfMon counters for tracking disk queue length?
  • Why shouldn’t I forget the effect of checkpoints on DASD reads and writes?
  • I hope it’s helpful to you and would welcome your feedback, so feel free to leave your comments.

     Thanks alot and you can find this third part of the series here titled “Monitor SQL Server disk I/O with PerfMon counters“.

    How can I clear the Analysis Services cache without restarting?

    Filed under: Analysis Services, Transact-SQL (T-SQL) — Ari Weil at 3:23 am on Thursday, November 15, 2007

    You can use an XMLA Script like the following (replacing “{YOUR SSAS DB ID HERE}” with the valid Database ID:

    <ClearCache xmlns=”http://schemas.microsoft.com/analysisservices/2003/engine”>
    <Object>
    <DatabaseID>{YOUR SSAS DB ID HERE}</DatabaseID>
    </Object>
    </ClearCache>

    The nice thing about this approach is you can test a specific statement’s execution by prefacing it with this command.

    Why does our SQL Server crash with multiple connections?

    Filed under: Analysis Services, Notification Services — Jason at 5:11 pm on Friday, October 5, 2007

    Q:  Our team has 4 people, 2 of which are web developers who use PHP to get data by using stored procedures from database. When the four of us use the system at the same time in the afternoon, SQL Server will come down. Also, at the same time, we can’t open ‘Enterprise Manager’. If we wait about 20 minutes, the system will be OK. We don’t know why this happens - why does the system lock? How can we check this out? (We are running SQL Server 2000)

    Jason Hall says:  There are a couple of things that may be going on.

    1) It is possible that your SQL Server or Windows OS license only allows a certain number of CALs (Client Access Licenses). If your license is capped at 4 connections, SQL Server will reject any connection made untill one of the connected users disconnects. I would check your SQL Server licensing and check your server configuraiton parameters to see if anything has been set to limit the number of concurrent connections.

    2) If there is nothing in the license or server properties that would be limiting connections it is possible that the system is so overloaded that a new connection cannot be processed.
    If you cannot even make a connection to the server than there is something seriously wrong with your database server. I would recommend running a diagnostic test of SQL Server to try and get a handle on what is causing these performance bottlenecks. Possibly some query is consuming so much CPU that it is causing the server to hang and reject new connections. Try running a Microsoft Profiler trace, this might help in isolating where the system hang is originating from.

    Technorati Tags:
    , , ,