Author Archive

Iterating Over Columns in the SSIS Script Component

Tuesday, May 3rd, 2011
Just recently a question was posed in the MSDN SSIS Forums about how to create a "generic" script component that would read all columns for each row passed through it in the Data Flow.  The standard impetus for this kind of behaviour is a logging or auditing step in the Data Flow.  Now - my first reaction is that you should probably be looking to code a custom component so that you aren't copying and pasting script code everywhere...
But I know that not everyone is comfortable coding a custom component from scratch - you have to be a code addict like me to find joy in that.  So here's a little "gateway drug" for those of you who are itching for the capability.  Perhaps a little taste of it will expose you to the script's shortcomings, and lead you to build your own lab to cook up a custom component to do this job properly.
Location, Location, Location
The big key here, as mentioned by Darren Green in an earlier post on the subject, is where to put code to do this.  It's not in one of the automatically generated stubs - you'll have to tread into slightly less familiar territory - the ProcessInput method override.  The standard stubs (PreExecute, PostExecute, and Input0_ProcessInputRow) are all useless for our purposes - you can safely delete them or ignore them.  Why?  PreExecute doesn't yet have any information about the buffer, and PostExecute is too late (obviously).  Input0_ProcessInputRow is called once per row... but with a carefully wrapped row object that presents the columns in nicely named properties.  There's no way to access the collection of columns.  Only ProcessInput has access to the buffer in a condition that allows iterating over the columns.
Paste This Code
public override void ProcessInput(int InputID, PipelineBuffer Buffer)
{
  bool fireAgain = true;
  ComponentMetaData.FireInformation(0, "", 
    Buffer.ColumnCount.ToString() + " columns", 
    "", 0, ref fireAgain);

  while (Buffer.NextRow())
  {
    for (int columnIndex = 0; 
      columnIndex < Buffer.ColumnCount; 
      columnIndex++)
    {
      string columnData = null;
      if (Buffer.IsNull(columnIndex))
      {
        columnData = "is NULL";
      }
      else
      {
        BufferColumn columnInfo = Buffer.GetColumnInfo(columnIndex);
        switch (columnInfo.DataType)
        {
          case DataType.DT_BOOL:
            columnData = Buffer.GetBoolean(columnIndex).ToString();
            break;

          case DataType.DT_WSTR:
            columnData += Buffer.GetString(columnIndex);
            break;

          // add code to support more data types here

          default:
            columnData = "";
            break;
        }
      }
      ComponentMetaData.FireInformation(0, "", 
        "Column " + columnIndex.ToString() + ": " + columnData, 
        "", 0, ref fireAgain);
    }
  }
  base.ProcessInput(InputID, Buffer);
}

What's Happening In There?
It doesn't take much explanation - but every little bit helps. 
Initially, a loop has to be constructed to iterate over all the rows we're given.  Do remember that this isn't ALL the rows coming through your Data Flow - it's just one buffer's worth.  This method will get called several times (unless you have very few rows in your flow).
The column then gets checked to see if it's NULL... because NULLs cause quite a problem when you don't expect to see them.
After that, a little inquiry is made to ask for some information about the column.  This bit of code can definitely be optimized out of the loop - it is a burden on the system to ask for it for each and every row!
The reason for getting a little information about the column is apparent in the next block - the switch statement that handles different datatypes differently.  You can extend the code to handle the data types you expect.
Finally, after the loop, don't forget to call the base ProcessInput method.  Why?  Press F12 on that call and you'll see - the base code handles marking the processing as being complete when you've finished seeing all the rows.
Variations On This Theme
The above code isn't the only way to get this done.  You can do away with manually detecting the column data type and simply call ToString on the buffer's indexer - as in:
string columnData = Buffer[columnIndex].ToString();

But do still beware of nulls and other odd results.  Using the "Get" methods specific to the data type do perform faster, and are safer in the long run.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

Transforming SSIS’ sysssislog Entries Into Something More Readable

Tuesday, April 26th, 2011
The format of sysssislog in SSIS 2005 and 2008 isn't great for human consumption, so what follows is an attempt to reformat it for easier digestion.  It isn't perfect - and here's hoping that the new Denali (the next version of SQL Server) server-side toolset will improve the situation it as much as I think it will.
Raw sysssislog Table Listing
My difficulty is with the nature of the logging - it "bubbles up" the same message to each containing task. The result is that the same message is reported multiple times in the log, which doesn't make it very concise (for reading). There's no doubt that it does make it easier to see everything that happened to or within a specific task - but that's not typically how I read the logs.  Issuing a simple query such as:
SELECT id, event, starttime, source, message FROM sysssislog ORDER BY id

Results in a rowset like this (click to enlarge):
My difficulty with the above is that (in this case) there are two rows per "message" - and that's just a simple demo package.  More typically, I've got five or more nested containers, resulting in a lot of repetition.
The Distilled Query
My desire was to distill the log records such that there was only one row per "message" instead of several.  You can't simply group by the message, of course. You have to group only adjacent entries and also (preferably) still show the involvement of each level of the Task hierarchy, because that context information is still valuable.
What I've arrived at is the following, which will distill the last 1000 entries in the log into something less wordy.  It's still not ideal - tasks executing in parallel get mashed together.  Those of you who are experts in T-SQL and windowing functions will no doubt be able to optimize this.  Please post a comment if you do!
WITH 
  LOG_SUBSET AS
  (
    SELECT TOP 1000 id, REPLACE(event, 'User:', '') AS event, starttime, source, message
      FROM sysssislog
      ORDER BY id DESC
  ),
  ORDERED_LOG_SUBSET AS
  (
    SELECT TOP 1000 id, event, starttime, source, message 
      FROM LOG_SUBSET
      ORDER BY id
  ),
  EVENTTYPED_LOG AS
  (
    SELECT id, event, starttime, source, message,
      HASHBYTES('SHA1', CAST(starttime AS CHAR(20)) + event + message) AS eventtype_group_key
    FROM ORDERED_LOG_SUBSET
  ),
  CONTIGUOUS_EVENT_GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, 
      ROW_NUMBER() OVER (ORDER BY id) AS s1,
      ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS s2,
      ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY eventtype_group_key ORDER BY id) AS sequence
    FROM EVENTTYPED_LOG
  ),
  GROUPED_LOG AS
  (
    SELECT id, event, starttime, source, message, eventtype_group_key, s1, s2, sequence,
      HASHBYTES('SHA1', CAST(eventtype_group_key AS CHAR(36)) + CAST(sequence AS CHAR(10))) AS event_group
    FROM CONTIGUOUS_EVENT_GROUPED_LOG
  )SELECT MIN(id) AS id, event, starttime, message,
    SUBSTRING(REPLACE(REPLACE(
      (
        SELECT ' < ' + source
           FROM GROUPED_LOG AS Y
          WHERE X.event_group = Y.event_group
          GROUP BY source
          ORDER BY MIN(id) DESC
            FOR XML PATH('')
      ), 
      '&lt;', '<'), '&amp;', '&'), 3, 2000) AS source_list
  FROM GROUPED_LOG AS X
  GROUP BY event, starttime, message, event_group
  ORDER BY MIN(id)

A sample of the resulting rowset (click to enlarge):
It's not fantastic, but it does improve my diagnostic tracing a little bit.
Thanks to information from Boneist and Tom Cooper on grouping, and Anith Sen on pivoting.

More Changes for the Dimension Merge SCD?

Monday, March 14th, 2011
A while back, I ran a contest to find a new name for the SCD component hosted on CodePlex.  The results were the best "lemons into lemonade" situation I could have expected.  Now I've got some other news I can share that should extend the reach and capabilities of every Data Warehouse ETL developer that uses Integration Services. 
This Time, No Lemons... Porkbellies Instead
One of the problems I'd always had with offering a component on CodePlex was that it meant some enthusiastic would-be users couldn't end up using it in their production environments.  The reasons they were unable to use the component weren't technical, they were business-related:
  • Open-source project software was avoided due to (valid) longevity concerns.
  • In-house development capabilities weren't sufficient to provide support, should a problem occur.
  • "Untrusted" vendor code wasn't permitted on production servers.
The "easy" solution?  Offer support for the component myself, building up that trust level and support network.  But I've got a very nice day job, thank you, and not nearly enough free time - not "easy" for me, I'm afraid.  The bottom line is that offering support myself doesn't make sense for me, or a good business proposition for those potential users.  I just don't have the economies of scale to work with, or the track record.
So if I can't do it, who can turn those porkbellies into tasty bacon?
Heat Up the Frying Pan
One of the component's ardent enthusiasts for a few years now has been Joe Salvatore.  He's helped with testing and has provided feedback on improvements for several releases.  Just recently, he started with a new consulting company, and apparently the guy is pretty vocal.  ;)  I don't know the whole story, but I can imagine Joe did a demonstration or two, some quick talking, and twisted someone's arm.  Questionable karaoke footage could have factored into the equation as well.
So who's attention did Joe attract that could smoke and cure a little raw porkbelly into something you can actually sink your teeth into?  Brian Knight and his crew at Pragmatic Works, that's who.  This is the Brian Knight that's written quite a few books, has a successful consulting company, has contributed voluminous amounts of time to the community by serving on the PASS board, co-founding SQL Saturdays and SQLServerCentral.com.  You know Brian's all about the bacon for the SQL community.
Soft or Crispy?
If you're looking for a dimension table loading experience in Integration Services that's fast, flexible, and maintainable, you can't look to in-the-box Microsoft tools today.  They're continuing to stand pat with what they have in that department - arguably a very servicable tool in the SCD Wizard.  But if that's not good enough for you, and you're not keen on re-inventing the wheel, you now have very viable choices.  How do you like your 3rd party components?  Open-source or commercial?  Community forums or paid support?  This partnership gets you what you need, the way you want it.
Pragmatic Works is working on improving the component in ways that work for them and their customers; they'll be including it in their Task Factory set of tools.  That's a great option for the more "conservative" organizations out there.  But if that's not you, you're not left out in the cold.  The free-to-use open-source project on CodePlex is independent, and will continue on its own - that's the beauty of open source and business working together.  Brian knows there's a real gap in the Dimension Merge SCD story today - the lack of vendor support.  Even if he didn't change a stitch of code, providing that support adds value for his customers that they're willing to pay for.  There's no need for any kind of exclusivity in this arrangement - the value is self-evident.
Order Up!
I'd like to give a hearty thanks to Brian Knight and everyone that works with him.  I'll be helping them out as much as I can to explain the spaghetti code they're trying to understand, but I'm confident they'll be providing a Dimension Merge SCD that works even better and easier than what I've made so far - and they'll stand behind it.  Brian didn't have to even talk to me to get this done - he could have freely downloaded the code and used it as he wished.  But he's a community-minded guy.  He reached out to me for my support and I really appreciate it.  Thanks, Brian!