Introduction
Last August I looked to add the LINQ library to a SQL Server Integration Services (SSIS) Script Task. Why? Two reasons: First, because if you’ve looked any LINQ, the features of it, the language, and most importantly why it exists, you know that it is the bomb for allowing you to do many things when working with data. These features are well advertised - it works with all types of data (XML, SQL, Excel, objects, etc.) and accords to the developer the opportunity to focus on the data rather than learning a new API or language, allows one to remove the “plumbing” out of whatever you may be connecting to, has built in type-checking, and others. The Linq syntax is very simple, straightforward, and concise, and I thought that this would be perfect to use in SSIS. After all, don’t we connect to a myriad of data sources in our various connection managers? Isn't SSIS all about connecting to "disparate" data sources? Wouldn’t it be nice to have only one line worth of code to connect to SQL Server in an SSIS script component? Second, because “it’s there” and I wanted to do some investigation. So this blog is about SSIS and LINQ.
And, if you happened to catch a previous post on "Why I Blog", this blog fits the category of research and experimentation. I’m just geeking here -this is neither a tutorial, a how-to, or a lesson.
Get your groove on
Set up the data source first, so we don’t have to go back and do this later. Add this to SQL Server:
USE [Test]
GO
/****** Object: View [dbo].[vData] Script Date: 03/06/2010 08:42:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[vData]
as
SELECT *
FROM
(SELECT cast(x.ID as varchar (20)) as ID, x.Guid_ID
FROM
(
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY z.ID) AS ID,
CAST(NEWID() as varchar (100)) as Guid_ID
FROM sys.syscolumns x, sys.syscolumns y, sys.syscolumns z
ORDER BY 1
) x
) y
GO
Create a table where we will land our result set:
USE [Test]
GO
/****** Object: Table [dbo].[LinqTest] Script Date: 03/06/2010 09:17:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[LinqTest](
[ID] [varchar](20) NULL,
[GUID_ID] [varchar](100) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Let’s open up the SQL Server 2008 Business Intelligence Development Studio (BIDS) and create a new SSIS project. Add a script task to the Control Flow, open it, right click on references, and add System.Data.Linq. (If yours is grayed out, then your project isn't set up with the .Net 3.5 framework. Go ahead and do Project>>Properties, and change the Target Framework. Now you should be able to add it). See Figure 1.
Figure 1 BIDS with Linq library
Right click on the project and select Linq to SQL Classes, and add this to the project. Mine says DataClasses2.dbml, and added automatically are two other files underneath it, DataClasses2.dbml.layout and DataClasses2.designer.cs. This will allow us to map the entity, and let .Net create the classes, properties, and methods for our entity. This is the Object Relational Designer ORM tool. Now all we need to do is find an entity (table or view). Here, I run into a problem though, because when I try to add an entity from SQL Server (what I want to map to) I get in my Database Explorer only Data Connections; there is no "Server Explorer", and it gives me the option to only add the following:
Figure 2 No SQL Server, no Server Explorer in VSTA. Ugh.
This is a problem. I don't want to add a database file to SQL Server Express, akin to when you are working with ASPNET database, ASPSTATE, or one of the other databases used in web development. No, I want to connect to SQL Server and access the view that we just created. I'm going to have to revisit this, because at first go I don't see how I can do this. There is on option to access my SQL Servers in the script task. So, the next best thing is to fire up Visual Studio, create a class, and make a custom component that exposes Linq; then I can reference this library in SSIS. Let's try this route.
Create a new class, and do the same thing that we just did – add Linq, and add the Linq classes. Open up the DataClasses1.dbml file, open Server Explorer, find you view that we just created, and drag the view into the workspace. It should look like this:
Figure 3. The Object Relational Designer
Now, create a new class and add the following code.
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Data;
using System.Collections.Generic;
using System.Reflection;
using System.Linq;
using System.Linq.Expressions;
using System.ComponentModel;
using System;
namespace Console
{
public class DataRepository
{
public static IEnumerable<vData> GetData()
{
DataClasses1DataContext db = new DataClasses1DataContext();
IEnumerable<vData> query = from c in db.vDatas
select c;
return query;
}
}
}
This is the class that creates the method that we’ll reference to our view from above, vData, that implements IEnumerable interface. This interface is very important in SSIS because we can consume any data that uses this. It’s a good idea to learn this interface by the way, because it is used throughout SSIS behind the scenes. I created my class name hoping to use a partial class, but it didn’t work. Something else to investigate. The reason that I do this is because I don’t want to change any of the code that the designer created. Why? Right on. If I do, then when I go change the entity it spins up all new classes, and overwrites my code. Notice the warnings not to change the code in Figure 4.
Notice the simplicity of the Linq query above. Here I’m just fetching all of the columns in the view; available is a bunch of other TSQL-comparable operations, so check that out in the Linq to SQL documentation for more on particulars with the query language. Finally, we return a vData object.
At this point, compile the project, create a strong key, and GAC this baby. If you don’t know how to do this, go see my ever-so-popular blog entitled Create a Custom Assembly in SSIS in another post. (As a shortcut, by the way, simply drag the .dll into C:\Windows\Assembly after creating your strong key name – no need to do the command line stuff).
Figure 4. The Data Classes Designer
I thought this was an SSIS blog entry
Ok we’re finally here. Let’s open up BIDS now and go back to that project that we created. Create a variable now in SSIS of type object, and then add the following code to your SSIS script task. Now, open up your script task and add a reference to the component that you just created. My name is “Console.dll” (a bad name, by the way I know):
Figure 5. Addition of the custom assembly
Add the following code to your script task. Note: do not copy the namespace here, because yours is different than the below. Pluck out the using clauses and the remainder of the code and you should be ok.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.SqlClient;
using Console;
using System.Collections.Generic;
namespace ST_5ca62ff84d714027842f55eb01cbe963.csproj
{
[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks
.ScriptTask.VSTARTScriptObjectModelBase
{
#region VSTA generated code
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
public void Main()
{
IEnumerable<vData> query = DataRepository.GetData();
Dts.Variables["RecordsetVar"].Value = query;
SqlConnection _conn = DefineConnection();
string _sql;
foreach (Console.vData item in Dts.Variables["RecordsetVar"].Value
as System.Collections.Generic.IEnumerable<Console.vData>)
{
_sql = "INSERT INTO LinqTest VALUES ('" + item.ID + "'," + "'"
+ item.Guid_ID + "')";
ExecSQL(_conn, _sql);
}
Dts.TaskResult = (int)ScriptResults.Success;
}
private static void ExecSQL(SqlConnection _conn, string _sql)
{
SqlCommand cmd = new SqlCommand(_sql, _conn);
cmd.CommandTimeout = 0;
cmd.ExecuteNonQuery();
}
private SqlConnection DefineConnection()
{
SqlConnection _conn = (SqlConnection)Dts.Connections["OLE"]
.AcquireConnection(null);
_conn = new SqlConnection(_conn.ConnectionString);
_conn.Open();
return _conn;
}
}
}
Run the project and then view the result in SSMS and view the results.
Figure 6. Result set in SQL Server
Conclusion
Some observations here. First, one of our goals was to avoid the “plumbing” of making connections to data that is one of the features of Linq, but we’ve just broken this rule by adding all of the above to consume this data source. Look at what we had to do in order to get to our data - I tried to add my variable of type object to all sorts of places (Foreach container for example) but had a difficult time getting to my dataset, and for now this was the only way that I could figure out how to get to my data, besides simply throwing it to a message box or opening up a TextWriter and sending the results to a text file. This is an area for more investigation as well. Second, the speed of this operation is slow, much slower than simply connecting to the data via the SSIS stock connections, which I anticipated. Third, because I had to create a custom assembly, this workaround indicates to me that Microsoft did not want SSIS developers consuming data from an Object Relational Designer.
Linq is very cool – with regards to using it in SSIS, this excursion indicates that it is not an option to go to in this context. I will continue with experimentation on the subject, so look for a future post down the road if you are curious about this implementation.
Thanks for reading,
Lee Everest, M.S.
Flickr Tags:
Linq in SSIS
