XML Support Overview

From SQLServerPedia

Jump to: navigation, search

See Also: Main_Page - Business Intelligence

Extensible Markup Language (XML) is another approach to data interchange. Recommended by World Wide Web Consortium (W3C) and strongly supported by Microsoft (among many other vendors) it promises to be the way data will be exchanged for many years to come. Similar to HTML, which makes up the skeleton of web pages, XML consists of tags. Unlike HTML, with XML you can define your own tags for handling different data.

SQL Server 2000 is one of the first database engines to provide native XML support. Now, all you have to do is provide an additional clause in SELECT Statement and data will be automatically given to you in XML format.

You can also configure SQL Server and Internet Information Server (IIS), so that users can query SQL Server databases straight from their web browser. Native XML support in SQL Server is a very powerful feature. It virtually lets the companies exchange the data over the web without any client-side programming. Unfortunately there are some major security issues with XML support as provided with SQL Server 2000. These issues will allow your users to learn your database structure, modify it or even delete your database altogether if they have appropriate permissions. Therefore, you have to be extra careful when configuring security and allowing access to SQL Server through IIS.

Along with running SELECT statements with FOR,the XML clause to read the SQL Server data, you can also read XML documents with T-SQL. This is accomplished by the OPENXML function. OPENXML lets a developer read an XML document and treat it as a relational data source, or as it is sometimes referred to a ROWSET provider. Starting with SQL Server 7 you could write queries against other relational databases, and other supported OLE DB providers. Now you can also query XML files, although the procedure is different.

Let's look at a quick example before discussing the syntax elements. The following query reads the XML string we provide into SQL Server memory (with sp_preparedocument), transforms a portion of this string into a rowset, and then unloads the string from memory (with SP_XML_REMOVEDOCUMENT):

DECLARE @idoc int  
DECLARE @doc varchar(1000)  
SET @doc ='  <northwind>  <Customer CustomerID="VINET" ContactName="Paul Forrester" Company="Centrum">     <Order CustomerID="VINET" EmployeeID="5" >        <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>        <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>     </Order>  </Customer>  <Customer CustomerID="LILAS" ContactName="Carl Gladwell" Company="Rollodex">     <Order CustomerID="LILAS" EmployeeID="3">        <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>     </Order>  v/Customer>  </northwind>'  
--Create an internal representation of the XML document.  
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc  
-- Execute a SELECT statement that uses the OPENXML rowset provider.  
SELECT    *
  FROM       OPENXML (@idoc, 'northwind/Customer/Order',1)
  WITH (Customer_ID  varchar(10) '@CustomerID',
  Employee_ID varchar(20) '@EmployeeID',
  Contact_Name varchar(20) '../@ContactName')  
  EXEC sp_xml_removedocument @idoc

Results:

Customer_ID Employee_ID          Contact_Name   
----------- -------------------- --------------------   
VINET       5                    Paul Forrester  
LILAS       3                    Carl Gladwell
(2 row(s) affected)

The "idoc" parameter provides the XML document to be queried. The developer has to provide the XML string. Once the XML string is specified and passed to sp_xml_preparedocument, SQL Server will generate internal representation of this string that can be queried.

The "rowpattern" parameter specifies the query context or the innermost tag that needs to be queried. In my example if the row pattern is "Northwind/Customer/Order". That is why the query output will show attributes of Orders tag without further adjustments. Anything above Orders tag, on the other hand for example the Customers tag attributes have to be retrieved with "../" notation. So when I needed to retrieve the contact name attribute, I had to use

Contact_Name varchar(20) '../@ContactName'.


Similarly, if we wish to retrieve the company attribute along with the contact name, all we have to do is extend the FROM clause of the query as follows:

FROM       OPENXML (@idoc, 'northwind/Customer/Order',1)
          WITH (Customer_ID  varchar(10) '@CustomerID',
		  Employee_ID varchar(20) '@EmployeeID',
		  Contact_Name varchar(20) '../@ContactName',
		  Company_Name varchar(15) '../@Company')


Results:

Customer_ID Employee_ID Contact_Name         Company_Name   
----------- ----------- -------------------- ---------------   
VINET       5           Paul Forrester       Centrum  
LILAS       3           Carl Gladwell        Rollodex
(2 row(s) affected)


As you can tell, schema declaration ("with" clause) provides formatting of the output. In other words it specifies the column names and aliases for column names. Notice that XML attributes are preceded with the @ sign. Keep in mind that XML is case sensitive, so had we provided @employeeId instead of @EmployeeID we would get NULL values in the Employee_ID column.

What if we need to retrieve attributes that are nested more than one level deep? That's easily done by adding another layer of "../". Let's change the query rowpattern to "northwind/Customer/Order/OrderDetail" and rerun the query with a slightly different schema declaration:

-- using the same XML string as in the previous query  
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc    
SELECT    *
  FROM       OPENXML (@idoc, 'northwind/Customer/Order/OrderDetail',1)
  WITH (Order_ID int '@OrderID',
  Customer_ID  varchar(10) '../@CustomerID',
  Employee_ID varchar(10) '../@EmployeeID',
  Contact_Name varchar(20) '../../@ContactName',
  Company_Name varchar(15) '../../@Company')  
EXEC sp_xml_removedocument @idoc


Results:

Order_ID    Customer_ID Employee_ID Contact_Name         Company_Name   
----------- ----------- ----------- -------------------- ---------------   
10248       VINET       5           Paul Forrester       Centrum  
10248       VINET       5           Paul Forrester       Centrum  
10283       LILAS       3           Carl Gladwell        Rollodex
(3 row(s) affected)


The "flags" parameter specifies whether the query is identifying attributes or elements within an XML document. This is important, because when retrieving elements we will no longer have to use an @ sign when defining the element to be retrieved. Consider following example:

DECLARE @idoc int  
DECLARE @doc varchar(1000)  
SET @doc ='  <RevealNet>  <Product>  <ProductName>Transact SQL </ProductName></Product>  <Product>  <ProductName>Oracle PL / SQL</ProductName></Product>  </RevealNet>'    
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc    
ELECT    *  FROM
       OPENXML (@idoc, 'RevealNet/Product',2)
	   WITH (Product_Name varchar(20) 'ProductName')    
EXEC sp_xml_removedocument @idoc


Results:

Product_Name   
--------------------   
Transact SQL  Oracle PL / SQL
(2 row(s) affected)


Notice that this query would not have worked if we had added @ with "ProductName" - it would simply return NULL values. Since we provided column mapping, whether the flag is set to 1 or 2 does not really matter. However, had we not specified column mapping this query would return NULL values with flags = 1:

-- using the same XML string as above  
SELECT *  
        FROM OPENXML (@idoc, 'RevealNet/Product',1)
        WITH (ProductName varchar(20))


Results:

ProductName   
--------------------   
NULL  
NULL
(2 row(s) affected)


The Table name option can be used if there is a table with the column names you wish to see in the output in the same database. If that is indeed the case only column names are imported from the table - table data is not queried. For example, following query uses the structure of discounts table from pubs database:

DECLARE @idoc int  
DECLARE @doc varchar(1000)  
SET @doc ='  <discounts>  <discount discounttype="new customer" stor_id="1234" lowqty="2" highqty="10"   discount="0">  </discount>  </discounts>'    
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc      
SELECT    *
  FROM       OPENXML (@idoc, 'discounts/discount',1)          
  WITH discounts    
EXEC sp_xml_removedocument @idoc


Results:

discounttype                         stor_id lowqty highqty discount   ------------------------------------ ------- ------ ------- --------   new customer                         1234    2      10      .00
(1 row(s) affected)


If you don't specify the column mapping in the SELECT statement and you're not using an existing table then an "edge table" is generated. This table can be thought of as a metadata table for the parsed XML documents. For example, following query builds an edge table for my XML string:

DECLARE @idoc int  
DECLARE @doc varchar(1000)  
SET @doc ='  <RevealNet><Product>  <ProductName>Transact SQL </ProductName></Product>  <Product>  <ProductName>Oracle PL / SQL</ProductName></Product>  </RevealNet>'
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
SELECT *  FROM OPENXML (@idoc, 'RevealNet/Product',2)


Results (formatted for readability):

Id parent id node type local name prefix namespaceuri datatype Prev Text
2 0 1 Product NULL NULL NULL NULL NULL
3 2 1 ProductName NULL NULL NULL NULL NULL
6 3 3 #text NULL NULL NULL NULL Transact SQL
4 0 1 Product NULL NULL NULL 2 NULL
5 4 1 ProductName NULL NULL NULL NULL NULL
7 5 3 #text NULL NULL NULL NULL Oracle PL / SQL

The edge table provides names of each tag as well as parent / child relationships among the tags. Node type column specifies whether each node is an element (1), an attribute (2) or a string of text (3). If the node is text then its value is placed in text column.

XML support in SQL Server 2000 provides a relatively easy alternative to transferring data over the web. With FOR XML extension to the SELECT statement (discussed in SQL Primer) developers can easily translate SQL Server data into XML that can be published on the web. With OPENXML function a client application can pick up the published XML string and convert it back to rows and columns. In addition, SQL Server 2000 lets you configure web access to your data through local Internet Information Server.