Tuesday, April 12, 2011

SQL Server 2005 XML Programming

Understanding the SQL Server 2005 XML Data Type XML has been widely adopted as a platform-independent mechanism for representing data, and is also commonly used to exchange data between disparate and loosely-coupled systems, such as B2B applications and workflow solutions. More recently, XML has been used to represent semi-structured (as well as unstructured) data such as documents and emails. If information in these models has to be queried, then XML is probably the simplest way to represent such information.
For example, if documents are represented in XML, it is very easy to write a DOM or XPATH query to extract the contents of a section titled, for example, "Recipe". Many applications also require other pieces of information that typically reside in a database. For example, consider an application that allows a user to personalize the look and feel of an application. It is not uncommon to store the preferences of the user against the user record in the database. XML lends itself well to storing such unstructured information to cope with an ever expanding list of user preferences.
When we store this kind of semi-structured or unstructured information as XML in the database, it is not always feasible to extract the content to the application tier and then process the XML using XML parsers. If the database platform is able to provide native XML processing capabilities then that is a massive advantage. In that way we can also take advantage of other database capabilities such as query optimization, indexing, backups and, most importantly of all, interoperability with relational data.
Fortunately, SQL Server 2005 now natively supports an XML data type and enterprise applications that rely heavily on XML processing can take advantage of these native capabilities. This article, the first in a series that defines and critically analyzes XML support in SQL Server 2005, will provide:
  • A brief overview of the basic techniques for getting XML into and out of SQL Server. This section provides basic code examples and descriptions, together with links for further reading.
  • An introduction to the new XML data type, how to define XML columns and variables and load them with data
  • How to define typed XML columns using XML schema
In Part II, I will describe how to query the content of the XML data type using XQuery.
XML Support in SQL Server 2000 SQL Server 2000 provided both server-side and client-side XML support and it is useful to briefly review these capabilities in order to fully appreciate how SQL Server 2005 extends and improves them. We will focus on the server-side capabilities in this article, since XML management in the database is a relatively new phenomenon. For further information, MSDN provides a very useful Survey of SQL Server 2000 XML Features article.
SQL Server 2000 Server-Side XML The major elements of server-side XML support in SQL Server 2000 are as follows:
  1. The creation of XML fragments from relational data using the FOR XML extension to the SELECT statement
  2. The ability to shred XML data, using the OPENXML function, so that it can be imported into relational tables.
  3. Storing XML data natively in the database.
FOR XML The FOR XML extension allows the creation of XML from relational data. It supports several "modifiers" that dictate the shape of the resulting XML fragment. Following is the full syntax of the FOR XML clause:
FOR XML {RAW | AUTO | EXPLICIT}
  [, XMLDATA]
  [, ELEMENTS]
  [, BINARY BASE64]
The following table explains each of the available options:
Clause
Description
XML When specified, the results of a query are returned as an XML document. One of the three options, from RAW, AUTO and EXPLICIT, must be specified
RAW Transforms each row in the result set into an XML element with a generic identifier <row /> as the element tag
AUTO Returns query results in a simple, nested XML tree. For each table in the FROM clause, for which there is at least one column listed in the SELECT clause, an XML element is created. The columns are mapped to the appropriate element attributes
EXPLICIT Specifies the shape of the resulting XML tree explicitly. Using this mode, queries must be written in a particular way so that additional information about the nesting is specified
XMLDATA Returns the schema, but does not add the root element to the result
ELEMENTS Specifies that the columns are returned as sub-elements to the table element. Otherwise, they are mapped as attributes
BINARY BASE64 Specifies that the query returns the binary data in binary base64-encoded format
The following query uses AUTO mode and returns as an XML fragment information about a particular author. Note that we are using the pubs sample database that is shipped with SQL Server2000.
USE pubs
GO
SELECT au_id, au_lname, au_fname
  FROM authors
  WHERE au_id = '409-56-7008'
  FOR XML AUTO, ELEMENTS
This query returns the following output:

OPENXML
The OPENXML function provides a rowset view over an XML document. Following is the full syntax of the OPENXML command:
OPENXML(idoc int [in],rowpattern nvarchar[in],[flags byte[in]])
[WITH (SchemaDeclaration | TableName)]
Basically, the input to OPENXML is a "handle" to an XML document that is generated by making a call to the sp_xml_preparedocument procedure, and then a "row pattern" which is an XPATH used to identify the nodes in the XML document that are to be processed as rows. We can also specify the mapping information between the XML document and the relational rowset that is generated. The following example shows a simple usage of OPENXML:
DECLARE @xmlDoc NVARCHAR(200)
DECLARE @handle INT
SET @xmlDoc = N'
  <authors>
    <au_id>409-56-7008</au_id>
    <au_lname>Bennet</au_lname>
    <au_fname>Abraham</au_fname>
  </authors>
'
EXEC sp_xml_preparedocument @handle OUTPUT, @xmlDoc
SELECT * FROM OPENXML (@handle, '/authors', 2) WITH 
  (au_id VARCHAR(20),
   au_lname NVARCHAR(20),
   au_fname NVARCHAR(20)
  )
EXEC sp_xml_removedocument @handle
The nodes are processed based on the rowset schema declaration provided after the WITH clause. The "2" in the query represents that we want to use element-centric mapping. The output of the above command is as follows:

You can refer to the full syntax of OPENXML, along with other complex examples, in the OPENXML
section of SQL Server 2000 Books Online. Storing XML in the Database
SQL Server 2000 also allows you to store XML documents and fragments in the database. However, since SQL Server 2000 does not support an XML data type, the XML needs to be stored in a TEXT, NTEXT or IMAGE column. When there is a need to process this XML, applications have to retrieve this XML into the application tier and then process the same using the MSXML APIs.
SQL Server 2000 Client Side XML
Client-side XML support in SQL Server 2000 comes in the form of SQLXML. Following is a brief summary of the main SQLXML technologies:
  • XML Views, which provide a bidirectional mapping between XML schemas and underlying relation tables. Essentially, mapping annotations are added to the XML Schema that defines your particular business object. This produces an an XML View that allows you to present the underlying data in a hierarchical, semi-structured format. A nice example of this is given in the previously-cited Survey of SQL Server 2000 XML Features article. XML views also support querying using XPATH and packaging the result as XML.
  • Creation of XML Templates that allow creation of dynamic sections in XML documents. FOR XML queries can be embedded within the XML document, and/or XPATH expressions over mapping queries. When the template is executed, the query block is replaced with the result of the query.
In SQL Server 2000, there are two ways to access SQLXML functionality:
  1. Via a SQLXMLOLEDB Provider that exposes SQLXML functionality through ADO
  2. Via HTTP access through an ISAPI filter. Using a configuration tool, you can set up a web site to receive incoming requests to execute XML templates, FOR XML and XPATH statements, and then apply transformations to the results
SQL Server 2005 XML Enhancements Although SQL Server 2000 provides quite extensive XML support, as described, the major drawback is that it is not native, and also does not provide all of the features that traditional XML processing requires. Some of the limitations are:
  1. No support for XML schemas in the database for validating XML documents
  2. XML is only stored as text and thus cannot be queried using XPATH syntax. The only option is to extract the XML to the application tier using ADO (or other data access APIs) and perform manipulations
  3. Results generating XML cannot be stored in variables inside of procedures and functions. The only option available is to stream it to the client tier using FOR XML
  4. Support only for XPATH 1.0 queries. The poses restrictions on some of the types of queries that can be written
However, in spite of these limitations, the XML features of SQL Server 2000 are extensively used and there are many enterprise applications that actively depend on it.
SQL Server 2005 builds removes these limitations and offers XML as a full-fledged native XML data type along with querying capabilities. Enhancements have also been made to the FOR XML syntax (see, for example, What’s new in FOR XML in SQL Server 2005). SQLXML 4.0 also brings with it many improvements. In this article, we will focus on the basics of the new XML data type.
The XML Data Type The XML data type is a built-in data type similar to other built-in types in SQL Server 2005. You can define XML columns, XML variables, XML parameters for procedures and functions, and can also return XML as return values from functions. Internally, the XML data type is stored in a proprietary binary format. The maximum permissible size of the XML content is 2GB.
Defining XML Variables and Columns To declare a variable of type XML is as simple as this:
DECLARE @xmlDoc XML In order to create an XML column (using untyped XML – more on this later), we could use the following:
CREATE TABLE rssFeeds
(
  feedName VARCHAR(20) NOT NULL PRIMARY KEY,
  generatedDate DATETIME DEFAULT GETDATE(),
  feedXML XML
)
Loading Data into XML Columns and Variables Having defined an XML variable or column, we can store information in it via the following methods:
  1. Assigning a string to the XML variable / column
  2. Using the output of the SELECT…FOR XML clause
  3. Using bulkload
Simple String Assignment
We can load a string into our @xmlDoc variable as follows:
SET @xmlDoc = '<contact><name>Bob Smith</name></contact>'
SELECT @xmlDoc
Note that string content is implicitly converted to the XML type. You can also explicitly convert the string to XML as shown:
SET @xmlDoc = CAST ('<contact><name>Bob Smith</name></contact>' AS XML)
SELECT @xmlDoc
In each example, we then select this XML content. If you run this example in SQL Server Management Studio with the "results to grid" option enabled, the XML output will be as shown below:

Notice that the XML content is hyperlinked and if you click this link, a new window opens and shows the entire XML document. This is a great improvement over SQL Server 2000 Query Analyzer, which would show the results of XML queries as a single line of output and also truncate the results if the result size is greater than 8192 bytes.
We can insert regular string content into the feedXML column of our rssFeeds table using the following method:
INSERT INTO rssFeeds (feedName, feedXML) VALUES
  (‘MSDN’, ‘<feedXML/>’)
Using FOR XML
In the following example, we initialize the @xmlDOC variable using a SELECT…FOR XML statement:
USE AdventureWorks
GO
DECLARE @xmlDoc XML
SET @xmlDoc =
  (
  SELECT name, salesYTD FROM sales.salesterritory
  FOR XML AUTO, ELEMENTS
  )
SELECT @xmlDoc
The output of the above command is as follows (only partial results are shown. To see the full results, you need to click on the hyperlink in Management Studio):

Streaming results from FOR XML statements directly into XML variables is a great improvement over SQL Server 2000 (which did not allow this). Also note that the above example returns an XML fragment and not an XML document. If you need to return an XML document, you can use the new ROOT directive in the FOR XML command as shown:
USE AdventureWorks
GO
DECLARE @xmlDoc XML
SET @xmlDoc =
  (
  SELECT name, salesYTD FROM sales.salesterritory
  FOR XML AUTO, ROOT ('territories'), ELEMENTS
  )
SELECT @xmlDoc
Executing the above command produces the following output (again, only partial results are shown):
Bulkloading XML data using the OPENROWSET Function
SQL Server 2005 provides extensions to the OPENROWSET
function to bulk load an XML document from the file system. The following example shows how to bulkload our @xmlDoc variable using this technique:
DECLARE @xmlDoc XML
SET @xmlDoc = (
  SELECT * FROM OPENROWSET (
    BULK 'C:\Temp\rss.xml', SINGLE_CLOB
  ) AS xmlData
)
SELECT @xmlDoc
The RSS.XML file that is referred to in the above query is a sample RSS XML that I generated from the MSDN site. To do this, you can query the MSDN RSS site at: http://msdn.microsoft.com/rss.xml and then save results as XML from Internet Explorer.
The output of the above command is as follows:

Similarly, the following code uses the OPENROWSET command to load an XML document into our feedXML column:
GO
INSERT INTO rssFeeds (feedName, feedXML)
SELECT 'MSDNRSS', xmlData FROM
  (
  SELECT * FROM OPENROWSET
  (BULK 'C:\Temp\rss.xml', SINGLE_CLOB)
  AS xmlData
  ) AS feed (xmlData)
GO
SELECT * FROM rssFeeds
The following output is generated:

Typing XML
Up to this point, the XML content that we have used is referred to as untyped XML. An untyped XML variable/column is one that is not associated with a schema. If we associate a schema with an XML type, it is then referred to as typed XML. Once an XML variable/column is typed using a schema, any data that is inserted into it is automatically validated against the schema, and SQL Server 2005 raises an error if the data does not conform to the defined schema rules.
It is usually recommended to associate schemas to XML content in SQL Server, since query optimization can use the information and also it provides better operational semantics. For example, we can perform arithmetic operations on numbers, but not strings. If no schema is attached to an XML, then all values are considered to be strings. MSDN
provides further information about schemas and their maintenance. Creating typed XML involves two basic steps: 1) Create an XML schema and 2) Register it in the database.
Creating a Schema Using tools such as Visual Studio 2005 and XML Spy, creating a schema is pretty simple. The following link contains some excellent tutorials on this topic: Creating Schemas in Visual Studio. Using the tool of your choice, create the following schema definition:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
   id="Contacts"
   targetNamespace="http://www.srisamp.com/contacts.xsd"
   elementFormDefault="qualified"
   xmlns="http://www.srisamp.com/contacts.xsd"
   xmlns:mstns="http://www.srisamp.com/contacts.xsd"
   xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="contact">
        <xs:sequence>
            <xs:element name="name" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
    <xs:element name="contacts">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="contact" type="contact" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
This schema defines an XML document of the form:
<contacts><contact><name></name></contact></contacts>
Save this schema in a file called ContactsSchema.xsd in the C:\Temp folder.
Registering a Schema in SQL Server 2005 You register a schema in SQL Server 2005 using the CREATE XML SCHEMA COLLECTION command. The following script loads our XML schema into the database.
IF EXISTS (SELECT * FROM sys.xml_schema_collections
      WHERE [name] = 'MyContacts')
   DROP XML SCHEMA COLLECTION MyContacts
GO
DECLARE @contactsSchema XML
SET @contactsSchema =
   (
      SELECT * FROM OPENROWSET
         (BULK 'C:\Temp\ContactsSchema.xsd', SINGLE_CLOB)
            AS xmlData
   )
CREATE XML SCHEMA COLLECTION MyContacts AS @contactsSchema
GO
In the following code, we type an XML variable to the MyContacts schema and then assign both valid and invalid content to the XML (by invalid, we mean that the content does not adhere to the schema):
DECLARE @xmlDoc XML (MyContacts)
BEGIN TRY
   -- This entry conforms to our schema rules
   SET @xmlDoc = '
      <contacts xmlns="http://www.srisamp.com/contacts.xsd">
         <contact>
            <name>Bob Smith</name>
         </contact>
      </contacts>
   '
   SELECT @xmlDoc
 
   -- This one does not conform to the schema
   SET @xmlDoc = '
      <contacts xmlns="http://www.srisamp.com/contacts.xsd">
         <contact>
            <names>Bob Smith</names>
         </contact>
      </contacts>
   '
   SELECT @xmlDoc
END TRY
BEGIN CATCH
   PRINT 'Error at line '+CAST(error_line() AS VARCHAR)+'. Message: '+error_message()
END CATCH
Note how we have declared the XML variable. The schema reference is specified in parentheses after the variable. We have also used the new TRY…CATCH error handling syntax of SQL Server 2005 to capture the error and display the corresponding message.
When you execute the above snippet, the following output is generated:

You should also receive the following error message (I have wrapped the error message to make it readable):


Conclusion That brings us to the end of the first part of this series on exploring the XML features of SQL Server 2005. In the subsequent articles, we will see how to query XML data using the new XQuery language and also how to migrate XML data from existing SQL Server 2000 systems into SQL Server 2005. We will also see how a client application can work with the new XML data type in SQL Server 2005.

No comments:

Post a Comment