Wednesday, February 23, 2011

Reading and Writing XML in .NET 2.0




"Using XmlReaderSettings, XmlReader, and the Static Create Methods"


It must be tough for companies that develop software for working with XML. No sooner do they get a product out of the door than the World Wide Web Consortium (W3C) changes the recommendations and standards, leaving that product out of date. Yet the manufacturers still have to maintain backward compatibility with their previous releases, while attempting to encompass all the new standards. We've seen this several times before in Microsoft's XML product space, and the process shows little sign of stabilizing yet.

OK, so the base specification for XML itself, version 1.0, is complete, stable and implemented in almost all products now. But recent advances in technologies such as XML Query Language (XQuery - see http://www.w3.org/XML/Query) and the XML Information Set (XML InfoSet - see http://www.w3.org/TR/xml-infoset/) require changes to core classes in the System.Xml namespace with each release of the Framework, to keep up with evolving standards.

When version 1.0 of the .NET Framework was introduced, it brought with it a whole raft of new techniques for working with XML. This included a new pull-model parser, the XmlReader, new XML document objects such as XmlDocument, XmlDataDocument and XPathDocument, new classes for working with schemas, and a brand new XSL-T processor. Now, at the time of writing, version 2.0 has just appeared (this article is based on the Beta 2 release). And after the preamble above, you won't be surprised to learn that there are a great many changes in the release compared to version 1.x.

In this series of three articles, we'll look in detail at how the new features of the XmlReader and XmlWriter classes in version 2.0 of the .NET Framework can be used to read and write XML documents, and interact with the new XML document store objects. This includes:

  • The new "settings" classes and static Create methods for XmlReader and XmlWriter
  • Creating and using an XmlReader to read and validate XML documents and fragments
  • Two of the useful new features of the XmlReader class
  • Creating and using an XmlWriter to write XML documents and fragments
  • Some useful new features of the XmlWriter class
  • How the XmlReader and XmlWriter can be used with the XmlDocument class
  • Some of the useful new features of the XmlDocument class

Along the way, we'll look into the issues involved in using the new classes, the reasoning behind the changes, and how the new features simplify your code and provide better overall efficiency for your applications. This first article concentrates on the XmlReader class, and how the new XmlReaderSettings class makes it easy to create XmlReader instances with specific properties such as validation and access control for use in your applications.

The New "Settings" Classes for XmlReader and XmlWriter

To read or write XML in version 1.x, you can create an instance of a class that inherits from XmlReader or XmlWriter, such as XmlTextReader or XmlTextWriter, and then set various properties before using that reader or writer. The XmlReader and XmlWriter classes are abstract, and so you cannot create instances of them directly. And each time you need a reader or writer, you have to go through the same process of creating an instance and setting the properties.

In version 2.0, the fundamental technique for creating readers and writers has changed. There are two new classes named XmlReaderSettings and XmlWriterSettings that you use as a "factory" to generate instances of readers and writers on demand, without having to repeatedly set their properties. This has several benefits in that it:

  • Reduces the code you have to write
  • Allows the framework to make optimizations in the reader or writer based on the settings, for example omitting validation support if this is not required
  • Provides classes that can execute more efficiently in circumstances where the extra features are not required
  • Allows you to create instances of the abstract base classes, rather than having to instantiate classes that inherit from XmlReader or XmlWriter
  • Allows the XmlReader and XmlWriter to be extended in future releases without breaking your code, and therefore removes the need for multiple concrete implementations aimed at different scenarios

The version 2.0 XmlReader and XmlWriter classes expose a new Static/Shared method called Create, which allows you to create instances by specifying an XmlReaderSettings or XmlWriterSettings instance that defines the behaviour you want. We'll look at how this works with the XmlReader in this article, and with the XmlWriter in the next article.

First, however, it's useful to see how the XmlReader and XmlWriter fit into the whole scheme of things in .NET version 2.0. Figure 1 shows the main data flows that involve the three types of XML document store and manipulation classes in System.Xml 2.0 and its subsidiary namespaces. You can see that the XmlReader and XmlWriter are a fundamental part of the flow when reading XML into, and saving it from, other classes such as the document stores.

Figure 1 - How the XmlReader and XmlWriter can be used with the XML Document Stores in v2.0

Not shown here are other areas where the XmlReader and XmlWriter are used, for example when reading XML using the SQLXML technology in SQL Server via an ADO.NET Command instance, or reading and writing XML with the new XslCompiledTransform class that performs XSL-T transformations. And, of course, you can use the methods of the XmlReader and XmlWriter classes directly to read and expose nodes from an XML document, or to create new XML documents.

The XmlReaderSettings Class

The XmlReaderSettings class is used to specify the behavior you want for XmlReader instances that you will create and use in your code. Figure 2 shows a schematic overview of the XmlReaderSettings class. You can see that the set of properties available is broadly similar to the set you will be used to in the version 1.x XmlReader class. You can specify a range of properties that control the way XML is handled, including ignoring white-space and processing instructions, specifying the schema validation type and conformance level, preventing DTDs from being processed, and closing the underlying input stream automatically when the reader is closed.

Figure 2 - The XmlReaderSettings Class

There are also properties that return the current line number and character offset when reading a document, and the ability to switch on and off strict checking of the characters in the input stream (for example characters that are outside the legal range for XML documents). The XmlReaderSettings class also exposes a reference to an XmlResolver that is used to safely read external schemas, DTDs and entities; plus a reference to an ICredentials collection that contains the network credentials to be presented to the server when accessing a remote document.


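To resolve namespaces within the XML document, the XmlReaderSettings class also exposes a reference to an XmlNameTable. This is a table of atomized strings (element and attribute names, prefixes and namespace URIs) used by the reader. An XmlNamespaceManager built over the same table holds the mapping between namespace prefixes and the corresponding namespace identifiers, as you'll see when we read XML fragments later in this article.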

You can also read an XML stream that doesn't contain the <?xml version="1.0"?> declaration, and read fragments of XML that are not - on their own - valid documents. You specify the conformance level, so that the reader will accept input that is not actually a complete XML document, for example a fragment that contains un-declared namespace prefixes.


Some of the ways that you can use the XmlReaderSettings class are discussed next. We'll look at:

  • Creating an XmlReader with the XmlReaderSettings class
  • Validating XML with the XmlReaderSettings and XmlReader classes
  • Handling XML validation errors
  • Using a custom handler to trap XML validation errors and warnings
  • Reading fragments of XML with an XmlReader
  • Validating fragments of XML with an XmlReader
  • Using an XmlResolver to limit access to resources
  • Wrapping or "pipelining" XmlReader instances

The example page shown in Figure 3 demonstrates most of the features listed above. You can run or download all of the samples at our Website at http://www.daveandal.net/articles/readwritexml/. This first example, named readersettings.aspx, allows you to turn on and off validation (including using a custom validation handler and trapping validation warnings), set the conformance level for a document or a fragment, and use an XmlResolver to limit access to the XML disk file. It also demonstrates reading typed values, as you'll see later in the article. There is a [view source] link at the bottom of the page that you can use to see the source code, which is fully commented to help you understand how it all works.

Figure 3 - The Example Page that Demonstrates Using the XmlReaderSettings Class

Creating an XmlReader with the XmlReaderSettings Class

To create an XmlReader instance, you first create an instance of the XmlReaderSettings class, set the properties you want, and then call the Create method of the XmlReader class, passing in the settings. For example, this code creates an XmlReader that closes the underlying input stream when the reader is closed, ignores comments in the XML document, and reads the XML disk file named myfile.xml:

Dim rs As New XmlReaderSettings()
rs.CloseInput = True
rs.IgnoreComments = True
Dim xr As XmlReader = XmlReader.Create("C:\temp\myfile.xml", rs)

Other overloads of the Create method allow you to generate an XmlReader over a Stream, or wrap an existing TextReader or XmlReader which is then used as the input to the new XmlReader. You can also pass an XmlParserContext instance as the third parameter of the Create method, which allows you to declare the namespaces and prefixes used in the document, and specify the language and the white-space handling options that the reader will use when reading the XML. Finally, you can use the Create method without specifying an XmlReaderSettings instance if you just want a single XmlReader that uses the default settings.
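
As an illustration of the TextReader overload mentioned above, here is a minimal console sketch that reads a document held in a string rather than a disk file. The module name and the sample XML are made up for this illustration and are not part of the example pages:

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module CreateOverTextReader
  Sub Main()
    ' hypothetical in-memory document, used here instead of a disk file
    Dim xml As String = _
      "<slides><!-- draft --><slide position=""1""><title>Agenda</title></slide></slides>"

    Dim rs As New XmlReaderSettings()
    rs.CloseInput = True       ' closing the XmlReader also closes the StringReader
    rs.IgnoreComments = True   ' the comment node is not reported by the reader

    ' wrap a TextReader (here a StringReader) instead of passing a file path
    Using xr As XmlReader = XmlReader.Create(New StringReader(xml), rs)
      While xr.Read()
        If xr.NodeType = XmlNodeType.Element Then
          Console.WriteLine("Element: " & xr.Name)
        End If
      End While
    End Using
  End Sub
End Module
```

The same settings instance can be reused for any number of Create calls, which is the point of the "factory" approach.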

The example page shown in Figure 3 provides a drop-down list where you can select from a range of XML disk files. It also declares a variable to hold an XmlParserContext instance, which is populated if you select the option to read an XML fragment instead of a complete and well-formed XML document. The XmlReader is then created using the static Create method against the XML file you select in the drop-down list:

Dim xpc As XmlParserContext = Nothing
...
' create and populate the XmlParserContext here if reading an XML fragment
...
Dim xr As XmlReader = Nothing
Dim sPath As String = Server.MapPath("data/" & lstDocument.SelectedItem.Text)
xr = XmlReader.Create(sPath, rs, xpc)

If there is an error creating the XmlReader, for example a security exception or if the XML file or stream you specify does not exist, the exception is raised when you call the Create method. Therefore you should always use a Try..Catch construct to trap any such errors.
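
A minimal sketch of that Try..Catch pattern as a console program follows. The file path is the hypothetical one used earlier, and the choice of which exception types to catch separately is an assumption made for illustration, not a list taken from the example page:

```vbnet
Imports System
Imports System.IO
Imports System.Security
Imports System.Xml

Module CreateWithErrorHandling
  Sub Main()
    Dim rs As New XmlReaderSettings()
    rs.IgnoreComments = True

    Dim xr As XmlReader = Nothing
    Try
      ' hypothetical path - replace with a file that exists on your system
      xr = XmlReader.Create("C:\temp\myfile.xml", rs)
      Console.WriteLine("XmlReader created")
    Catch fnf As FileNotFoundException
      ' the file you specified does not exist
      Console.WriteLine("File not found: " & fnf.Message)
    Catch secx As SecurityException
      ' the code does not have permission to access the resource
      Console.WriteLine("Access denied: " & secx.Message)
    Catch ex As Exception
      ' any other problem, for example a missing folder
      Console.WriteLine("Could not create XmlReader: " & ex.Message)
    Finally
      ' only close the reader if it was actually created
      If xr IsNot Nothing Then xr.Close()
    End Try
  End Sub
End Module
```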

Validating XML with the XmlReaderSettings and XmlReader Classes

One of the stranger features in version 1.x of the System.Xml implementation is that you have to use a special class, XmlValidatingReader, to validate an XML document. And you have to create this XmlValidatingReader from an existing XmlReader instance. This is because validation adds an overhead to the reader class that wastes resources if validation is not required (although the readers do check that the document is well-formed).

In version 2.0, you can validate a document directly when using an XmlReader. A range of properties on the XmlReaderSettings class allow you to specify one or more external XML schemas or DTDs using the XmlSchemaSet class (a collection of XmlSchema instances), and these are applied to the XML as it is read - depending on the settings you specify for the ValidationType and ValidationFlags properties. The ValidationFlags property is a combination of flag values from the XmlSchemaValidationFlags enumeration, as shown earlier in Figure 2. This enumeration contains five values:

  • None: none of the validation flags are active - this is the default
  • ProcessIdentityConstraints: all constraints specified by xs:ID, xs:IDREF, xs:key, xs:keyref, xs:unique elements in the document are processed
  • ProcessInlineSchema: any inline schema within the document is processed
  • ProcessSchemaLocation: any elements that specify external schema locations, such as xsi:schemaLocation, xsi:noNamespaceSchemaLocation, are processed
  • ReportValidationWarnings: any warnings encountered during validation are detected, and the corresponding validation events will be raised.

To enable validation in an XmlReaderSettings instance, before you create the XmlReader instances you need from it, you must perform two tasks. The first is to create an XmlSchemaSet and add to it the schemas that will be used for validating the XML (unless the XML document contains an inline schema). In the example page we use an XML document that references two schemas - one that defines the main elements in the document and one that defines the reviewed element with the namespace prefix "rv". This is the standard and valid XML document:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://myns/slidesdemo" xmlns:rv="http://myns/slidesdemo/reviewdate">
<session name="All about XML">
  <slides>
    <slide position="1">
      <title>Agenda</title>
      <rv:reviewed>2004-05-10T00:00:00</rv:reviewed>
    </slide>
    <slide position="2">
      <title>Introduction</title>
      <rv:reviewed>2003-10-22T00:00:00</rv:reviewed>
    </slide>
    <slide position="3">
      <title>Code Examples</title>
      <rv:reviewed>2004-03-02T00:00:00</rv:reviewed>
    </slide>
  </slides>
</session>
</root>

You can see the two namespace declarations in the root element, and these are used in the targetNamespace attribute of the two schemas. So we need to add both of these schemas to the XmlSchemaSet, and then assign the XmlSchemaSet to the Schemas property of the XmlReaderSettings instance:

Dim ss As New XmlSchemaSet()
ss.Add("http://myns/slidesdemo", Server.MapPath("data/schema/slides.xsd"))
ss.Add("http://myns/slidesdemo/reviewdate", Server.MapPath("data/schema/slidesrev.xsd"))
rs.Schemas = ss

Then we turn on validation by setting the ValidationType and specifying the ValidationFlags we want to be active. In this case, we've specified that validation should be carried out against an XML schema, though you could use ValidationType.Auto, in which case the reader will detect which type of schema or DTD is being used:

rs.ValidationType = ValidationType.Schema
' combine flags with Or; adding them would corrupt the value if the flag were already set
rs.ValidationFlags = (rs.ValidationFlags Or XmlSchemaValidationFlags.ProcessSchemaLocation)

Handling XML Validation Errors and Warnings

Now any validation error will raise an XmlSchemaException when the XML is read. So you can handle this error to find out what happened, either when loading another object with the XmlReader (for example passing it to the Load method of an XmlDocument instance), or when reading individual nodes directly. In the example page, we've previously created a StringBuilder to hold the results of processing the XML disk file, and it can be populated with the validation error details like this:

Try
  While xr.Read()
    ' ... handle and display XML document content here ...
  End While
Catch xsx As XmlSchemaException
  ' document failed validation against schema so display details
  builder.Append("<p><b>ERROR validating XML document against schema:</b><br />")
  builder.Append("Message = " & xsx.Message & "<br />")
  builder.Append("LineNumber = " & xsx.LineNumber.ToString())
  builder.Append(" &nbsp; LinePosition = " & xsx.LinePosition.ToString() & "</p>")
  ...

Figure 4 shows the result of validating an XML document that contains invalid content. This document contains the element <slide position="two">, which is invalid because the data type defined in the schema for the position attribute is xs:unsignedByte. Notice that processing of the XML document stops when the error is encountered (if you do not tick the first checkbox in the page, it will read the XML without validating it and you'll be able to see the values of all the nodes).

Figure 4 - Validating a Document with an XmlReaderSettings and XmlReader Class

However, the XmlReader may also raise other types of exception when reading the XML document, for example if the file becomes unavailable or the input stream is disrupted. So you should also include a generic error handler section, and remember to close the XmlReader when you have finished using it:

  ...
Catch ex As Exception
  ' error reading document so display details
  builder.Append("<p><b>ERROR reading XML document:</b><br />")
  builder.Append("Message = " & ex.Message & "</p>")
Finally
  Try
    xr.Close()
  Catch
  End Try
End Try

Another approach is to use a Using construct, now available in VB.NET as well as C#, to ensure that the reader is correctly disposed when you have finished with it. You don’t have to remember to call Close in this case, though it's still good practice to do so. For example:

Using xr As XmlReader = XmlReader.Create("test.xml", rs)
  ' ... use the XmlReader here ...
  ' ... still good practice to call Close when complete ...    
End Using

Using a Custom Handler to Trap XML Validation Errors and Warnings

Trapping validation errors, as shown above, is useful, but sometimes you want to handle validation errors yourself, without having processing stop when the first one is encountered. As in version 1.x, you can attach a custom handler to the ValidationEventHandler event (in version 2.0, this is done via the XmlReaderSettings class), and it is called whenever a validation error is raised. In VB.NET, you can use the following to specify the event handler named MyValidationHandler for this event:

AddHandler rs.ValidationEventHandler, AddressOf MyValidationHandler

In C#, you would use:

rs.ValidationEventHandler += MyValidationHandler;

A simple event handler is used in the example page, which adds details of the validation error to the StringBuilder so that they can be displayed in the page afterwards. And, because we are handling the validation event ourselves, processing of the XML document continues when each error is detected:

Sub MyValidationHandler(ByVal sender As Object, ByVal e As ValidationEventArgs)
  ' display error details
  builder.Append("<p><b>ValidationEventHandler detected an error:</b><br />")
  builder.Append("Message = " & e.Message & "<br />")
  builder.Append("Severity = " & e.Severity.ToString() & " &nbsp; ")
  ' get line number and character offset from exception
  builder.Append("LineNumber = " & e.Exception.LineNumber.ToString() & " &nbsp; ")
  builder.Append("LinePosition = " & e.Exception.LinePosition.ToString() & "</p>")
End Sub
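
The handler approach can be tried without the example page or any disk files. The following is a minimal console sketch, assuming a made-up one-element schema and an invalid document that are both held in strings; the module name, the OnValidation handler and the errorCount counter are illustrative and not part of the article's sample code:

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module ValidateWithHandler
  ' counts the validation events passed to the handler
  Private errorCount As Integer = 0

  Sub Main()
    ' hypothetical schema: a slide element with an unsignedByte position attribute
    Dim xsd As String = _
      "<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">" & _
      "<xs:element name=""slide"">" & _
      "<xs:complexType>" & _
      "<xs:attribute name=""position"" type=""xs:unsignedByte"" />" & _
      "</xs:complexType>" & _
      "</xs:element>" & _
      "</xs:schema>"

    ' hypothetical invalid document: "two" is not an unsignedByte
    Dim xml As String = "<slide position=""two"" />"

    ' passing Nothing uses the target namespace declared in the schema itself
    Dim schemaReader As XmlReader = XmlReader.Create(New StringReader(xsd))
    Dim ss As New XmlSchemaSet()
    ss.Add(Nothing, schemaReader)
    schemaReader.Close()

    Dim rs As New XmlReaderSettings()
    rs.Schemas = ss
    rs.ValidationType = ValidationType.Schema
    AddHandler rs.ValidationEventHandler, AddressOf OnValidation

    Using xr As XmlReader = XmlReader.Create(New StringReader(xml), rs)
      While xr.Read()
        ' reading carries on because the handler traps each validation error
      End While
    End Using

    Console.WriteLine("Validation events reported: " & errorCount.ToString())
  End Sub

  Private Sub OnValidation(ByVal sender As Object, ByVal e As ValidationEventArgs)
    errorCount += 1
    Console.WriteLine(e.Severity.ToString() & ": " & e.Message)
  End Sub
End Module
```

If you remove the AddHandler line, the same invalid document should instead cause an XmlSchemaException to be raised from the Read method, as described in the previous section.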


By default, only validation errors are reported when you validate an XML document. However, validation can also raise warnings that indicate a problem with the XML, but do not necessarily mean it is invalid. A prime example is when you are reading a fragment of XML that does not contain the matching namespace declaration. To see these warnings, you must handle the validation event yourself, as demonstrated in the previous section, and also turn on validation warnings by setting the ReportValidationWarnings flag in the ValidationFlags property of the XmlReaderSettings instance before you create the XmlReader:

' combine flags with Or rather than +, so an already-set flag is left intact
rs.ValidationFlags = (rs.ValidationFlags _
                   Or XmlSchemaValidationFlags.ReportValidationWarnings)

Now the custom event handler can report the validation warnings as well as validation errors. When a warning is encountered, the value of the Severity property of the ValidationEventArgs instance passed to the event handler will be "Warning".
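
As a rough sketch of how a handler might treat warnings and errors differently, the following console program turns on warning reporting and reads a small document for which no schema has been supplied. The document content and the OnValidation handler name are assumptions made for this illustration; in our experience a missing schema is reported as a warning rather than an error, but the exact messages depend on the Framework version:

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module ReportWarnings
  Sub Main()
    Dim rs As New XmlReaderSettings()
    rs.ValidationType = ValidationType.Schema
    ' turn on warnings in addition to whatever flags are already set
    rs.ValidationFlags = rs.ValidationFlags _
                       Or XmlSchemaValidationFlags.ReportValidationWarnings
    AddHandler rs.ValidationEventHandler, AddressOf OnValidation

    ' hypothetical document - no schema for it has been added to rs.Schemas
    Using xr As XmlReader = XmlReader.Create(New StringReader("<note>hello</note>"), rs)
      While xr.Read()
        ' nothing to do here - the handler reports any validation events
      End While
    End Using
  End Sub

  Private Sub OnValidation(ByVal sender As Object, ByVal e As ValidationEventArgs)
    ' branch on the Severity value rather than parsing the message text
    If e.Severity = XmlSeverityType.Warning Then
      Console.WriteLine("Warning: " & e.Message)
    Else
      Console.WriteLine("Error: " & e.Message)
    End If
  End Sub
End Module
```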


  • Part 1 - Using XmlReaderSettings, XmlReader, and the Static Create Methods
  • Part 2 - Using XmlWriterSettings, XmlWriter, and the Static Create Methods
  • Part 3 - Loading and Persisting XML with an XML Document Store Object


Reading Fragments of XML with an XmlReader

The XmlReader, by default, expects all XML documents to be well-formed. However, there are occasions when you want to read fragments of XML that may not be strictly well-formed, and also be able to validate these where possible. To read fragments of XML, you set the ConformanceLevel property of the XmlReaderSettings instance to ConformanceLevel.Fragment before you create the XmlReader(s):

rs.ConformanceLevel = ConformanceLevel.Fragment

However, XML fragments do not usually contain enough information for the XmlReader to be able to read the document. They may not contain the required namespace declarations, the <?xml...?> declaration that specifies the encoding, or the xml:lang and xml:space attributes that set the language and white-space treatment for the content. In other words, the context for reading the document may well be missing.

To get round this, you will usually have to provide the missing information by creating and populating an appropriate XmlParserContext instance. This process starts by assigning a new NameTable to the NameTable property of the XmlReaderSettings, and then creating a new XmlNamespaceManager over this NameTable. You then add the required namespaces to the XmlNamespaceManager:

rs.NameTable = New NameTable()
Dim nsm As New XmlNamespaceManager(rs.NameTable)
nsm.AddNamespace("rv", "http://myns/slidesdemo/reviewdate")

Then you can create the new XmlParserContext using the XmlNamespaceManager, and optionally include the language and white-space handling values you want. And, to specify the encoding of the document, you just set the Encoding property of the XmlParserContext instance to an appropriate encoding class instance:

Dim xpc As XmlParserContext = New XmlParserContext(rs.NameTable, _
                                  nsm, "en", XmlSpace.Default)
xpc.Encoding = New UTF8Encoding()

Then you can create the XmlReader from the XmlReaderSettings instance using the overload of the static Create method that accepts an XmlParserContext instance:

Dim xr As XmlReader = XmlReader.Create("C:\temp\myfile.xml", rs, xpc)

Now you can read XML fragments that match the settings in the XmlParserContext. The example page we've been using so far allows you to specify the following XML fragment as the source, and turn on fragment conformance, using code like that we've just been discussing. Notice that - with the exception of the reviewed element - the fragment does not contain any namespace declarations or prefixes. The namespace prefix on the reviewed element is acceptable because we create the NameTable containing this namespace declaration as part of the XmlParserContext we use to read this fragment:

<slides>
  <slide position="1">
    <title>Agenda</title>
    <rv:reviewed>2004-05-10T00:00:00</rv:reviewed>
  </slide>
  <slide position="2">
    <title>Introduction</title>
    <rv:reviewed>2003-10-22T00:00:00</rv:reviewed>
  </slide>
</slides>

Figure 5 shows the result, and you can see the contents of the XML fragment listed above. If you turn off fragment checking and try to read this fragment (whereupon the appropriate XmlParserContext is not created), you'll see that an error is raised because the "rv" prefix is not declared. 

Figure 5 - Reading an XML Fragment with an XmlReaderSettings and XmlReader Class
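
The steps in this section can be put together in a short console sketch like the one below, assuming a fragment held in a string rather than in a disk file. The module name and the fragment text are made up for the illustration; the "rv" namespace is the one used throughout this article:

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ReadFragment
  Sub Main()
    ' hypothetical fragment: two top-level elements, and an "rv" prefix
    ' that is not declared anywhere in the text itself
    Dim fragment As String = _
      "<slide position=""1""><rv:reviewed>2004-05-10T00:00:00</rv:reviewed></slide>" & _
      "<slide position=""2""><rv:reviewed>2003-10-22T00:00:00</rv:reviewed></slide>"

    Dim rs As New XmlReaderSettings()
    rs.ConformanceLevel = ConformanceLevel.Fragment
    rs.NameTable = New NameTable()

    ' declare the missing namespace through the parser context
    Dim nsm As New XmlNamespaceManager(rs.NameTable)
    nsm.AddNamespace("rv", "http://myns/slidesdemo/reviewdate")
    Dim xpc As New XmlParserContext(rs.NameTable, nsm, "en", XmlSpace.Default)

    Using xr As XmlReader = XmlReader.Create(New StringReader(fragment), rs, xpc)
      While xr.Read()
        If xr.NodeType = XmlNodeType.Element Then
          ' show each element name with the namespace it resolved to
          Console.WriteLine(xr.Name & " -> " & xr.NamespaceURI)
        End If
      End While
    End Using
  End Sub
End Module
```

Passing Nothing instead of xpc to the Create method is a quick way to see the undeclared-prefix error for yourself.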

Validating Fragments of XML with an XmlReader

Validation is also supported for XML fragments, as you can see if you turn on validation in the example page. You can select an invalid fragment and try reading this to see the effects. The invalid fragment contains the element <rv:reviewed>yes</rv:reviewed>, which is illegal because the schema for this section of XML (slidesrev.xsd in the data\schema subfolder) defines this element as an xs:dateTime type. Figure 6 shows the results.

Figure 6 - Validating an XML Fragment with an XmlReaderSettings and XmlReader Class

However, when you read fragments of XML, you often find that validation warnings are encountered. We specified that warnings should be raised by setting the ReportValidationWarnings flag in the ValidationFlags property of the XmlReaderSettings instance in our example when a custom error handler is used. If you set the checkboxes in the example page for validation, custom validation error handling and warnings reporting, as well as the fragment conformance option, you'll see these warnings appear when you attempt to read the slides-invalid-fragment.xml file - as shown in Figure 7.

Figure 7 - Displaying Validation Warnings and Errors for an XML Fragment

Using an XmlResolver to Limit Access to Resources

The final feature that the example we've been using so far demonstrates is how you can control access to resources when using an XmlReader. This could be useful if, for example, you want to limit access to a particular folder or set of XML disk files. By default, the XmlReader uses an XmlResolver that is created internally to resolve references, URLs and paths to the resources it uses. However, you can create your own XmlResolver instance and use this to set the XmlResolver property of the XmlReaderSettings instance before you create your XmlReader(s).

The first step is to create a PermissionSet that defines the permissions that will apply when the XmlReader tries to access a resource. By specifying PermissionState.None in the constructor, you start with an empty set that grants no permissions - and so, until you add some, access will fail. Note that you must import the System.Security and System.Security.Permissions namespaces when writing code to control access to resources like this:

Dim ps As New PermissionSet(PermissionState.None)

Now you can create individual permissions, and add them to the PermissionSet. In the example page, we want to be able to access the folder named data that contains the XML disk files, and so we create a FileIOPermission instance that gives read access to this folder:

Dim fpdata As New FileIOPermission(FileIOPermissionAccess.Read, Server.MapPath("./data/"))
ps.AddPermission(fpdata)

Then we can create a new XmlSecureResolver (a class that inherits from XmlResolver) and specify this permission set, then use it to set the XmlResolver property of the XmlReaderSettings instance we're using:

rs.XmlResolver = New XmlSecureResolver(New XmlUrlResolver, ps)

If you run the example page, and set the checkbox to block access to all folders, you'll find that an error is displayed - as shown in Figure 8. This is because the code in the example page does not add the FileIOPermission to the PermissionSet unless you also set the "Allow access..." checkbox.

Figure 8 - Preventing Access to Resources with an XmlSecureResolver

This error is trapped by the Try..Catch construct around the call to the Create method of the XmlReader class. We specifically catch instances of a SecurityException, display the message, and then exit from the routine. The SecurityException class exposes a range of properties that describe the exception, but we're only using the Message property in our example page:

Try
  ' ... create the XmlReader using the XmlReaderSettings ...
Catch secx As SecurityException
  builder.Append("<p><b>ERROR creating XmlReader:</b><br />")
  builder.Append("Message = " & secx.Message & "</p>")
  Label1.Text &= builder.ToString()
  Return
Catch ex As Exception
  ' ... handle exceptions for other errors here ...
End Try

If you now set the checkbox to allow access to the data folder, the XmlReader is able to read the XML file and display the contents as it does when using its default XmlResolver.

Wrapping or "Pipelining" XmlReader Instances

One of the options when you create an XmlReader or XmlWriter using the static Create methods is to specify as the source (the first parameter of the Create method) another XmlReader or XmlWriter, or an existing TextReader or TextWriter. You can create a new XmlReader instance over an existing XmlReader or TextReader, and create a new XmlWriter instance over an existing XmlWriter or TextWriter.

This process is called wrapping or pipelining, and allows you to add new features to an existing reader or writer as you create a new instance from it. For example, you can add validation support to an XmlReader created over an existing XmlReader that does not validate the incoming XML, or even over a TextReader that is already referencing an XML document. Notice, however, that you cannot remove features that are already enabled on the source reader or writer. This could, if permitted, prevent the source reader or writer from behaving correctly.

We provide an example named pipelinereaders.aspx that demonstrates wrapping an XmlReader with another XmlReader. It starts by creating an XmlReader using an XmlReaderSettings instance in the same way as the previous example, but only sets a few properties of the XmlReaderSettings. The XmlReader is created over the same invalid XML document as you saw in the previous example:

' create an XmlReaderSettings instance and set some properties
Dim rs1 As New XmlReaderSettings()
rs1.CloseInput = True
rs1.IgnoreComments = True
rs1.IgnoreWhitespace = True

' declare a variable to hold an XmlReader
Dim xr As XmlReader = Nothing
Try
  ' create the XmlReader using this first XmlReaderSettings instance
  Dim sPath As String = Server.MapPath("data/slides-invalid-content.xml")
  xr = XmlReader.Create(sPath, rs1)
  builder.Append("Created non-validating XmlReader<br />")
Catch ex As Exception
  ' ... display error details here ...
End Try

Now a new XmlReaderSettings instance is created. By layering over an existing XmlReader, the new XmlReader will assume the settings of the existing XmlReader, which you can add to through the new XmlReaderSettings instance. In this case we'll add validation to the new XmlReader.


The next section of code shows the new XmlReaderSettings instance being created, and the validation features set in the same way as we did in the previous example. This includes adding a custom event handler to the ValidationEventHandler event of the XmlReaderSettings instance:

' create an new XmlReaderSettings instance and set some properties
Dim rs2 As New XmlReaderSettings()
' create and populate an XmlSchemaSet instance
Dim ss As New XmlSchemaSet()
ss.Add("http://myns/slidesdemo", Server.MapPath("data/schema/slides.xsd"))
ss.Add("http://myns/slidesdemo/reviewdate", Server.MapPath("data/schema/slidesrev.xsd"))
' add XmlSchemaSet to XmlReaderSettings and turn on validation
rs2.Schemas = ss
rs2.ValidationType = ValidationType.Schema
rs2.ValidationFlags = rs2.ValidationFlags Or XmlSchemaValidationFlags.ProcessSchemaLocation
' add a custom handler for validation events
AddHandler rs2.ValidationEventHandler, AddressOf MyValidationHandler

Now we create a new XmlReader using the new XmlReaderSettings instance, by specifying the original XmlReader as the first parameter of the Create method. Then we call a separate routine named ShowReadToMethods to display some values from the XML document:

' declare a variable to hold the validating XmlReader
Dim vxr As XmlReader = Nothing
Try
  ' create XmlReader using XmlReaderSettings instance and existing non-validating XmlReader
  vxr = XmlReader.Create(xr, rs2)
  ' display a couple of values from the invalid XML document
  ShowReadToMethods(vxr)
Catch ex As Exception
  ' ... display error details here ...
End Try

The ShowReadToMethods routine uses the new ReadToXxx methods of the XmlReader class, so we'll look at this code in the next section when we examine these methods in more detail. In the meantime, Figure 9 shows the result. You can see that the document has been validated as it was being read and displayed, and that processing does not stop when the first validation error is encountered. The output in the page shows each reader being created, the values of some nodes in the document, and the messages generated by the custom validation handler we specified when we created the second XmlReaderSettings instance.

Figure 9 - Wrapping One XmlReader with another XmlReader that Performs Validation
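
If you want to experiment with wrapping outside of an ASP.NET page, the following is a minimal console sketch of the same technique. It is not the code from pipelinereaders.aspx: the inline document, the inline schema, and the names PipelineSketch, SampleSchema, SampleXml and OnValidation are all assumptions made purely for illustration.

Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module PipelineSketch

  ' hypothetical schema: slide elements with an xs:unsignedByte position attribute
  Private Const SampleSchema As String = _
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" & _
    "<xs:element name='slides'><xs:complexType><xs:sequence>" & _
    "<xs:element name='slide' maxOccurs='unbounded'><xs:complexType>" & _
    "<xs:attribute name='position' type='xs:unsignedByte' />" & _
    "</xs:complexType></xs:element>" & _
    "</xs:sequence></xs:complexType></xs:element></xs:schema>"

  ' hypothetical document: the second position value is deliberately invalid
  Private Const SampleXml As String = _
    "<slides><slide position='1' /><slide position='abc' /></slides>"

  Sub Main()
    ' inner reader: a few general settings, no validation
    Dim rs1 As New XmlReaderSettings()
    rs1.IgnoreComments = True
    rs1.IgnoreWhitespace = True
    Dim inner As XmlReader = XmlReader.Create(New StringReader(SampleXml), rs1)

    ' second settings instance: adds schema validation and a handler
    Dim ss As New XmlSchemaSet()
    ss.Add(Nothing, XmlReader.Create(New StringReader(SampleSchema)))
    Dim rs2 As New XmlReaderSettings()
    rs2.Schemas = ss
    rs2.ValidationType = ValidationType.Schema
    AddHandler rs2.ValidationEventHandler, AddressOf OnValidation

    ' outer reader: wraps the inner reader and validates as it reads
    Dim outer As XmlReader = XmlReader.Create(inner, rs2)
    While outer.Read()
      If outer.NodeType = XmlNodeType.Element Then
        Console.WriteLine("Read element: " & outer.Name)
      End If
    End While
    outer.Close()
  End Sub

  ' called for each validation problem; reading continues afterwards
  Private Sub OnValidation(ByVal sender As Object, ByVal e As ValidationEventArgs)
    Console.WriteLine("Validation " & e.Severity.ToString() & ": " & e.Message)
  End Sub

End Module

Because a handler is attached, validation errors are reported through it rather than thrown as exceptions, which is why the loop can carry on to the end of the document.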

Two Useful New Features of the XmlReader Class

As well as the use of the static Create methods and the "settings" classes we've just described, the XmlReader in version 2.0 of System.Xml provides other new features and opportunities. The two we'll look at here are:

  • Reading up to specific elements or fragments
  • Reading typed values from an XML document

Reading Up To Specific Elements or Fragments

When reading XML documents with an XmlReader where you want to locate a specific element or attribute node, one of the most laborious and inefficient parts of the process is actually reading up to that node. In version 2.0, the XmlReader exposes some new methods that you can use. These are the ReadToDescendant, ReadToFollowing and ReadToNextSibling methods, which allow you to easily skip over nodes and content until you arrive at the element node you require.

The example page named pipelinereaders.aspx we used in the previous section demonstrates some of these methods. After creating the XmlReader that performs validation, the code calls a routine named ShowReadToMethods, passing in the XmlReader. This listing shows the ShowReadToMethods routine in full. You can see from this how easy it is to navigate through a document using these new methods:

Sub ShowReadToMethods(ByVal vxr As XmlReader)

  ' move to the first descendant slide element
  builder.Append("Executing the ReadToDescendant(""slide"") method<br />")
  If vxr.ReadToDescendant("slide") Then
    builder.Append("Found element '" & vxr.Name)
    ' display the value of the position attribute
    vxr.MoveToAttribute("position")
    builder.Append("' with position attribute = '" & vxr.Value & "'<br />")
  Else
    builder.Append("Cannot execute the <b>ReadToDescendant</b> method.<br />")
  End If

  ' move to the next slide element
  builder.Append("Executing the ReadToNextSibling(""slide"") method<br />")
  If vxr.ReadToNextSibling("slide") Then
    builder.Append("Found element '" & vxr.Name)
    ' display the value of the position attribute
    vxr.MoveToAttribute("position")
    builder.Append("' with position attribute = '" & vxr.Value & "'<br />")
  Else
    builder.Append("Cannot execute the <b>ReadToNextSibling</b> method.<br />")
  End If

  ' move back to element so that ReadToDescendant can be called next
  vxr.MoveToElement()

  ' move to the title element
  builder.Append("Executing the ReadToDescendant(""title"") method<br />")
  If vxr.ReadToDescendant("title") Then
    builder.Append("Found element '" & vxr.Name)
    ' display the value of the element
    vxr.Read()
    builder.Append("' with value = '" & vxr.Value & "'<br />")
  Else
    builder.Append("Cannot execute the <b>ReadToDescendant</b> method.<br />")
  End If

  ' move to the third slide element
  builder.Append("Executing the ReadToFollowing(""slide"") method<br />")
  If vxr.ReadToFollowing("slide") Then
    builder.Append("Found element '" & vxr.Name)
    ' display the value of the position attribute
    vxr.MoveToAttribute("position")
    builder.Append("' with position attribute = '" & vxr.Value & "'<br />")
  Else
    builder.Append("Cannot execute the <b>ReadToFollowing</b> method.<br />")
  End If

  ' move back to element so that ReadToDescendant can be called next
  vxr.MoveToElement()

  ' move to the reviewed element
  builder.Append("Executing the ReadToDescendant(""reviewed"", " & _
                 """http://myns/slidesdemo/reviewdate"") method<br />")
  ' NOTE: could have used just "rv:reviewed" here instead
  If vxr.ReadToDescendant("reviewed", "http://myns/slidesdemo/reviewdate") Then
    builder.Append("Found element '" & vxr.Name)
    ' display the value of the element
    vxr.Read()
    builder.Append("' with value = '" & vxr.Value & "'<br />")
  Else
    builder.Append("Cannot execute the <b>ReadToDescendant</b> method.<br />")
  End If

End Sub

You can see from this that the ReadToXxx methods return a Boolean value that indicates if they managed to move to the specified nodes in the document. The routine displays a message in the page before each call to the ReadToDescendant, ReadToFollowing and ReadToNextSibling methods, and the name and value of the node it moved to if the method succeeds (for the slide elements that have no value, it displays the value of the position attribute instead). If it cannot perform the move, the routine displays a message to this effect.

If you look back at Figure 9, you'll see the results. The code starts by moving to the first descendant slide element using ReadToDescendant("slide"), and then to the next slide element by calling ReadToNextSibling("slide"). This element has an invalid value for its position attribute, as indicated by the text generated by the custom validation handler included in the page. Next, the code calls the MoveToElement method so that the reader is positioned on the slide element itself, and not on its position attribute (where the earlier MoveToAttribute call left it), before calling ReadToDescendant("title") to move to the title element within this slide element.

At this point, the only way to get back to the previous level in the node hierarchy, to be able to move to the next slide element, is to call ReadToFollowing("slide"). This method moves through the document in the order that the nodes appear in the XML, rather than in a hierarchical manner. Notice that, on the way there, the reader has to read the reviewed child element of the current slide element, which also contains an invalid value, as shown by the second validation message in the page.

After displaying the value of the position attribute of the third slide element, the code calls MoveToElement to get back to the element node, and then ReadToDescendant("reviewed","http://myns/slidesdemo/reviewdate") to get to the reviewed element. The reviewed element is in a separate namespace and has the prefix "rv", and so we specify the namespace URI as well as the local name of the element. Alternatively, as noted in the comments in the code, we could specify the qualified name of the element instead - using the more compact form ReadToDescendant("rv:reviewed").
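
To try the ReadToXxx methods in isolation, here is a short console sketch that works without a schema or a Web page. The inline document, the urn:example-review namespace URI and the module name ReadToSketch are hypothetical; only the XmlReader methods themselves come from the Framework.

Imports System
Imports System.IO
Imports System.Xml

Module ReadToSketch

  Sub Main()
    ' hypothetical document, loosely modelled on the slides sample
    Dim sampleXml As String = _
      "<slides xmlns:rv='urn:example-review'>" & _
      "<slide position='1'><title>First</title>" & _
      "<rv:reviewed>2005-03-14T09:30:00</rv:reviewed></slide>" & _
      "<slide position='2'><title>Second</title></slide>" & _
      "</slides>"
    Dim xr As XmlReader = XmlReader.Create(New StringReader(sampleXml))

    ' ReadToFollowing scans forward in document order
    If xr.ReadToFollowing("slide") Then
      ' GetAttribute reads the value without moving off the element
      Console.WriteLine("slide, position = " & xr.GetAttribute("position"))
    End If

    ' ReadToDescendant only looks inside the current element
    If xr.ReadToDescendant("title") Then
      xr.Read()   ' step onto the child text node
      Console.WriteLine("title = " & xr.Value)
    End If

    ' local name plus namespace URI, so the prefix used in the document is irrelevant
    If xr.ReadToFollowing("reviewed", "urn:example-review") Then
      xr.Read()
      Console.WriteLine("reviewed = " & xr.Value)
    End If

    ' document order again: this leaves the first slide and finds the second
    If xr.ReadToFollowing("slide") Then
      Console.WriteLine("slide, position = " & xr.GetAttribute("position"))
    End If

    ' ReadToNextSibling stops at the parent's end tag when nothing matches
    If Not xr.ReadToNextSibling("slide") Then
      Console.WriteLine("No further slide siblings")
    End If

    xr.Close()
  End Sub

End Module

Using GetAttribute here, rather than MoveToAttribute, is simply a way to avoid the extra MoveToElement call; either approach reads the same value.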

Reading Typed Values from an XML Document

The XML Infoset model effectively views XML documents as typed data - often as the equivalent of rowsets such as you'd find in an ADO.NET DataTable or DataSet. This is achieved by layering the schema over the XML so that each node (element or attribute) is exposed as an instance of the relevant data type. In the System.Xml classes, this means standard CLR types such as String, Int32, Boolean, DateTime, etc. To allow you to access documents as typed data, the XmlReader exposes a series of new methods named ReadContentAsXxx and ReadElementContentAsXxx, where Xxx is the name of the data type. There is also a general-purpose ReadValueAs method (exposed as ReadContentAs, with a matching ReadElementContentAs, in the released version of the Framework), to which you pass the data type that you want returned.


The example page we used at the start of this article reads some values from the XML document as CLR typed instances using the ReadContentAsXxx and ReadElementContentAsXxx methods. It reads the value of the position attribute on each slide element as an Int32 integer value (these are defined in the schema as of type xs:unsignedByte) using the ReadContentAsInt method of the XmlReader class. It also reads the value of the reviewed elements for each slide (which are defined in the schema as of type xs:dateTime) as DateTime instances using the ReadElementContentAsDateTime method.

After creating the XmlReader, the code calls the Read method repeatedly (until it returns False), so that each node is read from the XML document in turn. If the current node is an element, and this is the start tag, the name and the value type name (as returned by the ValueType property) are added to the StringBuilder that will display the results after the complete document has been processed. However, if validation is enabled for the XmlReader (the checkbox named chkValidate will be set in this case in our example), the schema will expose the values as the correct data types and so we can use the appropriate method to extract the value as a CLR data-typed instance. We do this for the reviewed element, using the ReadElementContentAsDateTime method:

While xr.Read()
  If xr.IsStartElement() Then
    builder.Append("Element Name: " & xr.Name)
    builder.Append(" &nbsp; ValueType: " & xr.ValueType.ToString() & "<br />")
    If chkValidate.Checked AndAlso xr.LocalName = "reviewed" Then
      Dim dt As DateTime = xr.ReadElementContentAsDateTime()
      builder.Append("Element Typed value: " & dt.ToString() & "<br />")
    End If
    ...

Now the code checks if the current element node has any attributes. If so, it iterates through them in the same way as you would in System.Xml version 1.x when using an XmlTextReader. The name, value type and value of each one can then be displayed. However, when validation is enabled and the current attribute is named position, the code can call the ReadContentAsInt method of the XmlReader to get the value as an Int32 type as well:

    ...
    If xr.HasAttributes Then
      While xr.MoveToNextAttribute()
        builder.Append(" - Attribute Name: " & xr.Name)
        builder.Append(" &nbsp; ValueType: " & xr.ValueType.ToString())
        builder.Append(" &nbsp; Value: '" & xr.Value & "'")
        If chkValidate.Checked AndAlso xr.LocalName = "position" Then
          Dim pos As Int32 = xr.ReadContentAsInt()
          builder.Append(" &nbsp; Typed value: " & pos.ToString())
        End If
        builder.Append("<br />")
      End While
    End If
  End If
  ...

Finally, the code checks to see if the current node is the child text node that contains the value of an element (XmlNodeType.Text). Elements in an XML document have their value stored in a child node, and so this must be handled separately when using an XmlReader. In this case there is no node name (the parent element node contains the name), but the value can be extracted and displayed:

  ...
  If xr.NodeType = XmlNodeType.Text Then
    builder.Append("Element String Value: '" & xr.Value & "'" & "<br />")
  End If
End While

Figure 10 shows the readersettings.aspx example page displaying the XML content when validation is disabled and when it is enabled. You can see the CLR data type names returned by the ValueType property, and the typed value that is obtained by calling the appropriate ReadContentAsXxx method when validation is enabled. The position attribute actually appears as a System.Byte type, but there is no ReadContentAsByte method so the ReadContentAsInt method is used instead to return an Int32 instance. And the reviewed element appears as a DateTime type as expected, but notice that calling any of the ReadElementContentAsXxx methods consumes (i.e. reads) the element value - the child text node - so it does not appear when the code checks for nodes of type XmlNodeType.Text at the end of the iteration loop.

Figure 10 - Reading Values from an XML Document as CLR Typed Instances
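
Because the ReadElementContentAsXxx methods consume the element, a loop that mixes them with Read needs a little care to avoid skipping nodes. The following console sketch shows one way to arrange such a loop. It is an illustration rather than the code from readersettings.aspx: the inline document and the module name TypedValuesSketch are hypothetical, and no schema is attached, so the typed methods simply convert the text of each node.

Imports System
Imports System.IO
Imports System.Xml

Module TypedValuesSketch

  Sub Main()
    ' hypothetical document; the names echo the article's sample
    Dim sampleXml As String = _
      "<slides>" & _
      "<slide position='1'><reviewed>2005-03-14T09:30:00</reviewed></slide>" & _
      "<slide position='2'><reviewed>2005-04-02T16:00:00</reviewed></slide>" & _
      "</slides>"
    Dim xr As XmlReader = XmlReader.Create(New StringReader(sampleXml))

    xr.Read()
    While Not xr.EOF
      If xr.NodeType = XmlNodeType.Element AndAlso xr.LocalName = "reviewed" Then
        ' consumes the whole element and leaves the reader on the node
        ' after its end tag, so we must not call Read again here
        Dim dt As DateTime = xr.ReadElementContentAsDateTime()
        Console.WriteLine("reviewed (DateTime): " & dt.ToString("s"))
      Else
        If xr.NodeType = XmlNodeType.Element AndAlso xr.MoveToAttribute("position") Then
          ' on an attribute, ReadContentAsInt converts the value in place
          Dim pos As Integer = xr.ReadContentAsInt()
          Console.WriteLine("position (Int32): " & pos.ToString())
          xr.MoveToElement()
        End If
        xr.Read()
      End If
    End While
    xr.Close()
  End Sub

End Module

Testing the EOF property instead of the return value of Read means that the loop advances the reader exactly once per iteration, whichever branch is taken.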

Summary

In this series of three articles, we explore how the new features of the XmlReader and XmlWriter classes in version 2.0 of the .NET Framework can be used to read and write XML documents, and interact with the new XML document store objects. In this first article, we've concentrated on the XmlReader class, and the new XmlReaderSettings class that makes it easy to generate single or multiple instances of XmlReader with a range of useful properties. We looked at:

  • The new "settings" classes and static Create methods for XmlReader and XmlWriter
  • Creating and using an XmlReader to read XML documents and fragments
  • Two of the useful new features of the XmlReader class

The XmlReaderSettings and XmlWriterSettings classes hold a wide range of settings that you may need to apply when you create an XmlReader or an XmlWriter. In conjunction with the new static Create methods of XmlReader and XmlWriter, they allow you to store these settings for use whenever you need to create a reader or writer, saving time and making the whole process a lot more transparent and efficient.

The XmlReaderSettings class provides features that allow you to specify the general behavior of the XmlReader(s) you create, such as reading or ignoring DTDs, schemas, white-space, comments, etc. It also provides features to add validation for XML documents or fragments of XML, control access to resources, add credentials for accessing remote or secured resources, and more.

The XmlReader class itself also exposes several useful new features. In particular, in this article, we looked at how navigation in a document is improved through the new ReadTo methods, and how you can now access the content of the XML as CLR typed values.

In the next article, we'll move on to look at the XmlWriter class, and the corresponding XmlWriterSettings class, to see how they make it easier to create and use writers in version 2.0 of the .NET Framework.
