Reading Fragments of XML with an XmlReader
The XmlReader, by default, expects all XML documents to be well-formed. However, there are occasions when you want to read fragments of XML that may not be strictly well-formed, and also be able to validate these where possible. To read fragments of XML, you set the ConformanceLevel property of the XmlReaderSettings instance to ConformanceLevel.Fragment before you create the XmlReader(s):
rs.ConformanceLevel = ConformanceLevel.Fragment
However, XML fragments do not usually contain enough information for the XmlReader to be able to read the document. They may not contain the required namespace declarations, the <?xml...?> declaration that specifies the encoding, or the xml:lang and xml:space attributes that define the language and white-space treatment required for the document. In other words, the context for reading the document may well be missing.
To get round this, you will usually have to provide the missing information by creating and populating an appropriate XmlParserContext instance. This process starts by adding a new NameTable to hold the namespace declarations to the XmlReaderSettings and then creating a new XmlNamespaceManager over this. You then add the required namespaces to the XmlNamespaceManager:
rs.NameTable = New NameTable()
Dim nsm As New XmlNamespaceManager(rs.NameTable)
nsm.AddNamespace("rv", "http://myns/slidesdemo/reviewdate")
Then you can create the new XmlParserContext using the XmlNamespaceManager, and optionally include the language and white-space handling values you want. And, to specify the encoding of the document, you just set the Encoding property of the XmlParserContext instance to an appropriate encoding class instance:
Dim xpc As XmlParserContext = New XmlParserContext(rs.NameTable, _
nsm, "en", XmlSpace.Default)
xpc.Encoding = New UTF8Encoding()
Then you can create the XmlReader from the XmlReaderSettings instance using the overload of the static Create method that accepts an XmlParserContext instance:
Dim xr As XmlReader = XmlReader.Create("C:\temp\myfile.xml", rs, xpc)
Now you can read XML fragments that match the settings in the XmlParserContext. The example page we've been using so far allows you to specify the following XML fragment as the source, and turn on fragment conformance, using code like that we've just discussed. Notice that, with the exception of the reviewed element, the fragment does not contain any namespace declarations or prefixes. The namespace prefix on the reviewed element is acceptable because we add this namespace declaration to the XmlNamespaceManager that forms part of the XmlParserContext we use to read this fragment:
<slides>
<slide position="1">
<title>Agenda</title>
<rv:reviewed>2004-05-10T00:00:00</rv:reviewed>
</slide>
<slide position="2">
<title>Introduction</title>
<rv:reviewed>2003-10-22T00:00:00</rv:reviewed>
</slide>
</slides>
Figure 5 shows the result, and you can see the contents of the XML fragment listed above. If you turn off fragment checking and try to read this fragment (whereupon the appropriate XmlParserContext is not created), you'll see that an error is raised because the "rv" prefix is not declared.
Figure 5 - Reading an XML Fragment with an XmlReaderSettings and XmlReader Class
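The steps described above can be pulled together into a small stand-alone sketch. It is not the code from the example page: the module name FragmentSketch and the helper ReadReviewDates are hypothetical, and the fragment is read from an in-memory string through a StringReader rather than from a disk file. The fragment in Main has two top-level slide elements and no declaration for the "rv" prefix, so it relies on both the fragment conformance level and the XmlParserContext.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module FragmentSketch

    ' Hypothetical helper: returns the text of each rv:reviewed element
    ' in the fragment, each followed by a semicolon.
    Function ReadReviewDates(ByVal fragment As String) As String
        Dim rs As New XmlReaderSettings()
        rs.ConformanceLevel = ConformanceLevel.Fragment
        rs.NameTable = New NameTable()

        ' declare the "rv" prefix that the fragment itself omits
        Dim nsm As New XmlNamespaceManager(rs.NameTable)
        nsm.AddNamespace("rv", "http://myns/slidesdemo/reviewdate")
        Dim xpc As New XmlParserContext(rs.NameTable, nsm, "en", XmlSpace.Default)

        Dim sb As New StringBuilder()
        Using xr As XmlReader = XmlReader.Create(New StringReader(fragment), rs, xpc)
            While xr.Read()
                If xr.NodeType = XmlNodeType.Element AndAlso _
                   xr.NamespaceURI = "http://myns/slidesdemo/reviewdate" AndAlso _
                   xr.LocalName = "reviewed" Then
                    ' step onto the child text node that holds the value
                    If xr.Read() AndAlso xr.NodeType = XmlNodeType.Text Then
                        sb.Append(xr.Value).Append(";")
                    End If
                End If
            End While
        End Using
        Return sb.ToString()
    End Function

    Sub Main()
        ' two top-level elements, so this is a fragment and not a document
        Dim fragment As String = _
            "<slide position='1'><rv:reviewed>2004-05-10T00:00:00</rv:reviewed></slide>" & _
            "<slide position='2'><rv:reviewed>2003-10-22T00:00:00</rv:reviewed></slide>"
        Console.WriteLine(ReadReviewDates(fragment))
    End Sub

End Module
```

If you remove the AddNamespace call, or pass Nothing for the XmlParserContext, you should expect the reader to throw an XmlException when it reaches the undeclared "rv" prefix.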
Validating Fragments of XML with an XmlReader
Validation is also supported for XML fragments, as you can see if you turn on validation in the example page. You can select an invalid fragment and try reading this to see the effects. The invalid fragment contains the element <rv:reviewed>yes</rv:reviewed>, which is illegal because the schema for this section of XML (slidesrev.xsd in the data\schema subfolder) defines this element as an xs:dateTime type. Figure 6 shows the results.
Figure 6 - Validating an XML Fragment with an XmlReaderSettings and XmlReader Class
However, when you read fragments of XML, you often find that validation warnings are encountered. We specified that warnings should be raised by setting the ReportValidationWarnings flag in the ValidationFlags property of the XmlReaderSettings instance in our example when a custom error handler is used. If you set the checkboxes in the example page for validation, custom validation error handling and warnings reporting, as well as the fragment conformance option, you'll see these warnings appear when you attempt to read the slides-invalid-fragment.xml file - as shown in Figure 7.
Figure 7 - Displaying Validation Warnings and Errors for an XML Fragment
Using an XmlResolver to Limit Access to Resources
The final feature that the example we've been using so far demonstrates is how you can control access to resources when using an XmlReader. This could be useful if, for example, you want to limit access to a particular folder or set of XML disk files. By default, the XmlReader uses an XmlResolver that is created internally to resolve references, URLs and paths to the resources it uses. However, you can create your own XmlResolver instance and use this to set the XmlResolver property of the XmlReaderSettings instance before you create your XmlReader(s).
The first step is to create a PermissionSet that defines the permissions available when the XmlReader tries to access a resource. By specifying PermissionState.None in the constructor, you create an empty set that grants no permissions - and so, until you add some, access will fail. Note that you must import the System.Security and System.Security.Permissions namespaces when writing code to control access to resources like this:
Dim ps As New PermissionSet(PermissionState.None)
Now you can create individual permissions, and add them to the PermissionSet. In the example page, we want to be able to access the folder named data that contains the XML disk files, and so we create a FileIOPermission instance that gives read access to this folder:
Dim fpdata As New FileIOPermission(FileIOPermissionAccess.Read, Server.MapPath("./data/"))
ps.AddPermission(fpdata)
Then we can create a new XmlSecureResolver (a class that inherits from XmlResolver) and specify this permission set, then use it to set the XmlResolver property of the XmlReaderSettings instance we're using:
rs.XmlResolver = New XmlSecureResolver(New XmlUrlResolver, ps)
If you run the example page, and set the checkbox to block access to all folders, you'll find that an error is displayed - as shown in Figure 8. This is because the code in the example page does not add the FileIOPermission to the PermissionSet unless you also set the "Allow access..." checkbox.
Figure 8 - Preventing Access to Resources with an XmlSecureResolver
This error is trapped by the Try...Catch construct around the call to the static Create method of the XmlReader class. We specifically catch instances of SecurityException, display the message, and then exit from the routine. The SecurityException class exposes a range of properties that describe the exception, but we're only using the Message property in our example page:
Try
' ... create the XmlReader using the XmlReaderSettings ...
Catch secx As SecurityException
builder.Append("<p><b>ERROR creating XmlReader:</b><br />")
builder.Append("Message = " & secx.Message & "</p>")
Label1.Text &= builder.ToString()
Return
Catch ex As Exception
' ... handle exceptions for other errors here ...
End Try
If you now set the checkbox to allow access to the data folder, the XmlReader is able to read the XML file and display the contents as it does when using its default XmlResolver.
Wrapping or "Pipelining" XmlReader Instances
One of the options when you create an XmlReader or XmlWriter using the static Create methods is to specify as the source (the first parameter of the Create method) another XmlReader or XmlWriter, or an existing TextReader or TextWriter. You can create a new XmlReader instance over an existing XmlReader or TextReader, and create a new XmlWriter instance over an existing XmlWriter or TextWriter.
This process is called wrapping or pipelining, and allows you to add new features to an existing reader or writer as you create a new instance from it. For example, you can add validation support to an XmlReader created over an existing XmlReader that does not validate the incoming XML, or even over a TextReader that is already referencing an XML document. Notice, however, that you cannot remove features that are already enabled on the source reader or writer. This could, if permitted, prevent the source reader or writer from behaving correctly.
We provide an example named pipelinereaders.aspx that demonstrates wrapping an XmlReader with another XmlReader. It starts by creating an XmlReader using an XmlReaderSettings instance in the same way as the previous example, but only sets a few properties of the XmlReaderSettings. The XmlReader is created over the same invalid XML document as you saw in the previous example:
' create an XmlReaderSettings instance and set some properties
Dim rs1 As New XmlReaderSettings()
rs1.CloseInput = True
rs1.IgnoreComments = True
rs1.IgnoreWhitespace = True
' declare a variable to hold an XmlReader
Dim xr As XmlReader = Nothing
Try
' create the XmlReader using this first XmlReaderSettings instance
Dim sPath As String = Server.MapPath("data/slides-invalid-content.xml")
xr = XmlReader.Create(sPath, rs1)
builder.Append("Created non-validating XmlReader<br />")
Catch ex As Exception
' ... display error details here ...
End Try
Now a new XmlReaderSettings instance is created. By layering over an existing XmlReader, the new XmlReader will assume the settings of the existing XmlReader, which you can add to through the new XmlReaderSettings instance. In this case we'll add validation to the new XmlReader.
The next section of code shows the new XmlReaderSettings instance being created, and the validation features set in the same way as we did in the previous example. This includes adding a custom event handler to the ValidationEventHandler event of the XmlReaderSettings instance:
' create a new XmlReaderSettings instance and set some properties
Dim rs2 As New XmlReaderSettings()
' create and populate an XmlSchemaSet instance
Dim ss As New XmlSchemaSet()
ss.Add("http://myns/slidesdemo", Server.MapPath("data/schema/slides.xsd"))
ss.Add("http://myns/slidesdemo/reviewdate", Server.MapPath("data/schema/slidesrev.xsd"))
' add XmlSchemaSet to XmlReaderSettings and turn on validation
rs2.Schemas = ss
rs2.ValidationType = ValidationType.Schema
rs2.ValidationFlags = rs2.ValidationFlags Or XmlSchemaValidationFlags.ProcessSchemaLocation
' add a custom handler for validation events
AddHandler rs2.ValidationEventHandler, AddressOf MyValidationHandler
Now we create a new XmlReader using the new XmlReaderSettings instance, by specifying the original XmlReader as the first parameter of the Create method. Then we call a separate routine named ShowReadToMethods to display some values from the XML document:
' declare a variable to hold the validating XmlReader
Dim vxr As XmlReader = Nothing
Try
' create XmlReader using XmlReaderSettings instance and existing non-validating XmlReader
vxr = XmlReader.Create(xr, rs2)
' display a couple of values from the invalid XML document
ShowReadToMethods(vxr)
Catch ex As Exception
' ... display error details here ...
End Try
The ShowReadToMethods routine uses the new ReadToXxx methods of the XmlReader class, so we'll look at this code in the next section when we examine these methods in more detail. In the meantime, Figure 9 shows the result. You can see that the document has been validated as it was being read and displayed, and that processing does not stop when the first validation error is encountered. The output in the page shows each reader being created, the values of some nodes in the document, and the messages generated by the custom validation handler we specified when we created the second XmlReaderSettings instance.
Figure 9 - Wrapping One XmlReader with another XmlReader that Performs Validation
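To see the wrapping technique in isolation, here is a minimal console sketch under some assumptions of our own. The module name PipelineSketch, the CountValidationErrors helper and the tiny inline schema in SlideXsd are all hypothetical; they stand in for the example page's disk files and its MyValidationHandler routine. A non-validating reader is created first, and a second reader that validates against the schema is then layered over it.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module PipelineSketch

    ' Illustrative schema only: a slide element with a required numeric position
    Public Const SlideXsd As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" & _
        "<xs:element name='slide'><xs:complexType>" & _
        "<xs:attribute name='position' type='xs:unsignedByte' use='required'/>" & _
        "</xs:complexType></xs:element></xs:schema>"

    Private errorCount As Integer = 0

    ' custom handler: count and display each validation message
    Private Sub OnValidation(ByVal sender As Object, ByVal e As ValidationEventArgs)
        errorCount += 1
        Console.WriteLine(e.Severity.ToString() & ": " & e.Message)
    End Sub

    Function CountValidationErrors(ByVal xml As String, ByVal xsd As String) As Integer
        errorCount = 0

        ' inner reader: no validation, just some general settings
        Dim rs1 As New XmlReaderSettings()
        rs1.IgnoreComments = True
        rs1.IgnoreWhitespace = True
        Dim inner As XmlReader = XmlReader.Create(New StringReader(xml), rs1)

        ' outer reader: adds schema validation on top of the inner reader
        Dim ss As New XmlSchemaSet()
        ss.Add(Nothing, XmlReader.Create(New StringReader(xsd)))
        Dim rs2 As New XmlReaderSettings()
        rs2.Schemas = ss
        rs2.ValidationType = ValidationType.Schema
        AddHandler rs2.ValidationEventHandler, AddressOf OnValidation

        Using outer As XmlReader = XmlReader.Create(inner, rs2)
            While outer.Read()
                ' reading through the document is what triggers validation
            End While
        End Using
        Return errorCount
    End Function

    Sub Main()
        Console.WriteLine(CountValidationErrors("<slide position='1'/>", SlideXsd))
        Console.WriteLine(CountValidationErrors("<slide position='one'/>", SlideXsd))
    End Sub

End Module
```

Because a handler is attached, validation problems are reported through the event and reading continues, which matches the behavior you see in Figure 9. Without the handler, you should expect the first validation error to surface as an exception instead.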
Two Useful New Features of the XmlReader Class
As well as the use of the static Create methods and the "settings" classes we've just described, the XmlReader in version 2.0 of System.Xml provides other new features and opportunities. The two we'll look at here are:
- Reading up to specific elements or fragments
- Reading typed values from an XML document
Reading Up To Specific Elements or Fragments
When reading XML documents with an XmlReader where you want to locate a specific element or attribute node, one of the most laborious and inefficient parts of the process is actually reading up to that node. In version 2.0, the XmlReader exposes some new methods that you can use. These are the ReadToDescendant, ReadToFollowing and ReadToNextSibling methods, which allow you to easily skip over nodes and content until you arrive at the element node you require.
The example page named pipelinereaders.aspx we used in the previous section demonstrates some of these methods. After creating the XmlReader that performs validation, the code calls a routine named ShowReadToMethods, passing in the XmlReader. This listing shows the ShowReadToMethods routine in full. You can see from this how easy it is to navigate through a document using these new methods:
Sub ShowReadToMethods(ByVal vxr As XmlReader)
' move to the first descendant slide element
builder.Append("Executing the ReadToDescendant(""slide"") method<br />")
If vxr.ReadToDescendant("slide") Then
builder.Append("Found element '" & vxr.Name)
' display the value of the position attribute
vxr.MoveToAttribute("position")
builder.Append("' with position attribute = '" & vxr.Value & "'<br />")
Else
builder.Append("Cannot execute the <b>ReadToDescendant</b> method.<br />")
End If
' move to the next slide element
builder.Append("Executing the ReadToNextSibling(""slide"") method<br />")
If vxr.ReadToNextSibling("slide") Then
builder.Append("Found element '" & vxr.Name)
' display the value of the position attribute
vxr.MoveToAttribute("position")
builder.Append("' with position attribute = '" & vxr.Value & "'<br />")
Else
builder.Append("Cannot execute the <b>ReadToNextSibling</b> method.<br />")
End If
' move back to element so that ReadToDescendant can be called next
vxr.MoveToElement()
' move to the title element
builder.Append("Executing the ReadToDescendant(""title"") method<br />")
If vxr.ReadToDescendant("title") Then
builder.Append("Found element '" & vxr.Name)
' display the value of the element
vxr.Read()
builder.Append("' with value = '" & vxr.Value & "'<br />")
Else
builder.Append("Cannot execute the <b>ReadToDescendant</b> method.<br />")
End If
' move to the third slide element
builder.Append("Executing the ReadToFollowing(""slide"") method<br />")
If vxr.ReadToFollowing("slide") Then
builder.Append("Found element '" & vxr.Name)
' display the value of the position attribute
vxr.MoveToAttribute("position")
builder.Append("' with position attribute = '" & vxr.Value & "'<br />")
Else
builder.Append("Cannot execute the <b>ReadToFollowing</b> method.<br />")
End If
' move back to element so that ReadToDescendant can be called next
vxr.MoveToElement()
' move to the reviewed element
builder.Append("Executing the ReadToDescendant(""reviewed"", " & _
"""http://myns/slidesdemo/reviewdate"") method<br />")
' NOTE: could have used just "rv:reviewed" here instead
If vxr.ReadToDescendant("reviewed", "http://myns/slidesdemo/reviewdate") Then
builder.Append("Found element '" & vxr.Name)
' display the value of the element
vxr.Read()
builder.Append("' with value = '" & vxr.Value & "'<br />")
Else
builder.Append("Cannot execute the <b>ReadToDescendant</b> method.<br />")
End If
End Sub
You can see from this that the ReadToXxx methods return a Boolean value that indicates if they managed to move to the specified nodes in the document. The routine displays a message in the page before each call to the ReadToDescendant, ReadToFollowing and ReadToNextSibling methods, and the name and value of the node it moved to if the method succeeds (for the slide elements that have no value, it displays the value of the position attribute instead). If it cannot perform the move, the routine displays a message to this effect.
If you look back at Figure 9, you'll see the results. The code starts by moving to the first descendant slide element using ReadToDescendant("slide"), and then to the next slide element by calling ReadToNextSibling("slide"). This element has an invalid value for its position attribute, as indicated by the text generated by the custom validation handler included in the page. Next, the code calls the MoveToElement method so that the reader is positioned on the slide element itself, and not on its position attribute, before calling ReadToDescendant("title") to move to the title element within this slide element.
At this point, the only way to get back to the previous level in the node hierarchy, to be able to move to the next slide element, is to call ReadToFollowing("slide"). This method moves through the document in the order that the nodes appear in the XML, rather than in a hierarchical manner. Notice that, on the way there, the reader has to read the reviewed child element of the current slide element, which also contains an invalid value - as shown by the second validation message in the page.
After displaying the value of the position attribute of the third slide element, the code calls MoveToElement to get back to the element node, and then ReadToDescendant("reviewed","http://myns/slidesdemo/reviewdate") to get to the reviewed element. The reviewed element is in a separate namespace and has the prefix "rv", and so we specify the namespace URI as well as the local name of the element. Alternatively, as noted in the comments in the code, we could specify the qualified name of the element instead - using the more compact form ReadToDescendant("rv:reviewed"). Bear in mind that the compact form matches the qualified name as it appears in the XML, so it only works when the document itself uses the "rv" prefix for this namespace.
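A common use of these methods is to visit every sibling element of a given name. The following sketch is not part of the example page; the module name ReadToSketch and the GetSlidePositions helper are hypothetical, and the XML comes from a string. It moves to the first slide element with ReadToDescendant and then loops with ReadToNextSibling. It uses GetAttribute instead of MoveToAttribute, so the reader stays on the element and no MoveToElement call is needed.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module ReadToSketch

    ' Hypothetical helper: returns the position attribute of every
    ' slide element under the root, separated by commas.
    Function GetSlidePositions(ByVal xml As String) As String
        Dim sb As New StringBuilder()
        Using xr As XmlReader = XmlReader.Create(New StringReader(xml))
            ' position the reader on the root element first
            xr.MoveToContent()
            If xr.ReadToDescendant("slide") Then
                Do
                    If sb.Length > 0 Then sb.Append(",")
                    ' GetAttribute does not move the reader off the element
                    sb.Append(xr.GetAttribute("position"))
                Loop While xr.ReadToNextSibling("slide")
            End If
        End Using
        Return sb.ToString()
    End Function

    Sub Main()
        Dim xml As String = _
            "<slides>" & _
            "<slide position='1'><title>Agenda</title></slide>" & _
            "<slide position='2'><title>Introduction</title></slide>" & _
            "</slides>"
        Console.WriteLine(GetSlidePositions(xml))
    End Sub

End Module
```

ReadToNextSibling skips the whole subtree of the current slide element, so the loop never descends into the title elements and ends when the closing tag of the parent is reached.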
Reading Typed Values from an XML Document
The XML Infoset model effectively views XML documents as typed data - often as the equivalent of rowsets such as you'd find in an ADO.NET DataTable or DataSet. This is achieved by layering the schema over the XML so that each node (element or attribute) is exposed as an instance of the relevant data type. In the System.Xml classes, this means standard CLR types such as String, Int32, Boolean, DateTime, etc. To allow you to access documents as typed data, the XmlReader exposes a series of new methods named ReadContentAsXxx and ReadElementContentAsXxx, where Xxx is the name of the data type. There are also general-purpose ReadContentAs and ReadElementContentAs methods, to which you pass the CLR type you want returned (along with a namespace resolver).
The example page we used at the start of this article reads some values from the XML document as CLR typed instances using the ReadContentAsXxx and ReadElementContentAsXxx methods. It reads the value of the position attribute on each slide element as an Int32 integer value (these are defined in the schema as of type xs:unsignedByte) using the ReadContentAsInt method of the XmlReader class. It also reads the value of the reviewed elements for each slide (which are defined in the schema as of type xs:dateTime) as DateTime instances using the ReadElementContentAsDateTime method.
After creating the XmlReader, the code calls the Read method repeatedly (until it returns False), so that each node is read from the XML document in turn. If the current node is an element, and this is the start tag, the name and the value type name (as returned by the ValueType property) are added to the StringBuilder that will display the results after the complete document has been processed. However, if validation is enabled for the XmlReader (the checkbox named chkValidate will be set in this case in our example), the schema will expose the values as the correct data types and so we can use the appropriate method to extract the value as a CLR data-typed instance. We do this for the reviewed element, using the ReadElementContentAsDateTime method:
While xr.Read()
If xr.IsStartElement() Then
builder.Append("Element Name: " & xr.Name)
builder.Append(" ValueType: " & xr.ValueType.ToString() & "<br />")
If chkValidate.Checked And xr.LocalName = "reviewed" Then
Dim dt As DateTime = xr.ReadElementContentAsDateTime()
builder.Append("Element Typed value: " & dt.ToString() & "<br />")
End If
...
Now the code checks if the current element node has any attributes. If so, it iterates through them in the same way as you would in System.Xml version 1.x when using an XmlTextReader. The name, value type and value of each one can then be displayed. However, when validation is enabled and the current attribute is named position, the code can call the ReadContentAsInt method of the XmlReader to get the value as an Int32 type as well:
...
If xr.HasAttributes Then
While xr.MoveToNextAttribute()
builder.Append(" - Attribute Name: " & xr.Name)
builder.Append(" ValueType: " & xr.ValueType.ToString())
builder.Append(" Value: '" & xr.Value & "'")
If chkValidate.Checked And xr.LocalName = "position" Then
Dim pos As Int32 = xr.ReadContentAsInt()
builder.Append(" Typed value: " & pos.ToString())
End If
builder.Append("<br />")
End While
End If
End If
...
Finally, the code checks to see if the current node is the child text node that contains the value of an element (XmlNodeType.Text). Elements in an XML document have their value stored in a child node, and so this must be handled separately when using an XmlReader. In this case there is no node name (the parent element node contains the name), but the value can be extracted and displayed:
...
If xr.NodeType = XmlNodeType.Text Then
builder.Append("Element String Value: '" & xr.Value & "'" & "<br />")
End If
End While
Figure 10 shows the readersettings.aspx example page displaying the XML content when validation is disabled and when it is enabled. You can see the CLR data type names returned by the ValueType property, and the typed value that is obtained by calling the appropriate ReadContentAsXxx method when validation is enabled. The position attribute actually appears as a System.Byte type, but there is no ReadContentAsByte method so the ReadContentAsInt method is used instead to return an Int32 instance. And the reviewed element appears as a DateTime type as expected, but notice that calling any of the ReadElementContentAsXxx methods consumes (i.e. reads) the element value - the child text node - so it does not appear when the code checks for nodes of type XmlNodeType.Text at the end of the iteration loop.
Figure 10 - Reading Values from an XML Document as CLR Typed Instances
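The typed-read methods can also be tried outside the example page. The sketch below is an illustration under our own assumptions: the module name TypedReadSketch and both helper functions are hypothetical, and the reader is created without a schema. To the best of our knowledge ReadContentAsInt and ReadElementContentAsDateTime will still convert the text using the XML Schema lexical rules in that case, but the ValueType property will not report the schema-defined types as it does in Figure 10.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module TypedReadSketch

    Private Const ReviewNs As String = "http://myns/slidesdemo/reviewdate"

    ' Hypothetical helper: position of the first slide as an Int32, or -1
    Function FirstSlidePosition(ByVal xml As String) As Integer
        Using xr As XmlReader = XmlReader.Create(New StringReader(xml))
            If xr.ReadToFollowing("slide") AndAlso xr.MoveToAttribute("position") Then
                ' reader is on the attribute node, so read its content as a number
                Return xr.ReadContentAsInt()
            End If
        End Using
        Return -1
    End Function

    ' Hypothetical helper: first review date as a DateTime, or DateTime.MinValue
    Function FirstReviewDate(ByVal xml As String) As DateTime
        Using xr As XmlReader = XmlReader.Create(New StringReader(xml))
            If xr.ReadToFollowing("reviewed", ReviewNs) Then
                ' consumes the element and its child text node
                Return xr.ReadElementContentAsDateTime()
            End If
        End Using
        Return DateTime.MinValue
    End Function

    Sub Main()
        Dim xml As String = _
            "<slides xmlns:rv='http://myns/slidesdemo/reviewdate'>" & _
            "<slide position='2'><title>Introduction</title>" & _
            "<rv:reviewed>2003-10-22T00:00:00</rv:reviewed></slide></slides>"
        Console.WriteLine(FirstSlidePosition(xml))
        Console.WriteLine(FirstReviewDate(xml).ToString("yyyy-MM-dd"))
    End Sub

End Module
```

If the text cannot be converted (for example a reviewed element containing "yes"), you should expect these methods to throw an exception, so real code would wrap the calls in a Try...Catch construct.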
Summary
In this series of three articles, we explore how the new features of the XmlReader and XmlWriter classes in version 2.0 of the .NET Framework can be used to read and write XML documents, and interact with the new XML document store objects. In this first article, we've concentrated on the XmlReader class, and the new XmlReaderSettings class that makes it easy to generate single or multiple instances of XmlReader with a range of useful properties. We looked at:
- The new "settings" classes and static Create methods for XmlReader and XmlWriter
- Creating and using an XmlReader to read XML documents and fragments
- Two of the useful new features of the XmlReader class
The XmlReaderSettings and XmlWriterSettings classes hold a wide range of settings that you may need to apply when you create an XmlReader or an XmlWriter. In conjunction with the new static Create methods of XmlReader and XmlWriter, they allow you to store these settings for use whenever you need to create a reader or writer, saving time and making the whole process a lot more transparent and efficient.
The XmlReaderSettings class provides features that allow you to specify the general behavior of the XmlReader(s) you create, such as reading or ignoring DTDs, schemas, white-space, comments, etc. It also provides features to add validation for XML documents or fragments of XML, control access to resources, add credentials for accessing remote or secured resources, and more.
The XmlReader class itself also exposes several useful new features. In particular, in this article, we looked at how navigation in a document is improved through the new ReadTo methods, and how you can now access the content of the XML as CLR typed values.
In the next article, we'll move on to look at the XmlWriter class, and the corresponding XmlWriterSettings class, to see how they make it easier to create and use writers in version 2.0 of the .NET Framework.