"Using XmlReaderSettings, XmlReader, and the Static Create Methods"It must be tough for companies that develop software for working with XML. No sooner do they get a product out of the door, the World Wide Web Consortium (W3C) changes the recommendations and standards so that their product is out of date. Yet the manufacturers still have to maintain backward compatibility with their previous releases, while attempting to encompass all the new standards. We've seen this several times before in Microsoft's XML product space, and the process shows little sign of stabilizing yet. OK, so the base specification for XML itself, version 1.0, is complete, stable and implemented in almost all products now. But recent advances in technologies such as XML Query Language (XQuery - see http://www.w3.org/XML/Query) and the XML Information Set (XML InfoSet - see http://www.w3.org/TR/xml-infoset/) require changes to core classes in the System.Xml namespace with each release of the Framework, to keep up with evolving standards. When version 1.0 of the .NET Framework was introduced, it brought with it a whole raft of new techniques for working with XML. This included a new pull-model parser, the XmlReader, new XML document objects such as XmlDocument, XmlDataDocument and XPathDocument, new classes for working with schemas, and a brand new XSL-T processor. Now, at the time of writing, version 2.0 has just appeared (this article is based on the Beta 2 release). And after the preamble above, you won't be surprised to learn that there are a great many changes in the release compared to version 1.x. In this series of three articles, we'll look in detail at how the new features of the XmlReader and XmlWriter classes in version 2.0 of the .NET Framework can be used to read and write XML documents, and interact with the new XML document store objects. This includes:
Along the way, we'll look into the issues involved in using the new classes, the reasoning behind the changes, and how the new features simplify your code and provide better overall efficiency for your applications. This first article concentrates on the XmlReader class, and how the new XmlReaderSettings class makes it easy to create XmlReader instances with specific properties such as validation and access control for use in your applications.

The New "Settings" Classes for XmlReader and XmlWriter

To read or write XML in version 1.x, you create an instance of a class that inherits from XmlReader or XmlWriter, such as XmlTextReader or XmlTextWriter, and then set various properties before using that reader or writer. The XmlReader and XmlWriter classes are abstract, and so you cannot create instances of them directly. And each time you need a reader or writer, you have to go through the same process of creating an instance and setting the properties.

In version 2.0, the fundamental technique for creating readers and writers has changed. There are two new classes named XmlReaderSettings and XmlWriterSettings that you use as a "factory" to generate instances of readers and writers on demand, without having to repeatedly set their properties. This has several benefits in that it:
The version 2.0 XmlReader and XmlWriter classes expose a new Static/Shared method called Create, which allows you to create instances by specifying an XmlReaderSettings or XmlWriterSettings class instance that defines the behavior you want. We'll look at how this works with the XmlReader in this article, and with the XmlWriter in the next article.

However, first, it's useful to see how the XmlReader and XmlWriter fit into the whole scheme of things in .NET version 2.0. Figure 1 shows the main data flows that involve the three types of XML document store and manipulation classes in System.Xml 2.0 and its subsidiary namespaces. You can see that the XmlReader and XmlWriter are a fundamental part of the flow when reading XML into, and saving it from, other classes such as the document stores.

Figure 1 - How the XmlReader and XmlWriter can be used with the XML Document Stores in v2.0

Not shown here are other areas where the XmlReader and XmlWriter are used, for example when reading XML using the SQLXML technology in SQL Server via an ADO.NET Command instance, or reading and writing XML with the new XslCompiledTransform class that performs XSL-T transformations. And, of course, you can use the methods of the XmlReader and XmlWriter classes directly to read and expose nodes from an XML document, or to create new XML documents.

The XmlReaderSettings Class

The XmlReaderSettings class is used to specify the behavior you want for XmlReader instances that you will create and use in your code. Figure 2 shows a schematic overview of the XmlReaderSettings class. You can see that the set of properties available is broadly similar to the set you will be used to in the version 1.x XmlReader class.
You can specify a range of properties that control the way XML is handled, including ignoring white-space and processing instructions, specifying the schema validation type and conformance level, preventing DTDs from being processed, and closing the underlying input stream automatically when the reader is closed.

Figure 2 - The XmlReaderSettings Class

There are also properties that apply an offset to the line number and character position reported when reading a document, and the ability to switch on and off strict checking of the characters in the input stream (for example characters that are outside the legal range for XML documents). The XmlReaderSettings class also exposes a reference to an XmlResolver that is used to safely read external schemas, DTDs and entities; plus a reference to an ICredentials collection that contains the network credentials to be presented to the server when accessing a remote document.

To speed up the comparison of element and attribute names within the XML document, the XmlReaderSettings class also exposes a reference to an XmlNameTable. This is a table of atomized strings: names and namespace URIs that recur in the document are stored once, so they can be compared as object references. The mapping between namespace prefixes and namespace URIs is held separately, in an XmlNamespaceManager.

You can also read an XML stream that doesn't contain the <?xml version="1.0"?> declaration, and read fragments of XML that are not - on their own - valid documents. You specify the conformance level, so that the reader will accept input that is not actually a complete XML document, for example a fragment that contains un-declared namespace prefixes.

Some of the ways that you can use the XmlReaderSettings class are discussed next. We'll look at:
The example page shown in Figure 3 demonstrates most of the features listed above. You can run or download all of the samples at our Website at http://www.daveandal.net/articles/readwritexml/. This first example, named readersettings.aspx, allows you to turn on and off validation (including using a custom validation handler and trapping validation warnings), set the conformance level for a document or a fragment, and use an XmlResolver to limit access to the XML disk file. It also demonstrates reading typed values, as you'll see later in the article. There is a [view source] link at the bottom of the page that you can use to see the source code, which is fully commented to help you understand how it all works.

Figure 3 - The Example Page that Demonstrates Using the XmlReaderSettings Class

Creating an XmlReader with the XmlReaderSettings Class

To create an XmlReader instance, you first create an instance of the XmlReaderSettings class, set the properties you want, and then call the Create method of the XmlReader class. For example, this code creates an XmlReader that closes the underlying input stream when the reader is closed, ignores comments in the XML document, and reads the XML disk file named myfile.xml:

    Dim rs As New XmlReaderSettings()
    rs.CloseInput = True
    rs.IgnoreComments = True
    Dim xr As XmlReader = XmlReader.Create("C:\temp\myfile.xml", rs)

Other overloads of the Create method allow you to generate an XmlReader over a Stream, or wrap an existing TextReader or XmlReader which is then used as the input to the new XmlReader. You can also pass an XmlParserContext instance as the third parameter of the Create method, which allows you to declare the namespaces and prefixes used in the document, and specify the language and the white-space handling options that the reader will use when reading the XML.
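As a rough illustration of the XmlParserContext overload described above, the following console sketch reads a fragment whose "rv" prefix is not declared in the fragment itself. It is not taken from the example page: the module name, the fragment string and the use of a StringReader as the input are assumptions made purely for illustration, and the namespace URI is borrowed from the sample document used later in this article.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module FragmentDemo   ' hypothetical name, not part of the article's samples

    Sub Main()
        ' Hypothetical fragment: two top-level elements, and the "rv" prefix
        ' is never declared inside the fragment itself
        Dim fragment As String = _
            "<title>Agenda</title><rv:reviewed>2004-05-10T00:00:00</rv:reviewed>"

        ' Declare the prefix in a namespace manager and wrap it in a parser context
        Dim nt As New NameTable()
        Dim nsm As New XmlNamespaceManager(nt)
        nsm.AddNamespace("rv", "http://myns/slidesdemo/reviewdate")
        Dim xpc As New XmlParserContext(nt, nsm, Nothing, XmlSpace.None)

        ' Accept input that is not a complete document with a single root element
        Dim rs As New XmlReaderSettings()
        rs.ConformanceLevel = ConformanceLevel.Fragment

        ' Create the reader over a TextReader, passing the context as the third parameter
        Using xr As XmlReader = XmlReader.Create(New StringReader(fragment), rs, xpc)
            While xr.Read()
                If xr.NodeType = XmlNodeType.Element Then
                    ' Show each element name with the namespace it resolved to
                    Console.WriteLine("{0} -> '{1}'", xr.Name, xr.NamespaceURI)
                End If
            End While
        End Using
    End Sub

End Module
```

Without the parser context, the reader has no way to resolve the "rv" prefix and would be expected to reject the fragment, so the context is what makes this particular input readable.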
Finally, you can use the Create method without specifying an XmlReaderSettings instance if you just want to create a single XmlReader that uses the default settings.

The example page shown in Figure 3 provides a drop-down list where you can select from a range of XML disk files. It also declares a variable to hold an XmlParserContext instance, which is populated if you select the option to read an XML fragment instead of a complete and well-formed XML document. The XmlReader is then created using the static Create method against the XML file you select in the drop-down list:

    Dim xpc As XmlParserContext = Nothing
    ...
    ' create and populate the XmlParserContext here if reading an XML fragment
    ...
    Dim xr As XmlReader = Nothing
    Dim sPath As String = Server.MapPath("data/" & lstDocument.SelectedItem.Text)
    xr = XmlReader.Create(sPath, rs, xpc)

If there is an error creating the XmlReader, for example a security exception or a missing XML file or stream, the exception is raised when you call the Create method. Therefore you should always use a Try..Catch construct to trap any such errors.

Validating XML with the XmlReaderSettings and XmlReader Classes

One of the stranger features in version 1.x of the System.Xml implementation is that you have to use a special class, XmlValidatingReader, to validate an XML document. And you have to create this XmlValidatingReader from an existing XmlReader instance. This is because validation adds an overhead to the reader class that wastes resources if validation is not required (although the readers do check that the document is well-formed). In version 2.0, you can validate a document directly when using an XmlReader.
A range of properties on the XmlReaderSettings class allows you to specify one or more external XML schemas or DTDs using the XmlSchemaSet class (a collection of XmlSchema instances), and these are applied to the XML as it is read - depending on the settings you specify for the ValidationType and ValidationFlags properties. The ValidationFlags property is a combination of flag values from the XmlSchemaValidationFlags enumeration, as shown earlier in Figure 2. This enumeration contains five values:
To enable validation in an XmlReaderSettings instance, before you create the XmlReader instances you need from it, you must perform two tasks. The first is to create an XmlSchemaSet and add to it the schemas that will be used for validating the XML (unless the XML document contains an inline schema). In the example page we use an XML document that references two schemas - one that defines the main elements in the document and one that defines the reviewed element with the namespace prefix "rv". This is the standard and valid XML document:

    <?xml version="1.0" encoding="utf-8"?>
    <root xmlns="http://myns/slidesdemo"
          xmlns:rv="http://myns/slidesdemo/reviewdate">
      <session name="All about XML">
        <slides>
          <slide position="1">
            <title>Agenda</title>
            <rv:reviewed>2004-05-10T00:00:00</rv:reviewed>
          </slide>
          <slide position="2">
            <title>Introduction</title>
            <rv:reviewed>2003-10-22T00:00:00</rv:reviewed>
          </slide>
          <slide position="3">
            <title>Code Examples</title>
            <rv:reviewed>2004-03-02T00:00:00</rv:reviewed>
          </slide>
        </slides>
      </session>
    </root>

You can see the two namespace declarations in the root element, and these are used in the targetNamespace attribute of the two schemas. So we need to add both of these schemas to the XmlSchemaSet, and then assign the XmlSchemaSet to the Schemas property of the XmlReaderSettings instance:

    Dim ss As New XmlSchemaSet()
    ss.Add("http://myns/slidesdemo", Server.MapPath("data/schema/slides.xsd"))
    ss.Add("http://myns/slidesdemo/reviewdate", Server.MapPath("data/schema/slidesrev.xsd"))
    rs.Schemas = ss

The second task is to turn on validation by setting the ValidationType and specifying the ValidationFlags we want to be active.
In this case, we've specified that validation should be carried out against an XML schema, though you could use ValidationType.Auto, in which case the reader will detect which type of schema or DTD is being used. Because ValidationFlags is a set of bit flags, the new flag is combined with the existing value using Or:

    rs.ValidationType = ValidationType.Schema
    rs.ValidationFlags = (rs.ValidationFlags Or XmlSchemaValidationFlags.ProcessSchemaLocation)

Handling XML Validation Errors and Warnings

Now any validation error will raise an XmlSchemaException when the XML is read. So you can handle this error to find out what happened, either when loading another object with the XmlReader (for example passing it to the Load method of an XmlDocument instance), or when reading individual nodes directly. In the example page, we've previously created a StringBuilder to hold the results of processing the XML disk file, and it can be populated with the validation error details like this:

    Try
      While xr.Read()
        ' ... handle and display XML document content here ...
      End While
    Catch xsx As XmlSchemaException
      ' document failed validation against schema so display details
      builder.Append("<p><b>ERROR validating XML document against schema:</b><br />")
      builder.Append("Message = " & xsx.Message & "<br />")
      builder.Append("LineNumber = " & xsx.LineNumber.ToString())
      builder.Append(" LinePosition = " & xsx.LinePosition.ToString() & "</p>")
    ...

Figure 4 shows the result of validating an XML document that contains invalid content. This document contains the element <slide position="two">, which is invalid because the data type defined in the schema for the position attribute is xs:unsignedByte. Notice that processing of the XML document stops when the error is encountered (if you do not tick the first checkbox in the page, it will read the XML without validating it and you'll be able to see the values of all the nodes).
Figure 4 - Validating a Document with an XmlReaderSettings and XmlReader Class

However, the XmlReader may also raise other types of exception when reading the XML document, for example if the file becomes unavailable or the input stream is disrupted. So you should also include a generic error handler section, and remember to close the XmlReader when you have finished using it:

    ...
    Catch ex As Exception
      ' error reading document so display details
      builder.Append("<p><b>ERROR reading XML document:</b><br />")
      builder.Append("Message = " & ex.Message & "</p>")
    Finally
      ' always close the reader, ignoring any error raised while closing it
      Try
        xr.Close()
      Catch
      End Try
    End Try

Another approach is to use a Using construct, now available in VB.NET as well as C#, to ensure that the reader is correctly disposed when you have finished with it. You don't have to remember to call Close in this case, though it's still good practice to do so. For example:

    Using xr As XmlReader = XmlReader.Create("test.xml", rs)
      ' ... use the XmlReader here ...
      ' ... still good practice to call Close when complete ...
    End Using

Using a Custom Handler to Trap XML Validation Errors and Warnings

Trapping validation errors, as shown above, is useful, but sometimes you want to handle validation errors yourself, without having processing stop when the first one is encountered. As in version 1.x, you can attach a custom handler to the ValidationEventHandler event (in version 2.0, this is done via the XmlReaderSettings class rather than the reader), and it is called whenever a validation error is raised. In VB.NET, you can use the following to specify the event handler named MyValidationHandler for this event:

    AddHandler rs.ValidationEventHandler, AddressOf MyValidationHandler

In C#, you would use:

    rs.ValidationEventHandler += MyValidationHandler;

A simple event handler is used in the example page, which adds details of the validation error to the StringBuilder so that they can be displayed in the page afterwards.
And, because we are handling the validation event ourselves, processing of the XML document continues when each error is detected:

    Sub MyValidationHandler(ByVal sender As Object, ByVal e As ValidationEventArgs)
      ' display error details
      builder.Append("<p><b>ValidationEventHandler detected an error:</b><br />")
      builder.Append("Message = " & e.Message & "<br />")
      builder.Append("Severity = " & e.Severity.ToString() & " ")
      ' get line number and character offset from exception
      builder.Append("LineNumber = " & e.Exception.LineNumber.ToString() & " ")
      builder.Append("LinePosition = " & e.Exception.LinePosition.ToString() & "</p>")
    End Sub

By default, only validation errors are reported when you validate an XML document. However, validation can also raise warnings that indicate a problem with the XML, but do not necessarily mean it is invalid. A prime example is when you are reading a fragment of XML that does not contain the matching namespace declaration. To see these warnings, you must handle the validation event yourself, as demonstrated in the previous section, and also turn on validation warnings by setting the ReportValidationWarnings flag in the ValidationFlags property of the XmlReaderSettings instance before you create the XmlReader:

    rs.ValidationFlags = (rs.ValidationFlags _
      Or XmlSchemaValidationFlags.ReportValidationWarnings)

Now the custom event handler can report the validation warnings as well as validation errors. When a warning is encountered, the Severity property of the ValidationEventArgs instance passed to the event handler will be XmlSeverityType.Warning, which displays as "Warning".
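To pull the validation steps together, here is a minimal, self-contained console sketch rather than the code of the example page. The inline schema, the one-element document, the module name and the names OnValidation and errorCount are all hypothetical, chosen only so that the sample runs without any disk files; the settings and the handler follow the pattern described above.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module ValidationDemo   ' hypothetical name, not part of the article's samples

    ' Hypothetical schema: <slide> requires a numeric "position" attribute
    Private Const Xsd As String = _
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" & _
        "<xs:element name='slide'><xs:complexType>" & _
        "<xs:attribute name='position' type='xs:unsignedByte' use='required'/>" & _
        "</xs:complexType></xs:element></xs:schema>"

    Private errorCount As Integer = 0

    ' Called for every validation error or warning, so reading continues
    Sub OnValidation(ByVal sender As Object, ByVal e As ValidationEventArgs)
        If e.Severity = XmlSeverityType.Error Then errorCount += 1
        Console.WriteLine("{0}: {1}", e.Severity, e.Message)
    End Sub

    Sub Main()
        ' Load the schema from the string (Nothing = use the schema's own targetNamespace)
        Dim ss As New XmlSchemaSet()
        ss.Add(Nothing, XmlReader.Create(New StringReader(Xsd)))

        Dim rs As New XmlReaderSettings()
        rs.Schemas = ss
        rs.ValidationType = ValidationType.Schema
        ' Combine flags with Or so that the existing flags are preserved
        rs.ValidationFlags = rs.ValidationFlags Or _
            XmlSchemaValidationFlags.ReportValidationWarnings
        AddHandler rs.ValidationEventHandler, AddressOf OnValidation

        ' Hypothetical invalid input: "two" is not an xs:unsignedByte value
        Using xr As XmlReader = XmlReader.Create( _
                New StringReader("<slide position='two'/>"), rs)
            While xr.Read()
                ' nothing to do here; validation happens as each node is read
            End While
        End Using

        Console.WriteLine("Errors reported: " & errorCount.ToString())
    End Sub

End Module
```

Because a handler is attached, the invalid attribute value is reported through OnValidation instead of raising an XmlSchemaException, so the loop runs to the end of the input.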
Wednesday, February 23, 2011
Reading and Writing XML in .NET 2.0