I spent this afternoon writing some classes to do XML processing similar to SAX (Simple API for XML). It's not a full SAX implementation, but it does operate in a similar fashion. WARNING: This is a very simple implementation; there are lots of things that aren't handled.

The basic idea behind SAX is that the parser does not build up any data structures. Instead, it calls methods in a handler for the start and end of each element, and for the characters between tags.

The simplest use (in Suneido) would be:

xr = new XmlReader

This won't do much since it will use the default handler class, XmlContentHandler, which doesn't do anything. Here is a very simple XmlContentHandlerSample:

XmlContentHandler

(We don't have to derive from XmlContentHandler if we're going to implement all the methods, but it's a good idea.)

We can now use this with:

xr = new XmlReader

(If your handler uses instance variables, you'd have to create an instance, e.g. new MyHandler.)

If xmltext was:

<body><tag color="red" size="12">chars</tag><solo /></body>

we would get the following output:

START body #()

For simplicity, the attributes are passed as a Suneido object, rather than an instance of an Attribute class. Notice that we get START and END for the solo tag even though it did not have separate opening and closing tags.

Implementation

We'll start with XmlReader - the main component. We initialize the content handler to the default XmlContentHandler.

class

We allow setting a different content handler.

SetContentHandler(contentHandler)

And now the main part. Parse consists of one large loop. Each iteration processes characters (if present) and then a tag. Processed text is removed from the beginning of text, and when the text is empty, we're done.

Parse(text)

A private method is used to process the attributes for a tag. It uses the built-in Scanner class to simplify reading quoted strings.

attributes(s)

Errors are handled by throwing exceptions.
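
To make the parse loop described above concrete, here is a minimal sketch in Python rather than Suneido. It is not the code from this post: the names ContentHandler, RecordingHandler, set_content_handler and parse are assumptions chosen for illustration, a regular expression stands in for Suneido's Scanner class, and attributes are passed as a plain dict. Like the classes described here, it ignores comments, entities, CDATA and most error cases.

```python
import re


class ContentHandler:
    """Default handler (assumed name): ignores every event."""

    def start_element(self, name, attributes):
        pass

    def end_element(self, name):
        pass

    def characters(self, text):
        pass


class RecordingHandler(ContentHandler):
    """Hypothetical sample handler that records each event as a string."""

    def __init__(self):
        self.events = []

    def start_element(self, name, attributes):
        self.events.append(f"START {name} {attributes}")

    def end_element(self, name):
        self.events.append(f"END {name}")

    def characters(self, text):
        self.events.append(f"CHARS {text}")


class XmlReader:
    # name="value" pairs; only double-quoted values are handled (assumption).
    ATTRIBUTE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')

    def __init__(self):
        self.handler = ContentHandler()

    def set_content_handler(self, handler):
        self.handler = handler

    def parse(self, text):
        # One large loop: characters (if present), then a tag.
        # Processed text is removed from the front until nothing is left.
        while text:
            start = text.find("<")
            if start == -1:
                self.handler.characters(text)
                return
            if start > 0:
                self.handler.characters(text[:start])
            end = text.find(">", start)
            if end == -1:
                raise ValueError("missing '>' at end of tag")
            self._tag(text[start + 1:end].strip())
            text = text[end + 1:]

    def _tag(self, tag):
        if tag.startswith("/"):
            self.handler.end_element(tag[1:].strip())
            return
        empty = tag.endswith("/")  # <solo /> style element
        if empty:
            tag = tag[:-1].strip()
        if not tag:
            raise ValueError("empty tag")
        parts = tag.split(None, 1)
        name = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        self.handler.start_element(name, dict(self.ATTRIBUTE.findall(rest)))
        if empty:
            # An empty element still produces both START and END.
            self.handler.end_element(name)


handler = RecordingHandler()
reader = XmlReader()
reader.set_content_handler(handler)
reader.parse('<body><tag color="red" size="12">chars</tag><solo /></body>')
print("\n".join(handler.events))
```

In this sketch a malformed tag raises ValueError, which mirrors the exception-based error handling mentioned above.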
A real SAX implementation would also have an error handler similar to the content handler.

XmlContentHandler is simple:

class

It's often useful to output XML in a similar fashion. Here is XmlWriter, a content handler that simply puts the XML back into a string:

XmlContentHandler

The only complication here is detecting empty elements and outputting <tag /> instead of <tag></tag>. To handle this, StartElement saves the tag in .element and then EndElement checks for it. The GetText method allows retrieving the text. For example:

xr = new XmlReader

A good improvement would be to "pretty print" the XML - adding newlines and indenting to make it more human readable.

These classes will be in stdlib in the next release. Until then, you can download sax.zip and use LibraryView > Import Records to load it into a library. This also includes simple XmlReaderTest and XmlWriterTest.

Let me know what you think. I haven't done much testing, so if you find any problems, please report them in the User Group.

Next, I'd like to write a simple version of XML-RPC (XML remote procedure calls). (Using these classes to process the XML, of course!)

References

SAX2 - Processing XML Efficiently With Java, David Brownell, O'Reilly, 2002
