public class XMLParser extends Object
Parser class used to parse an XML document into a DOM object (Element). This code was originally developed to parse HTML; as a result it isn't as strict as most XML parsers and can parse many HTML documents out of the box. The parser is mostly stateful (although it also has an event callback API), and it is modeled closely on the Java DOM APIs.
In this sample an XML hierarchy is displayed using a Tree:
Constructor | Description |
---|---|
XMLParser() | Constructs the XMLParser |
Modifier and Type | Method | Description |
---|---|---|
void | addCharEntitiesRange(String[] symbols, int startcode) | Adds the given symbols array to the user defined char entities table, with startcode as the code of the first string, startcode+1 for the second, and so on. |
void | addCharEntity(String symbol, int code) | Adds the given symbol and code to the user defined char entities table. See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references |
protected void | attribute(String tag, String attributeName, String value) | Invoked for every attribute value of the given tag. This callback method is invoked only when eventParser is used. |
protected String | convertCharEntity(String charEntity) | Converts a char entity to the matching character. |
protected Element | createNewElement(String name) | Creates a new element. |
protected Element | createNewTextElement(String text) | Creates a new text element. |
protected void | endTag(String tag) | Invoked when a tag ends. This callback method is invoked only when eventParser is used. |
void | eventParser(Reader r) | The event parser requires deriving this class and overriding callback methods to work effectively. |
protected String | getSupportedStandardName() | Returns a string identifying the document type this parser supports. |
boolean | isCaseSensitive() | Returns whether the parser is case sensitive and retains case; otherwise it converts all data to lower case. |
protected boolean | isEmptyTag(String tagName) | Checks whether the specified tag is an empty tag. |
protected boolean | isSupported(Element element) | Returns true if this element is supported, false otherwise. In XMLParser this always returns true, but subclasses can determine whether an element is supported in their context according to its name etc. |
protected boolean | isWhiteSpace(char ch) | Checks whether the specified character is white space. |
protected void | notifyError(int errorId, String tag, String attribute, String value, String description) | A utility method used to report an error to the ParserCallback and throw an IllegalArgumentException if parsingError returns false. |
Element | parse(Reader is) | This is the entry point for parsing a document. |
protected Element | parseCommentOrXMLDeclaration(Reader is, String endTag) | This utility method is used to parse comments and XML declarations in the XML. |
protected Element | parseTag(Reader is) | This method collects the tag name and all of its attributes. |
protected void | parseTagContent(Element element, Reader is) | Parses the tag's content, accumulating text and child elements. |
void | setCaseSensitive(boolean caseSensitive) | Sets the parser to be case sensitive and retain case; otherwise it will convert all data to lower case. |
void | setIncludeWhitespacesBetweenTags(boolean include) | |
void | setParserCallback(ParserCallback parserCallback) | Sets the specified callback to serve as the callback for parsing errors. |
protected boolean | shouldEvaluate(Element element) | Checks if this element should be evaluated by the parser. This can be overridden by subclasses to skip certain elements. |
protected boolean | startTag(String tag) | Invoked when a tag is opened; this method should return true to process the tag or false to skip it. |
protected void | textElement(String text) | Invoked when the event parser encounters a text element. |
protected String getSupportedStandardName()
public void addCharEntity(String symbol, int code)
symbol
- The symbol to add
code
- The symbol's code
public void addCharEntitiesRange(String[] symbols, int startcode)
symbols
- The symbols to add
startcode
- The code of the first symbol; each following symbol gets the next code
protected String convertCharEntity(String charEntity)
charEntity
- The char entity to convert
public Element parse(Reader is)
is
- The Reader containing the XML
protected Element createNewElement(String name)
name
- The new element's name
protected Element createNewTextElement(String text)
text
- The new element's text
public void setIncludeWhitespacesBetweenTags(boolean include)
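The code assignment described for addCharEntitiesRange (startcode for the first symbol, startcode+1 for the second, and so on) can be illustrated with a small stand-in table. This is a minimal sketch, not the library's implementation: the class CharEntityTableSketch and its lookup method are hypothetical, and returning null for an unknown symbol is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the user defined char entities table.
// It only illustrates the documented code assignment; it is not XMLParser's code.
class CharEntityTableSketch {
    private final Map<String, Integer> userEntities = new HashMap<>();

    // Mirrors addCharEntity(String symbol, int code): one symbol, one code.
    void addCharEntity(String symbol, int code) {
        userEntities.put(symbol, code);
    }

    // Mirrors addCharEntitiesRange: symbols[i] receives startcode + i.
    void addCharEntitiesRange(String[] symbols, int startcode) {
        for (int i = 0; i < symbols.length; i++) {
            addCharEntity(symbols[i], startcode + i);
        }
    }

    // Hypothetical helper: the character for a known symbol,
    // or null when the symbol was never added (assumed behavior).
    String lookup(String symbol) {
        Integer code = userEntities.get(symbol);
        return code == null ? null : String.valueOf((char) code.intValue());
    }
}
```

For example, registering {"iexcl", "cent", "pound"} with a startcode of 161 maps iexcl to 161, cent to 162 and pound to 163, which matches the codes of those HTML entities.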
public void eventParser(Reader r) throws IOException
r
- the reader from which the data should be parsed
IOException
- if an exception is thrown by the reader
protected void textElement(String text)
text
- the text encountered
protected boolean startTag(String tag)
tag
- the tag name
protected void endTag(String tag)
tag
- the tag name
protected void attribute(String tag, String attributeName, String value)
tag
- the tag name
protected void parseTagContent(Element element, Reader is) throws IOException
element
- The current parent element
is
- The Reader containing the XML
IOException
- if an I/O error in the stream is encountered
protected boolean isWhiteSpace(char ch)
ch
- The character to check
protected Element parseTag(Reader is) throws IOException
is
- The Reader containing the XML
IOException
- if an I/O error in the stream is encountered
protected Element parseCommentOrXMLDeclaration(Reader is, String endTag) throws IOException
is
- The Reader containing the XML
endTag
- The end tag to look for
IOException
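Because eventParser relies on a subclass overriding startTag, endTag, attribute and textElement, the override pattern may be easier to see in a short sketch. To stay self-contained, the code below does not use XMLParser itself: EventCallbacksSketch is a hypothetical stand-in that declares only the callback signatures listed above, its default bodies are assumptions, and replaySample is an invented driver that fires events in an assumed order without doing any parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in declaring only the callback signatures listed above.
// The real callbacks live on XMLParser; the default bodies here are assumptions.
class EventCallbacksSketch {
    protected boolean startTag(String tag) { return true; }
    protected void endTag(String tag) { }
    protected void attribute(String tag, String attributeName, String value) { }
    protected void textElement(String text) { }

    // Invented driver: reports the events for <a href="x">hi</a> in an
    // assumed order. It does no parsing and is not the library's logic.
    void replaySample() {
        if (startTag("a")) { // assumed: a false return skips the tag's events
            attribute("a", "href", "x");
            textElement("hi");
            endTag("a");
        }
    }
}

// A subclass overrides only the callbacks it cares about.
class LinkCollectorSketch extends EventCallbacksSketch {
    final List<String> hrefs = new ArrayList<>();
    final StringBuilder collectedText = new StringBuilder();

    @Override
    protected void attribute(String tag, String attributeName, String value) {
        // Keep only href attributes found on <a> tags.
        if ("a".equals(tag) && "href".equals(attributeName)) {
            hrefs.add(value);
        }
    }

    @Override
    protected void textElement(String text) {
        collectedText.append(text);
    }
}
```

With the real class, the same overrides would go on a subclass of XMLParser, and eventParser(Reader) would drive the callbacks instead of replaySample.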
protected boolean isEmptyTag(String tagName)
tagName
- The tag name to check
protected void notifyError(int errorId, String tag, String attribute, String value, String description)
errorId
- The error ID, one of the ERROR_* constants in ParserCallback
tag
- The tag in which the error occurred (can be null for non-tag related errors)
attribute
- The attribute in which the error occurred (can be null for non-attribute related errors)
value
- The value in which the error occurred (can be null for non-value related errors)
description
- A verbal description of the error
IllegalArgumentException
- If the parser callback returned false on this error
protected boolean isSupported(Element element)
element
- The element to check
protected boolean shouldEvaluate(Element element)
element
- The element to check
public void setParserCallback(ParserCallback parserCallback)
parserCallback
- The callback to use for parsing errors
public boolean isCaseSensitive()
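The contract between notifyError and the callback set through setParserCallback (report the error, then throw an IllegalArgumentException if the callback returns false) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ErrorCallbackSketch stands in for ParserCallback with a parsingError method whose parameters are assumed to match those of notifyError, NotifyErrorSketch is hypothetical, and carrying on silently when no callback is set is an assumption.

```java
// Hypothetical stand-in for ParserCallback; the parameter list is assumed
// to mirror notifyError's parameters.
interface ErrorCallbackSketch {
    boolean parsingError(int errorId, String tag, String attribute,
                         String value, String description);
}

// Hypothetical holder showing the documented notifyError contract.
class NotifyErrorSketch {
    private ErrorCallbackSketch callback;

    void setParserCallback(ErrorCallbackSketch callback) {
        this.callback = callback;
    }

    // Reports the error to the callback; a false return aborts parsing
    // by throwing IllegalArgumentException.
    void notifyError(int errorId, String tag, String attribute,
                     String value, String description) {
        if (callback == null) {
            return; // assumption: with no callback, parsing continues
        }
        if (!callback.parsingError(errorId, tag, attribute, value, description)) {
            throw new IllegalArgumentException(description);
        }
    }
}
```

A lenient callback returns true so that parsing carries on after a malformed tag, while a strict one returns false to stop at the first error.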
public void setCaseSensitive(boolean caseSensitive)
caseSensitive
- true to make the parser case sensitive and retain case; false to convert all data to lower case