This class implements the normalizeDocument method.
It acts as if the document were going through a save and load cycle, putting
the document in a "normal" form. The actual result depends on which features are set,
since they govern which operations take place. See setNormalizationFeature for details.
Notably, this method normalizes Text nodes; makes the document "namespace well-formed"
(following the namespace normalization algorithm) by adding missing namespace
declaration attributes and adding or changing namespace prefixes; updates the replacement
tree of EntityReference nodes; and normalizes attribute values.
Mutation events, when supported, are generated to reflect the changes occurring in the
document.
See Namespace normalization for details on how namespace declaration attributes and prefixes
are normalized.
NOTE: There is initial support for DOM revalidation with XML Schema as a grammar.
The tree might not be validated correctly if EntityReference nodes or CDATA sections are
present in the tree. The PSVI information is not exposed, and normalized data (including element
default content) is not available.
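As an illustration of the effects described above, the following sketch drives normalization through the standard org.w3c.dom API rather than through this class directly. It assumes the JDK's bundled DOM implementation supports normalizeDocument and the "comments" configuration parameter. The class name NormalizeDocumentSketch, its helper methods, and the urn:example:demo namespace are hypothetical and used only for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NormalizeDocumentSketch {
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";

    /** Builds a tiny tree that needs normalizing, normalizes it, and returns the root. */
    static Element buildAndNormalize(boolean keepComments) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        // The "p" prefix is used here but never declared with an xmlns:p attribute.
        Element root = doc.createElementNS("urn:example:demo", "p:root");
        doc.appendChild(root);
        // Two adjacent Text nodes: a parser would never produce these.
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode("world"));
        root.appendChild(doc.createComment("scratch note"));

        // Features are set on the DOMConfiguration before normalizing.
        doc.getDomConfig().setParameter("comments", Boolean.valueOf(keepComments));
        doc.normalizeDocument();
        return root;
    }

    /** Counts the direct children of parent that have the given node type. */
    static int countChildren(Node parent, short nodeType) {
        int count = 0;
        for (Node kid = parent.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element root = buildAndNormalize(false);
        System.out.println("text nodes:       " + countChildren(root, Node.TEXT_NODE));
        System.out.println("comment nodes:    " + countChildren(root, Node.COMMENT_NODE));
        System.out.println("xmlns:p declared: " + root.hasAttributeNS(XMLNS_URI, "p"));
    }
}
```

The sketch exercises three of the operations mentioned in the description: merging adjacent Text nodes, dropping comments when the "comments" feature is off, and adding a missing namespace declaration attribute.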
DEBUG
protected static final boolean DEBUG
Debug namespace fix up algorithm
DEBUG_EVENTS
protected static final boolean DEBUG_EVENTS
Debug document handler events
DEBUG_ND
protected static final boolean DEBUG_ND
Debug normalize document
EMPTY_STRING
public static final org.apache.xerces.xni.XMLString EMPTY_STRING
Empty string to pass to the validator.
PREFIX
protected static final String PREFIX
Prefix added by the namespace fix-up algorithm; generated prefixes should follow the pattern "NS" + index.
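A minimal sketch of how a prefix following the "NS" + index pattern might be chosen. The class NsPrefixSketch, the method nextFreePrefix, and the starting index of 1 are assumptions made for illustration; they are not taken from this class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NsPrefixSketch {
    static final String PREFIX = "NS";

    /**
     * Returns the first "NS" + index prefix that is not already in use.
     * Starting the index at 1 is an assumption of this sketch.
     */
    static String nextFreePrefix(Set<String> prefixesInUse) {
        int index = 1;
        String candidate = PREFIX + index;
        while (prefixesInUse.contains(candidate)) {
            index++;
            candidate = PREFIX + index;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> inUse = new HashSet<>(Arrays.asList("xsd", "NS1", "NS2"));
        System.out.println(nextFreePrefix(inUse));
    }
}
```

Checking against the prefixes already in use keeps a generated prefix from shadowing an existing binding on the element being fixed up.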
abort
public static final RuntimeException abort
If the user stops the process, this exception will be thrown.
fAttributeList
protected final Vector fAttributeList
list of attributes
fCurrentNode
protected org.w3c.dom.Node fCurrentNode
for setting the PSVI
fErrorHandler
protected org.w3c.dom.DOMErrorHandler fErrorHandler
Error handler; may be null.
fLocalNSBinder
protected final org.apache.xerces.xni.NamespaceContext fLocalNSBinder
Stores all namespace bindings on the current element
fLocator
protected final DOMLocatorImpl fLocator
DOM Locator - for namespace fixup algorithm
fNamespaceContext
protected final org.apache.xerces.xni.NamespaceContext fNamespaceContext
The namespace context of this document: stores namespaces in scope
fNamespaceValidation
protected boolean fNamespaceValidation
fPSVI
protected boolean fPSVI
fQName
protected final org.apache.xerces.xni.QName fQName
fSymbolTable
protected SymbolTable fSymbolTable
symbol table
fValidationHandler
protected RevalidationHandler fValidationHandler
Validation handler represents validator instance.
addNamespaceDecl
protected final void addNamespaceDecl(String prefix,
String uri,
ElementImpl element)
Adds a namespace declaration attribute for the given prefix and URI, or, if a
declaration for that prefix already exists on the element, replaces its value with the given URI.
If the prefix is empty, the default namespace declaration is added or updated.
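The behaviour described above can be approximated with the standard DOM API alone. The sketch below is not this class's implementation: it takes an org.w3c.dom.Element instead of ElementImpl, and the class NamespaceDeclSketch and helper newElement are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

class NamespaceDeclSketch {
    /** Namespace that all xmlns and xmlns:* attributes belong to. */
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";

    /** Adds or replaces a namespace declaration; an empty prefix means the default namespace. */
    static void addNamespaceDecl(String prefix, String uri, Element element) {
        String qName = (prefix == null || prefix.isEmpty()) ? "xmlns" : "xmlns:" + prefix;
        // setAttributeNS overwrites the value when the attribute already exists.
        element.setAttributeNS(XMLNS_URI, qName, uri);
    }

    /** Hypothetical helper: creates a detached element for the demonstration. */
    static Element newElement(String name) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument().createElementNS(null, name);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element item = newElement("item");
        addNamespaceDecl("p", "urn:example:one", item);
        addNamespaceDecl("p", "urn:example:two", item);   // replaces the first value
        addNamespaceDecl("", "urn:example:default", item); // default namespace
        System.out.println("xmlns:p = " + item.getAttributeNS(XMLNS_URI, "p"));
        System.out.println("xmlns   = " + item.getAttributeNS(XMLNS_URI, "xmlns"));
    }
}
```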
characters
public void characters(org.apache.xerces.xni.XMLString text,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Character content.
text
- The content.
augs
- Additional information that may include infoset augmentations
comment
public void comment(org.apache.xerces.xni.XMLString text,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
A comment.
text
- The text in the comment.
augs
- Additional information that may include infoset augmentations
doctypeDecl
public void doctypeDecl(String rootElement,
String publicId,
String systemId,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Notifies of the presence of the DOCTYPE line in the document.
rootElement
- The name of the root element.
publicId
- The public identifier if an external DTD or null
if the external DTD is specified using SYSTEM.
systemId
- The system identifier if an external DTD, null
otherwise.
augs
- Additional information that may include infoset augmentations
emptyElement
public void emptyElement(org.apache.xerces.xni.QName element,
org.apache.xerces.xni.XMLAttributes attributes,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
An empty element.
element
- The name of the element.
attributes
- The element attributes.
augs
- Additional information that may include infoset augmentations
endCDATA
public void endCDATA(org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
The end of a CDATA section.
augs
- Additional information that may include infoset augmentations
endDocument
public void endDocument(org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
The end of the document.
augs
- Additional information that may include infoset augmentations
endElement
public void endElement(org.apache.xerces.xni.QName element,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
The end of an element.
element
- The name of the element.
augs
- Additional information that may include infoset augmentations
endGeneralEntity
public void endGeneralEntity(String name,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Notifies of the end of a general entity.
Note: This method is not called for entity references
appearing as part of attribute values.
name
- The name of the entity.
augs
- Additional information that may include infoset augmentations
expandEntityRef
protected final void expandEntityRef(org.w3c.dom.Node parent,
org.w3c.dom.Node reference)
getDocumentSource
public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
Returns the document source.
ignorableWhitespace
public void ignorableWhitespace(org.apache.xerces.xni.XMLString text,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Ignorable whitespace. For this method to be called, the document
source must have some way of determining that the text containing
only whitespace characters should be considered ignorable. For
example, the validator can determine whether a run of whitespace
characters in the document is ignorable based on the element
content model.
text
- The ignorable whitespace.
augs
- Additional information that may include infoset augmentations
isAttrValueWF
public static final void isAttrValueWF(org.w3c.dom.DOMErrorHandler errorHandler,
DOMErrorImpl error,
DOMLocatorImpl locator,
org.w3c.dom.NamedNodeMap attributes,
org.w3c.dom.Attr a,
String value,
boolean xml11Version)
NON-DOM: check if attribute value is well-formed
isCDataWF
public static final void isCDataWF(org.w3c.dom.DOMErrorHandler errorHandler,
DOMErrorImpl error,
DOMLocatorImpl locator,
String datavalue,
boolean isXML11Version)
Check if CDATA section is well-formed
datavalue
- The CDATA section content to check.
isXML11Version
- true if the document is XML 1.1
isCommentWF
public static final void isCommentWF(org.w3c.dom.DOMErrorHandler errorHandler,
DOMErrorImpl error,
DOMLocatorImpl locator,
String datavalue,
boolean isXML11Version)
NON-DOM: check if value of the comment is well-formed
datavalue
- The comment text to check.
isXML11Version
- true if the document is XML 1.1
isXMLCharWF
public static final void isXMLCharWF(org.w3c.dom.DOMErrorHandler errorHandler,
DOMErrorImpl error,
DOMLocatorImpl locator,
String datavalue,
boolean isXML11Version)
NON-DOM: check for valid XML characters as per the XML version
datavalue
- The character data to check.
isXML11Version
- true if the document is XML 1.1
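The well-formedness checks above (isCDataWF, isCommentWF, isXMLCharWF) can be pictured with the following sketch, which is based on the Char, Comment, and CDSect productions of the XML 1.0 and XML 1.1 specifications. It is an assumption-laden illustration, not the logic of this class: the real methods report problems through a DOMErrorHandler, whereas these hypothetical helpers in WellFormednessSketch simply return a result.

```java
class WellFormednessSketch {
    /** True if the code point matches the Char production of XML 1.0 or XML 1.1. */
    static boolean isXmlChar(int cp, boolean isXML11Version) {
        boolean upperRanges = (cp >= 0xE000 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0x10FFFF);
        if (isXML11Version) {
            // XML 1.1: every code point from #x1 up to #xD7FF is a Char.
            return (cp >= 0x1 && cp <= 0xD7FF) || upperRanges;
        }
        // XML 1.0: only tab, line feed and carriage return are allowed below #x20.
        return cp == 0x9 || cp == 0xA || cp == 0xD || (cp >= 0x20 && cp <= 0xD7FF) || upperRanges;
    }

    /** Returns the char index of the first invalid code point, or -1 if all are valid. */
    static int firstInvalidChar(String datavalue, boolean isXML11Version) {
        for (int i = 0; i < datavalue.length(); ) {
            int cp = datavalue.codePointAt(i);
            if (!isXmlChar(cp, isXML11Version)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** A comment must not contain "--" and must not end with "-". */
    static boolean isCommentWF(String datavalue, boolean isXML11Version) {
        return firstInvalidChar(datavalue, isXML11Version) < 0
                && !datavalue.contains("--")
                && !datavalue.endsWith("-");
    }

    /** A CDATA section must not contain the terminator "]]>". */
    static boolean isCDataWF(String datavalue, boolean isXML11Version) {
        return firstInvalidChar(datavalue, isXML11Version) < 0 && !datavalue.contains("]]>");
    }

    public static void main(String[] args) {
        System.out.println(isCommentWF("a -- b", false));
        System.out.println(isCDataWF("if (a[b[0]]>c) {}", false));
        System.out.println(firstInvalidChar("ok" + (char) 1, false));
    }
}
```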
normalizeNode
protected org.w3c.dom.Node normalizeNode(org.w3c.dom.Node node)
This method acts as if the document were going through a save
and load cycle, putting the document in a "normal" form. The actual result
depends on which features are set, since they govern which operations
take place. See setNormalizationFeature for details. Notably, this method
normalizes Text nodes; makes the document "namespace well-formed"
(following the namespace normalization algorithm) by adding missing
namespace declaration attributes and adding or changing namespace prefixes; updates
the replacement tree of EntityReference nodes; and normalizes attribute values.
node
- Modified node or null. If node is returned, we need
to normalize again starting on the node returned.
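The return-value contract described above (null to continue with the next sibling, a node to normalize again from that node) suggests a driver loop along the following lines. This is a sketch under stated assumptions: the normalizeNode shown here is a hypothetical, greatly simplified stand-in that only merges adjacent Text nodes, and the class name NormalizeLoopSketch is invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

class NormalizeLoopSketch {
    /**
     * Simplified stand-in: merges one adjacent Text sibling into node.
     * Returns node when the tree changed and node must be visited again, else null.
     */
    static Node normalizeNode(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            Node next = node.getNextSibling();
            if (next != null && next.getNodeType() == Node.TEXT_NODE) {
                ((Text) node).appendData(next.getNodeValue());
                node.getParentNode().removeChild(next);
                return node; // normalize again starting on this node
            }
            return null;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            normalizeChildren(node);
        }
        return null;
    }

    /** Walks the children, re-visiting whichever node normalizeNode hands back. */
    static void normalizeChildren(Node parent) {
        Node kid = parent.getFirstChild();
        while (kid != null) {
            // Remember the sibling first: normalizeNode may modify the tree.
            Node next = kid.getNextSibling();
            Node again = normalizeNode(kid);
            kid = (again != null) ? again : next;
        }
    }

    /** Builds root with text children "a", "b", "c" and normalizes them. */
    static Node demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Node root = doc.appendChild(doc.createElementNS(null, "root"));
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createTextNode("b"));
        root.appendChild(doc.createTextNode("c"));
        normalizeChildren(root);
        return root;
    }

    public static void main(String[] args) {
        Node root = demo();
        System.out.println(root.getChildNodes().getLength() + " child: " + root.getTextContent());
    }
}
```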
processingInstruction
public void processingInstruction(String target,
org.apache.xerces.xni.XMLString data,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
A processing instruction. Processing instructions consist of a
target name and, optionally, text data. The data is only meaningful
to the application.
Typically, a processing instruction's data will contain a series
of pseudo-attributes. These pseudo-attributes follow the form of
element attributes but are not parsed or presented
to the application as anything other than text. The application is
responsible for parsing the data.
target
- The target.
data
- The data or null if none specified.
augs
- Additional information that may include infoset augmentations
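Because pseudo-attributes are left to the application to parse, a consumer of this callback might extract them from the data string with something like the sketch below. The class PseudoAttributeSketch and its regular expression are assumptions for illustration; the pattern accepts simple name="value" or name='value' pairs and does not resolve character or entity references.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PseudoAttributeSketch {
    // name = "value" or name = 'value'; group 1 is the name, group 2 or 3 the value.
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Parses processing-instruction data into name/value pairs, in document order. */
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        if (data == null) {
            return result; // the data may be null if none was specified
        }
        Matcher m = PSEUDO_ATTR.matcher(data);
        while (m.find()) {
            String value = (m.group(2) != null) ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Data as it might arrive for <?xml-stylesheet type="text/xsl" href='style.xsl'?>
        System.out.println(parse("type=\"text/xsl\" href='style.xsl'"));
    }
}
```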
reportDOMError
public static final void reportDOMError(org.w3c.dom.DOMErrorHandler errorHandler,
DOMErrorImpl error,
DOMLocatorImpl locator,
String message,
short severity,
String type)
Reports a DOM error to the user handler.
If the error is fatal, processing is always aborted.
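The reporting contract can be sketched with the standard org.w3c.dom error interfaces. The version below assumes that processing also stops when the handler returns false, which matches the description of the abort field; the class ReportErrorSketch, the nested SimpleError, and the ABORT constant are hypothetical stand-ins for DOMErrorImpl and abort, and locator handling is omitted.

```java
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;

class ReportErrorSketch {
    /** Shared exception used to unwind when processing is stopped. */
    static final RuntimeException ABORT = new RuntimeException("normalization aborted");

    /** Minimal DOMError carrying only message, severity and type. */
    static final class SimpleError implements DOMError {
        private final String message;
        private final short severity;
        private final String type;

        SimpleError(String message, short severity, String type) {
            this.message = message;
            this.severity = severity;
            this.type = type;
        }

        public short getSeverity() { return severity; }
        public String getMessage() { return message; }
        public String getType() { return type; }
        public Object getRelatedException() { return null; }
        public Object getRelatedData() { return null; }
        public DOMLocator getLocation() { return null; }
    }

    /** Reports to the handler (which may be null) and aborts when required. */
    static void report(DOMErrorHandler handler, String message, short severity, String type) {
        if (handler != null) {
            boolean keepGoing = handler.handleError(new SimpleError(message, severity, type));
            if (!keepGoing) {
                throw ABORT; // the user asked to stop
            }
        }
        if (severity == DOMError.SEVERITY_FATAL_ERROR) {
            throw ABORT; // fatal errors always abort, whatever the handler said
        }
    }

    public static void main(String[] args) {
        DOMErrorHandler printer = error -> {
            System.out.println(error.getType() + ": " + error.getMessage());
            return true;
        };
        report(printer, "undeclared prefix", DOMError.SEVERITY_ERROR, "example-type");
    }
}
```

Reusing a single exception instance avoids allocating a stack trace for what is effectively a control-flow signal, at the cost of a less informative trace.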
setDocumentSource
public void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source)
Sets the document source.
startCDATA
public void startCDATA(org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
The start of a CDATA section.
augs
- Additional information that may include infoset augmentations
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator,
String encoding,
org.apache.xerces.xni.NamespaceContext namespaceContext,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
The start of the document.
locator
- The document locator, or null if the document
location cannot be reported during the parsing
of this document. However, it is strongly
recommended that a locator be supplied that can
at least report the system identifier of the
document.
encoding
- The auto-detected IANA encoding name of the entity
stream. This value will be null in those situations
where the entity encoding is not auto-detected (e.g.
internal entities or a document entity that is
parsed from a java.io.Reader).
namespaceContext
- The namespace context in effect at the
start of this document.
This object represents the current context.
Implementors of this class are responsible
for copying the namespace bindings from the
current context (and its parent contexts)
if that information is important.
augs
- Additional information that may include infoset augmentations
startElement
public void startElement(org.apache.xerces.xni.QName element,
org.apache.xerces.xni.XMLAttributes attributes,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
The start of an element.
element
- The name of the element.
attributes
- The element attributes.
augs
- Additional information that may include infoset augmentations
startGeneralEntity
public void startGeneralEntity(String name,
org.apache.xerces.xni.XMLResourceIdentifier identifier,
String encoding,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Notifies of the start of a general entity.
Note: This method is not called for entity references
appearing as part of attribute values.
name
- The name of the general entity.
identifier
- The resource identifier.
encoding
- The auto-detected IANA encoding name of the entity
stream. This value will be null in those situations
where the entity encoding is not auto-detected (e.g.
internal entities or a document entity that is
parsed from a java.io.Reader).
augs
- Additional information that may include infoset augmentations
textDecl
public void textDecl(String version,
String encoding,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Notifies of the presence of a TextDecl line in an entity. If present,
this method will be called immediately following the startEntity call.
Note: This method will never be called for the
document entity; it is only called for external general entities
referenced in document content.
Note: This method is not called for entity references
appearing as part of attribute values.
version
- The XML version, or null if not specified.
encoding
- The IANA encoding name of the entity.
augs
- Additional information that may include infoset augmentations
updateQName
protected final void updateQName(org.w3c.dom.Node node,
org.apache.xerces.xni.QName qname)
xmlDecl
public void xmlDecl(String version,
String encoding,
String standalone,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
Notifies of the presence of an XMLDecl line in the document. If
present, this method will be called immediately following the
startDocument call.
version
- The XML version.
encoding
- The IANA encoding name of the document, or null if
not specified.
standalone
- The standalone value, or null if not specified.
augs
- Additional information that may include infoset augmentations