org.apache.xerces.dom

Class DOMNormalizer

Implemented Interfaces:
org.apache.xerces.xni.XMLDocumentHandler

public class DOMNormalizer
extends java.lang.Object
implements org.apache.xerces.xni.XMLDocumentHandler

This class implements the normalizeDocument method. It acts as if the document were going through a save and load cycle, putting the document in a "normal" form. The actual result depends on which features are set, since they govern which operations take place. See setNormalizationFeature for details.

Notably, this method normalizes Text nodes; makes the document "namespace well-formed", according to the algorithm described below in pseudo code, by adding missing namespace declaration attributes and adding or changing namespace prefixes; updates the replacement tree of EntityReference nodes; normalizes attribute values; and so on. Mutation events, when supported, are generated to reflect the changes occurring on the document. See Namespace normalization for details on how namespace declaration attributes and prefixes are normalized.

NOTE: There is initial support for DOM revalidation with XML Schema as a grammar. The tree might not be validated correctly if EntityReference nodes or CDATA sections are present in the tree. The PSVI information is not exposed, and normalized data (including element default content) is not available.
Version:
$Id: DOMNormalizer.java 380043 2006-02-23 05:23:19Z mrglavas $
Authors:
Elena Litani, IBM
Neeraj Bajaj, Sun Microsystems, Inc.
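
The effect of the features on normalization can be observed through the standard DOM Level 3 entry point, Document.normalizeDocument(), without touching DOMNormalizer directly. The following is a minimal sketch, assuming the DocumentBuilderFactory in use returns a DOM Level 3 implementation (the one bundled with the JDK is derived from Xerces). The class and method names are hypothetical; "comments" is a standard DOMConfiguration parameter.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of Xerces.
final class NormalizeDocumentSketch {

    // Creates an empty, namespace-aware DOM document.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <root>a<!--x-->b</root> in memory, normalizes it with the
    // "comments" parameter switched off, and returns
    // "<child count>:<text content>" of the root element.
    static String normalizeWithoutComments() {
        Document doc = newDocument();
        Element root = doc.createElementNS(null, "root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createComment("x"));
        root.appendChild(doc.createTextNode("b"));

        // false asks normalization to drop Comment nodes; the Text nodes
        // that become adjacent are then merged.
        doc.getDomConfig().setParameter("comments", Boolean.FALSE);
        doc.normalizeDocument();

        return root.getChildNodes().getLength() + ":" + root.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(normalizeWithoutComments());
    }
}
```

With the parameter left at its default (true), the comment stays in place and the two Text nodes remain separate children.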

Nested Class Summary

protected class
DOMNormalizer.XMLAttributesProxy

Field Summary

protected static boolean
DEBUG
Debug namespace fix up algorithm
protected static boolean
DEBUG_EVENTS
Debug document handler events
protected static boolean
DEBUG_ND
Debug normalize document
static org.apache.xerces.xni.XMLString
EMPTY_STRING
Empty string to pass to the validator.
protected static String
PREFIX
Prefix added by the namespace fixup algorithm; generated prefixes follow the pattern "NS" + index.
static RuntimeException
abort
If the user stops the process, this exception will be thrown.
protected DOMNormalizer.XMLAttributesProxy
fAttrProxy
protected Vector
fAttributeList
list of attributes
protected DOMConfigurationImpl
fConfiguration
protected org.w3c.dom.Node
fCurrentNode
for setting the PSVI
protected CoreDocumentImpl
fDocument
protected org.w3c.dom.DOMErrorHandler
fErrorHandler
error handler.
protected org.apache.xerces.xni.NamespaceContext
fLocalNSBinder
Stores all namespace bindings on the current element
protected DOMLocatorImpl
fLocator
DOM Locator - for namespace fixup algorithm
protected org.apache.xerces.xni.NamespaceContext
fNamespaceContext
The namespace context of this document: stores namespaces in scope
protected boolean
fNamespaceValidation
protected boolean
fPSVI
protected org.apache.xerces.xni.QName
fQName
protected SymbolTable
fSymbolTable
symbol table
protected RevalidationHandler
fValidationHandler
Validation handler represents validator instance.

Constructor Summary

DOMNormalizer()

Method Summary

protected void
addNamespaceDecl(String prefix, String uri, ElementImpl element)
Adds a namespace attribute or replaces the value of existing namespace attribute with the given prefix and value for URI.
void
characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
Character content.
void
comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
A comment.
void
doctypeDecl(String rootElement, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs)
Notifies of the presence of the DOCTYPE line in the document.
void
emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs)
An empty element.
void
endCDATA(org.apache.xerces.xni.Augmentations augs)
The end of a CDATA section.
void
endDocument(org.apache.xerces.xni.Augmentations augs)
The end of the document.
void
endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
The end of an element.
void
endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs)
This method notifies the end of a general entity.
protected void
expandEntityRef(org.w3c.dom.Node parent, org.w3c.dom.Node reference)
org.apache.xerces.xni.parser.XMLDocumentSource
getDocumentSource()
Returns the document source.
void
ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
Ignorable whitespace.
static void
isAttrValueWF(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, org.w3c.dom.NamedNodeMap attributes, org.w3c.dom.Attr a, String value, boolean xml11Version)
NON-DOM: check if attribute value is well-formed
static void
isCDataWF(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version)
Check if CDATA section is well-formed
static void
isCommentWF(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version)
NON-DOM: check if value of the comment is well-formed
static void
isXMLCharWF(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version)
NON-DOM: check for valid XML characters as per the XML version
protected void
namespaceFixUp(ElementImpl element, AttributeMap attributes)
protected void
normalizeDocument(CoreDocumentImpl document, DOMConfigurationImpl config)
Normalizes document.
protected org.w3c.dom.Node
normalizeNode(org.w3c.dom.Node node)
This method acts as if the document was going through a save and load cycle, putting the document in a "normal" form.
void
processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs)
A processing instruction.
static void
reportDOMError(org.w3c.dom.DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String message, short severity, String type)
Reports a DOM error to the user handler.
void
setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source)
Sets the document source.
void
startCDATA(org.apache.xerces.xni.Augmentations augs)
The start of a CDATA section.
void
startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext namespaceContext, org.apache.xerces.xni.Augmentations augs)
The start of the document.
void
startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs)
The start of an element.
void
startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, String encoding, org.apache.xerces.xni.Augmentations augs)
This method notifies the start of a general entity.
void
textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs)
Notifies of the presence of a TextDecl line in an entity.
protected void
updateQName(org.w3c.dom.Node node, org.apache.xerces.xni.QName qname)
void
xmlDecl(String version, String encoding, String standalone, org.apache.xerces.xni.Augmentations augs)
Notifies of the presence of an XMLDecl line in the document.

Field Details

DEBUG

protected static final boolean DEBUG
Debug namespace fix up algorithm
Field Value:
false

DEBUG_EVENTS

protected static final boolean DEBUG_EVENTS
Debug document handler events
Field Value:
false

DEBUG_ND

protected static final boolean DEBUG_ND
Debug normalize document
Field Value:
false

EMPTY_STRING

public static final org.apache.xerces.xni.XMLString EMPTY_STRING
Empty string to pass to the validator.

PREFIX

protected static final String PREFIX
Prefix added by the namespace fixup algorithm; generated prefixes follow the pattern "NS" + index.
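
A generated prefix shows up when an attribute has a namespace URI but no prefix, since the default namespace does not apply to attributes. The following is a minimal sketch that goes through Document.normalizeDocument(), assuming a Xerces-derived DOM implementation behind DocumentBuilderFactory. The class name, method name, and example URI are hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of Xerces.
final class GeneratedPrefixSketch {

    static final String ATTR_NS = "http://example.com/attrs"; // example URI

    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns { prefix assigned to the attribute, URI declared for that prefix }.
    static String[] fixUpUnprefixedAttribute() {
        Document doc = newDocument();
        Element root = doc.createElementNS(null, "root");
        doc.appendChild(root);

        // Namespaced attribute without a prefix and without any
        // matching namespace declaration in scope.
        root.setAttributeNS(ATTR_NS, "flag", "on");

        doc.normalizeDocument();

        Attr attr = root.getAttributeNodeNS(ATTR_NS, "flag");
        String prefix = attr.getPrefix();
        String declared = (prefix == null)
                ? null
                : root.getAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, prefix);
        return new String[] { prefix, declared };
    }

    public static void main(String[] args) {
        String[] result = fixUpUnprefixedAttribute();
        System.out.println("prefix=" + result[0] + " uri=" + result[1]);
    }
}
```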

abort

public static final RuntimeException abort
If the user stops the process, this exception will be thrown.

fAttrProxy

protected final DOMNormalizer.XMLAttributesProxy fAttrProxy

fAttributeList

protected final Vector fAttributeList
list of attributes

fConfiguration

protected DOMConfigurationImpl fConfiguration

fCurrentNode

protected org.w3c.dom.Node fCurrentNode
for setting the PSVI

fDocument

protected CoreDocumentImpl fDocument

fErrorHandler

protected org.w3c.dom.DOMErrorHandler fErrorHandler
Error handler. May be null.

fLocalNSBinder

protected final org.apache.xerces.xni.NamespaceContext fLocalNSBinder
Stores all namespace bindings on the current element

fLocator

protected final DOMLocatorImpl fLocator
DOM Locator - for namespace fixup algorithm

fNamespaceContext

protected final org.apache.xerces.xni.NamespaceContext fNamespaceContext
The namespace context of this document: stores namespaces in scope

fNamespaceValidation

protected boolean fNamespaceValidation

fPSVI

protected boolean fPSVI

fQName

protected final org.apache.xerces.xni.QName fQName

fSymbolTable

protected SymbolTable fSymbolTable
symbol table

fValidationHandler

protected RevalidationHandler fValidationHandler
Validation handler represents validator instance.

Constructor Details

DOMNormalizer

public DOMNormalizer()

Method Details

addNamespaceDecl

protected final void addNamespaceDecl(String prefix,
                                      String uri,
                                      ElementImpl element)
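Adds a namespace attribute, or replaces the value of an existing namespace attribute, with the given prefix and URI value. If the prefix is empty, adds or updates the default namespace declaration.
Parameters:
prefix - The namespace prefix; empty for the default namespace.
uri - The namespace URI to bind to the prefix.
element - The element that receives the namespace declaration.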
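
The visible result of adding a namespace declaration can be sketched through the public DOM API: an element created in a namespace with no matching xmlns attribute gains one during normalization. This is a minimal sketch, assuming a Xerces-derived DOM implementation behind DocumentBuilderFactory. The class name, method names, and example URI are hypothetical, and addNamespaceDecl itself is not called directly because it is protected.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of Xerces.
final class NamespaceDeclSketch {

    static final String NS = "http://example.com/ns"; // example URI
    static final String XMLNS = XMLConstants.XMLNS_ATTRIBUTE_NS_URI;

    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Prefixed element: normalization is expected to add xmlns:p.
    // Returns "<declared before?>-><declared URI after>".
    static String declarePrefixed() {
        Document doc = newDocument();
        Element item = doc.createElementNS(NS, "p:item");
        doc.appendChild(item);
        boolean before = item.hasAttributeNS(XMLNS, "p");
        doc.normalizeDocument();
        return before + "->" + item.getAttributeNS(XMLNS, "p");
    }

    // Unprefixed element: normalization is expected to add a default
    // namespace declaration (an attribute named "xmlns").
    static String declareDefault() {
        Document doc = newDocument();
        Element item = doc.createElementNS(NS, "item");
        doc.appendChild(item);
        doc.normalizeDocument();
        return item.getAttributeNS(XMLNS, "xmlns");
    }

    public static void main(String[] args) {
        System.out.println(declarePrefixed());
        System.out.println(declareDefault());
    }
}
```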

characters

public void characters(org.apache.xerces.xni.XMLString text,
                       org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
Character content.
Parameters:
text - The content.
augs - Additional information that may include infoset augmentations

comment

public void comment(org.apache.xerces.xni.XMLString text,
                    org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
A comment.
Parameters:
text - The text in the comment.
augs - Additional information that may include infoset augmentations

doctypeDecl

public void doctypeDecl(String rootElement,
                        String publicId,
                        String systemId,
                        org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
Notifies of the presence of the DOCTYPE line in the document.
Parameters:
rootElement - The name of the root element.
publicId - The public identifier if an external DTD or null if the external DTD is specified using SYSTEM.
systemId - The system identifier if an external DTD, null otherwise.
augs - Additional information that may include infoset augmentations

emptyElement

public void emptyElement(org.apache.xerces.xni.QName element,
                         org.apache.xerces.xni.XMLAttributes attributes,
                         org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
An empty element.
Parameters:
element - The name of the element.
attributes - The element attributes.
augs - Additional information that may include infoset augmentations

endCDATA

public void endCDATA(org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
The end of a CDATA section.
Parameters:
augs - Additional information that may include infoset augmentations

endDocument

public void endDocument(org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
The end of the document.
Parameters:
augs - Additional information that may include infoset augmentations

endElement

public void endElement(org.apache.xerces.xni.QName element,
                       org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
The end of an element.
Parameters:
element - The name of the element.
augs - Additional information that may include infoset augmentations

endGeneralEntity

public void endGeneralEntity(String name,
                             org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
This method notifies the end of a general entity.

Note: This method is not called for entity references appearing as part of attribute values.

Parameters:
name - The name of the entity.
augs - Additional information that may include infoset augmentations

expandEntityRef

protected final void expandEntityRef(org.w3c.dom.Node parent,
                                     org.w3c.dom.Node reference)

getDocumentSource

public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
Returns the document source.

ignorableWhitespace

public void ignorableWhitespace(org.apache.xerces.xni.XMLString text,
                                org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
Ignorable whitespace. For this method to be called, the document source must have some way of determining that the text containing only whitespace characters should be considered ignorable. For example, the validator can determine if a length of whitespace characters in the document are ignorable based on the element content model.
Parameters:
text - The ignorable whitespace.
augs - Additional information that may include infoset augmentations

isAttrValueWF

public static final void isAttrValueWF(org.w3c.dom.DOMErrorHandler errorHandler,
                                       DOMErrorImpl error,
                                       DOMLocatorImpl locator,
                                       org.w3c.dom.NamedNodeMap attributes,
                                       org.w3c.dom.Attr a,
                                       String value,
                                       boolean xml11Version)
NON-DOM: check if attribute value is well-formed
Parameters:
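attributes - The attribute map that contains the attribute.
a - The attribute to check.
value - The attribute value to check.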

isCDataWF

public static final void isCDataWF(org.w3c.dom.DOMErrorHandler errorHandler,
                                   DOMErrorImpl error,
                                   DOMLocatorImpl locator,
                                   String datavalue,
                                   boolean isXML11Version)
Check if CDATA section is well-formed
Parameters:
datavalue - The CDATA section content to check.
isXML11Version - true if the document is XML 1.1

isCommentWF

public static final void isCommentWF(org.w3c.dom.DOMErrorHandler errorHandler,
                                     DOMErrorImpl error,
                                     DOMLocatorImpl locator,
                                     String datavalue,
                                     boolean isXML11Version)
NON-DOM: check if value of the comment is well-formed
Parameters:
datavalue - The comment content to check.
isXML11Version - true if the document is XML 1.1

isXMLCharWF

public static final void isXMLCharWF(org.w3c.dom.DOMErrorHandler errorHandler,
                                     DOMErrorImpl error,
                                     DOMLocatorImpl locator,
                                     String datavalue,
                                     boolean isXML11Version)
NON-DOM: check for valid XML characters as per the XML version
Parameters:
datavalue - The character data to check.
isXML11Version - true if the document is XML 1.1

namespaceFixUp

protected final void namespaceFixUp(ElementImpl element,
                                    AttributeMap attributes)

normalizeDocument

protected void normalizeDocument(CoreDocumentImpl document,
                                 DOMConfigurationImpl config)
Normalizes document. Note: reset() must be called before this method.

normalizeNode

protected org.w3c.dom.Node normalizeNode(org.w3c.dom.Node node)
This method acts as if the document were going through a save and load cycle, putting the document in a "normal" form. The actual result depends on which features are set, since they govern which operations take place. See setNormalizationFeature for details. Notably, this method normalizes Text nodes; makes the document "namespace well-formed", according to the algorithm described below in pseudo code, by adding missing namespace declaration attributes and adding or changing namespace prefixes; updates the replacement tree of EntityReference nodes; normalizes attribute values; and so on.
Parameters:
node - The node to normalize.
Returns:
The modified node, or null. If a node is returned, normalization needs to run again starting on the node returned.
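
The contract of returning a node to visit again can be illustrated with a simplified traversal. This is only a sketch under stated assumptions, not the Xerces implementation: mergeStep is a hypothetical stand-in for normalizeNode that handles nothing except adjacent Text nodes, and mergeChildren is a hypothetical driver loop.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical demo class; not part of Xerces.
final class RevisitLoopSketch {

    // Simplified step: merges one following Text sibling into the given
    // Text node. Returns the node when it changed (the caller must visit
    // it again), or null when the caller may advance.
    static Node mergeStep(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            Node next = node.getNextSibling();
            if (next != null && next.getNodeType() == Node.TEXT_NODE) {
                ((Text) node).appendData(next.getNodeValue());
                node.getParentNode().removeChild(next);
                return node;
            }
        }
        return null;
    }

    // Driver loop: advances to the next sibling unless the step asks
    // for the same node to be processed again.
    static void mergeChildren(Node parent) {
        Node next;
        for (Node kid = parent.getFirstChild(); kid != null; kid = next) {
            next = kid.getNextSibling();
            Node again = mergeStep(kid);
            if (again != null) {
                next = again;
            }
        }
    }

    // Builds <root> with three adjacent Text nodes and merges them.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("a"));
            root.appendChild(doc.createTextNode("b"));
            root.appendChild(doc.createTextNode("c"));
            mergeChildren(root);
            return root.getChildNodes().getLength() + ":" + root.getTextContent();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1:abc
    }
}
```

Revisiting the same node, rather than advancing, is what lets a run of three or more adjacent Text nodes collapse into one.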

processingInstruction

public void processingInstruction(String target,
                                  org.apache.xerces.xni.XMLString data,
                                  org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
A processing instruction. Processing instructions consist of a target name and, optionally, text data. The data is only meaningful to the application.

Typically, a processing instruction's data will contain a series of pseudo-attributes. These pseudo-attributes follow the form of element attributes but are not parsed or presented to the application as anything other than text. The application is responsible for parsing the data.

Parameters:
target - The target.
data - The data or null if none specified.
augs - Additional information that may include infoset augmentations
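
Because pseudo-attributes reach the application as plain text, parsing them is left to the application. The following is a minimal sketch of one way to do that with a regular expression. The class name, method name, and pattern are hypothetical, and the pattern is a simplification that does not handle character or entity references inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Xerces or XNI.
final class PseudoAttributeSketch {

    // Matches name="value" or name='value'.
    private static final Pattern PSEUDO_ATTR = Pattern.compile(
            "([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Extracts pseudo-attributes from processing instruction data,
    // preserving their order. A null argument yields an empty map.
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        if (data == null) {
            return result;
        }
        Matcher m = PSEUDO_ATTR.matcher(data);
        while (m.find()) {
            // Group 3 holds a double-quoted value, group 4 a single-quoted one.
            String value = (m.group(3) != null) ? m.group(3) : m.group(4);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Data of <?xml-stylesheet type="text/xsl" href='style.xsl'?>
        System.out.println(parse("type=\"text/xsl\" href='style.xsl'"));
        // prints {type=text/xsl, href=style.xsl}
    }
}
```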

reportDOMError

public static final void reportDOMError(org.w3c.dom.DOMErrorHandler errorHandler,
                                        DOMErrorImpl error,
                                        DOMLocatorImpl locator,
                                        String message,
                                        short severity,
                                        String type)
Reports a DOM error to the user handler. If the error is fatal, processing is always aborted.
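
From the application side, the user handler is the DOMErrorHandler registered under the standard "error-handler" DOMConfiguration parameter. The following is a minimal sketch, assuming a Xerces-derived DOM implementation behind DocumentBuilderFactory and the default "well-formed" setting. The class and method names are hypothetical. A comment containing "--" is used as the trigger because it is not well-formed XML.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of Xerces.
final class ErrorHandlerSketch {

    // Normalizes a document holding an ill-formed comment and returns
    // the types of the errors reported to the handler.
    static List<String> collectErrorTypes() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(null, "root");
            doc.appendChild(root);
            root.appendChild(doc.createComment("a--b")); // "--" is illegal in a comment

            List<String> types = new ArrayList<>();
            // Returning true asks the implementation to continue;
            // returning false asks it to stop processing.
            DOMErrorHandler handler = error -> {
                types.add(error.getType());
                return true;
            };
            doc.getDomConfig().setParameter("error-handler", handler);
            doc.normalizeDocument();
            return types;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectErrorTypes());
    }
}
```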

setDocumentSource

public void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source)
Sets the document source.

startCDATA

public void startCDATA(org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
The start of a CDATA section.
Parameters:
augs - Additional information that may include infoset augmentations

startDocument

public void startDocument(org.apache.xerces.xni.XMLLocator locator,
                          String encoding,
                          org.apache.xerces.xni.NamespaceContext namespaceContext,
                          org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
The start of the document.
Parameters:
locator - The document locator, or null if the document location cannot be reported during the parsing of this document. However, it is strongly recommended that a locator be supplied that can at least report the system identifier of the document.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
namespaceContext - The namespace context in effect at the start of this document. This object represents the current context. Implementors of this class are responsible for copying the namespace bindings from the current context (and its parent contexts) if that information is important.
augs - Additional information that may include infoset augmentations

startElement

public void startElement(org.apache.xerces.xni.QName element,
                         org.apache.xerces.xni.XMLAttributes attributes,
                         org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
The start of an element.
Parameters:
element - The name of the element.
attributes - The element attributes.
augs - Additional information that may include infoset augmentations

startGeneralEntity

public void startGeneralEntity(String name,
                               org.apache.xerces.xni.XMLResourceIdentifier identifier,
                               String encoding,
                               org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
This method notifies the start of a general entity.

Note: This method is not called for entity references appearing as part of attribute values.

Parameters:
name - The name of the general entity.
identifier - The resource identifier.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
augs - Additional information that may include infoset augmentations

textDecl

public void textDecl(String version,
                     String encoding,
                     org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
Notifies of the presence of a TextDecl line in an entity. If present, this method will be called immediately following the startEntity call.

Note: This method will never be called for the document entity; it is only called for external general entities referenced in document content.

Note: This method is not called for entity references appearing as part of attribute values.

Parameters:
version - The XML version, or null if not specified.
encoding - The IANA encoding name of the entity.
augs - Additional information that may include infoset augmentations

updateQName

protected final void updateQName(org.w3c.dom.Node node,
                                 org.apache.xerces.xni.QName qname)

xmlDecl

public void xmlDecl(String version,
                    String encoding,
                    String standalone,
                    org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
Notifies of the presence of an XMLDecl line in the document. If present, this method will be called immediately following the startDocument call.
Parameters:
version - The XML version.
encoding - The IANA encoding name of the document, or null if not specified.
standalone - The standalone value, or null if not specified.
augs - Additional information that may include infoset augmentations

Copyright © 1999-2006 The Apache Software Foundation. All Rights Reserved.