This class holds the scanning methods that are common to scanning the
XML document structure and content and the DTD structure and content.
Both XMLDocumentScanner and XMLDTDScanner inherit from this base class.
This component requires the following features and properties from the
component manager that uses it:
- http://xml.org/sax/features/validation
- http://xml.org/sax/features/namespaces
- http://apache.org/xml/features/scanner/notify-char-refs
- http://apache.org/xml/properties/internal/symbol-table
- http://apache.org/xml/properties/internal/error-reporter
- http://apache.org/xml/properties/internal/entity-manager
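As a hedged illustration of the two SAX feature identifiers in the list above, the sketch below toggles them on the XMLReader supplied by the JDK's JAXP implementation and reads them back. It deliberately leaves out the Apache-specific feature and the internal properties, which a parser configuration normally supplies to its components itself. The class and method names are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper; only the two feature URIs come from the list above.
final class ScannerFeatureSketch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Turns validation off and namespace processing on, then returns
    // the values the reader reports as { validation, namespaces }.
    static boolean[] configureAndRead() {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, false);
            reader.setFeature(NAMESPACES, true);
            return new boolean[] {
                reader.getFeature(VALIDATION), reader.getFeature(NAMESPACES)
            };
        } catch (Exception e) {
            // Unrecognized or unsupported features surface as SAX exceptions.
            throw new IllegalStateException("parser could not be configured", e);
        }
    }

    public static void main(String[] args) {
        boolean[] state = configureAndRead();
        System.out.println(VALIDATION + " = " + state[0]);
        System.out.println(NAMESPACES + " = " + state[1]);
    }
}
```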
DEBUG_ATTR_NORMALIZATION
protected static final boolean DEBUG_ATTR_NORMALIZATION
Debug attribute normalization.
ENTITY_MANAGER
protected static final String ENTITY_MANAGER
Property identifier: entity manager.
ERROR_REPORTER
protected static final String ERROR_REPORTER
Property identifier: error reporter.
NAMESPACES
protected static final String NAMESPACES
Feature identifier: namespaces.
NOTIFY_CHAR_REFS
protected static final String NOTIFY_CHAR_REFS
Feature identifier: notify character references.
PARSER_SETTINGS
protected static final String PARSER_SETTINGS
SYMBOL_TABLE
protected static final String SYMBOL_TABLE
Property identifier: symbol table.
VALIDATION
protected static final String VALIDATION
Feature identifier: validation.
fAmpSymbol
protected static final String fAmpSymbol
Symbol: "amp".
fAposSymbol
protected static final String fAposSymbol
Symbol: "apos".
fCharRefLiteral
protected String fCharRefLiteral
Literal value of the last character reference scanned.
fEncodingSymbol
protected static final String fEncodingSymbol
Symbol: "encoding".
fEntityDepth
protected int fEntityDepth
Entity depth.
fGtSymbol
protected static final String fGtSymbol
Symbol: "gt".
fLtSymbol
protected static final String fLtSymbol
Symbol: "lt".
fNamespaces
protected boolean fNamespaces
Namespaces.
fNotifyCharRefs
protected boolean fNotifyCharRefs
Character references notification.
fParserSettings
protected boolean fParserSettings
Internal parser-settings feature.
fQuotSymbol
protected static final String fQuotSymbol
Symbol: "quot".
fReportEntity
protected boolean fReportEntity
Report entity boundary.
fScanningAttribute
protected boolean fScanningAttribute
Scanning attribute.
fStandaloneSymbol
protected static final String fStandaloneSymbol
Symbol: "standalone".
fSymbolTable
protected SymbolTable fSymbolTable
Symbol table.
fValidation
protected boolean fValidation
Validation. This feature identifier is:
http://xml.org/sax/features/validation
fVersionSymbol
protected static final String fVersionSymbol
Symbol: "version".
endEntity
public void endEntity(String name,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
This method notifies of the end of an entity. The document entity has
the pseudo-name of "[xml]"; the DTD has the pseudo-name of "[dtd]";
parameter entity names start with '%'; and general entities are just
specified by their name.
name
- The name of the entity.
augs
- Additional information that may include infoset augmentations.
org.apache.xerces.xni.XNIException
- Thrown by handler to signal an error.
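The naming convention described for entity notifications can be sketched as a small classifier. This is only an illustration of the convention; the class, enum, and method names are hypothetical and are not part of the scanner.

```java
// Hypothetical classifier for the entity naming convention described above.
final class EntityNameSketch {
    enum Kind { DOCUMENT, DTD, PARAMETER, GENERAL }

    static Kind classify(String name) {
        if ("[xml]".equals(name)) return Kind.DOCUMENT;   // document entity
        if ("[dtd]".equals(name)) return Kind.DTD;        // DTD entity
        if (name.startsWith("%")) return Kind.PARAMETER;  // parameter entity
        return Kind.GENERAL;                              // plain entity name
    }

    public static void main(String[] args) {
        for (String n : new String[] { "[xml]", "[dtd]", "%pe", "copy" }) {
            System.out.println(n + " -> " + classify(n));
        }
    }
}
```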
getFeature
public boolean getFeature(String featureId)
throws org.apache.xerces.xni.parser.XMLConfigurationException
getVersionNotSupportedKey
protected String getVersionNotSupportedKey()
isInvalid
protected boolean isInvalid(int value)
isInvalidLiteral
protected boolean isInvalidLiteral(int value)
isUnchangedByNormalization
protected int isUnchangedByNormalization(org.apache.xerces.xni.XMLString value)
Checks whether this string would be unchanged by normalization.
- -1 if the value would be unchanged by normalization,
otherwise the index of the first whitespace character which
would be transformed.
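A minimal sketch of such a check, written against a plain char array instead of an XMLString. It assumes that the characters affected by normalization are tab, line feed, and carriage return, and that the returned index is relative to the start of the value; both are assumptions for illustration, and the names are hypothetical.

```java
// Hypothetical stand-alone version of the check described above.
final class NormalizationCheckSketch {
    // Returns -1 if ch[offset .. offset+length) contains no tab, line feed,
    // or carriage return; otherwise the index, relative to offset, of the
    // first such character.
    static int firstChangedIndex(char[] ch, int offset, int length) {
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            char c = ch[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                return i - offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] value = "ab\tc".toCharArray();
        System.out.println(firstChangedIndex(value, 0, value.length));
    }
}
```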
isValidNCName
protected boolean isValidNCName(int value)
isValidNameChar
protected boolean isValidNameChar(int value)
isValidNameStartChar
protected boolean isValidNameStartChar(int value)
isValidNameStartHighSurrogate
protected boolean isValidNameStartHighSurrogate(int value)
normalizeWhitespace
protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value)
Normalizes whitespace in an XMLString, converting all whitespace
characters to space characters.
normalizeWhitespace
protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value,
int fromIndex)
Normalizes whitespace in an XMLString, starting at fromIndex and
converting all whitespace characters to space characters.
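The two overloads can be sketched on a plain char array as follows. This is an illustration under stated assumptions: tab, line feed, and carriage return are taken as the whitespace characters to replace, fromIndex is taken to be relative to the start of the value, and the class and method names are hypothetical.

```java
// Hypothetical in-place whitespace normalizer over a char array.
final class WhitespaceNormalizerSketch {
    // Replaces tab, line feed, and carriage return with a space,
    // starting fromIndex characters into the value.
    static void normalize(char[] ch, int offset, int length, int fromIndex) {
        int end = offset + length;
        for (int i = offset + fromIndex; i < end; i++) {
            if (ch[i] == '\t' || ch[i] == '\n' || ch[i] == '\r') {
                ch[i] = ' ';
            }
        }
    }

    // Normalizes the whole value.
    static void normalize(char[] ch, int offset, int length) {
        normalize(ch, offset, length, 0);
    }

    public static void main(String[] args) {
        char[] value = "a\tb\nc".toCharArray();
        normalize(value, 0, value.length);
        System.out.println(new String(value));
    }
}
```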
reportFatalError
protected void reportFatalError(String msgId,
Object[] args)
throws org.apache.xerces.xni.XNIException
Convenience function used in all XML scanners.
reset
protected void reset()
reset
public void reset(org.apache.xerces.xni.parser.XMLComponentManager componentManager)
throws org.apache.xerces.xni.parser.XMLConfigurationException
- reset in interface org.apache.xerces.xni.parser.XMLComponent
componentManager
- The component manager.
scanAttributeValue
protected boolean scanAttributeValue(org.apache.xerces.xni.XMLString value,
org.apache.xerces.xni.XMLString nonNormalizedValue,
String atName,
boolean checkEntities,
String eleName)
throws IOException,
org.apache.xerces.xni.XNIException
Scans an attribute value and normalizes whitespace converting all
whitespace characters to space characters.
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
value
- The XMLString to fill in with the value.
nonNormalizedValue
- The XMLString to fill in with the
non-normalized value.
atName
- The name of the attribute being parsed (for error msgs).
checkEntities
- true if undeclared entities should be reported as VC violation,
false if undeclared entities should be reported as WFC violation.
eleName
- The name of the element to which this attribute belongs.
- true if the non-normalized and normalized value are the same
Note: This method uses fStringBuffer2; anything in it
at the time of calling is lost.
scanCharReferenceValue
protected int scanCharReferenceValue(XMLStringBuffer buf,
XMLStringBuffer buf2)
throws IOException,
org.apache.xerces.xni.XNIException
Scans a character reference and appends the corresponding chars to the
specified buffer.
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
Note: This method uses fStringBuffer; anything in it
at the time of calling is lost.
buf
- the character buffer to append chars to
buf2
- the character buffer to append non-normalized chars to
- the character value or (-1) on conversion failure
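To make production [66] concrete, here is a minimal sketch that converts a complete character reference held in a String to its code point, returning -1 on failure in the spirit of the return value described above. It is not the scanner's implementation: it works on a String, not on the entity stream, it does not check that the result is a legal XML character, and the names are hypothetical.

```java
// Hypothetical parser for a complete character reference such as
// "&#65;" (decimal) or "&#x41;" (hexadecimal).
final class CharRefSketch {
    static int parse(String ref) {
        if (ref == null || !ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        boolean hex = ref.startsWith("&#x");
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (digits.isEmpty()) {
            return -1;
        }
        // Accept only the digits named by the production; this also rejects
        // signs and non-ASCII digits that Integer.parseInt would tolerate.
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            boolean decimalDigit = c >= '0' && c <= '9';
            boolean hexLetter = (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!(decimalDigit || (hex && hexLetter))) {
                return -1;
            }
        }
        try {
            int value = Integer.parseInt(digits, hex ? 16 : 10);
            return Character.isValidCodePoint(value) ? value : -1;
        } catch (NumberFormatException e) {
            return -1; // too many digits to fit in an int
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("&#65;"));
        System.out.println(parse("&#x41;"));
    }
}
```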
scanComment
protected void scanComment(XMLStringBuffer text)
throws IOException,
org.apache.xerces.xni.XNIException
Scans a comment.
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
Note: Called after scanning past '<!--'
Note: This method uses fString; anything in it
at the time of calling is lost.
text
- The buffer to fill in with the text.
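Production [15] restricts where hyphens may appear in a comment: every '-' must be followed by a character other than '-'. A minimal sketch of that constraint, applied to the text between '<!--' and '-->', is shown here. It checks hyphens only, not the Char production, and the names are hypothetical.

```java
// Hypothetical check of the hyphen rule in production [15].
final class CommentBodySketch {
    // body is the text between "<!--" and "-->".
    static boolean hasLegalHyphens(String body) {
        // "--" may not occur inside the body, and the body may not end
        // with "-" because each hyphen needs a non-hyphen after it.
        return !body.contains("--") && !body.endsWith("-");
    }

    public static void main(String[] args) {
        System.out.println(hasLegalHyphens(" a-b "));
        System.out.println(hasLegalHyphens(" a--b "));
    }
}
```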
scanExternalID
protected void scanExternalID(String[] identifiers,
boolean optionalSystemId)
throws IOException,
org.apache.xerces.xni.XNIException
Scans an external ID and returns the public and system IDs.
identifiers
- An array of size 2 to return the system id,
and public id (in that order).
optionalSystemId
- Specifies whether the system id is optional.
Note: This method uses fString and fStringBuffer;
anything in them at the time of calling is lost.
scanPI
protected void scanPI()
throws IOException,
org.apache.xerces.xni.XNIException
Scans a processing instruction.
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
Note: This method uses fString; anything in it
at the time of calling is lost.
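Production [17] excludes exactly one name from PITarget: the three letters "xml" in any mix of upper and lower case. A small sketch of that exclusion follows; it does not check that the target is a valid Name, and the class and method names are hypothetical.

```java
// Hypothetical check for the name excluded by production [17].
final class PiTargetSketch {
    static boolean isExcludedTarget(String target) {
        return target.length() == 3
            && (target.charAt(0) == 'X' || target.charAt(0) == 'x')
            && (target.charAt(1) == 'M' || target.charAt(1) == 'm')
            && (target.charAt(2) == 'L' || target.charAt(2) == 'l');
    }

    public static void main(String[] args) {
        System.out.println(isExcludedTarget("XmL"));
        System.out.println(isExcludedTarget("xmlfoo"));
    }
}
```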
scanPIData
protected void scanPIData(String target,
org.apache.xerces.xni.XMLString data)
throws IOException,
org.apache.xerces.xni.XNIException
Scans processing instruction data. This is needed to handle the situation
where a document starts with a processing instruction whose
target name starts with "xml" (e.g. xmlfoo).
Note: This method uses fStringBuffer; anything in it
at the time of calling is lost.
target
- The PI target
data
- The string to fill in with the data
scanPseudoAttribute
public String scanPseudoAttribute(boolean scanningTextDecl,
org.apache.xerces.xni.XMLString value)
throws IOException,
org.apache.xerces.xni.XNIException
Scans a pseudo attribute.
scanningTextDecl
- True if scanning this pseudo-attribute for a
TextDecl; false if scanning XMLDecl. This
flag is needed to report the correct type of
error.
value
- The string to fill in with the attribute
value.
- The name of the attribute
Note: This method uses fStringBuffer2; anything in it
at the time of calling is lost.
scanPubidLiteral
protected boolean scanPubidLiteral(org.apache.xerces.xni.XMLString literal)
throws IOException,
org.apache.xerces.xni.XNIException
Scans a public ID literal.
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
The returned string is normalized according to the following rule,
from http://www.w3.org/TR/REC-xml#dt-pubid:
Before a match is attempted, all strings of white space in the public
identifier must be normalized to single space characters (#x20), and
leading and trailing white space must be removed.
literal
- The string to fill in with the public ID literal.
- True on success.
Note: This method uses fStringBuffer; anything in it at
the time of calling is lost.
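The normalization rule quoted above (runs of white space collapsed to one #x20, leading and trailing white space removed) can be sketched as follows. The sketch assumes the input has already been accepted as a sequence of PubidChar, so the only white space it handles is #x20, #xD, and #xA; the names are hypothetical.

```java
// Hypothetical normalizer for an already-validated public identifier.
final class PubidNormalizerSketch {
    static String normalize(String literal) {
        StringBuilder out = new StringBuilder(literal.length());
        boolean pendingSpace = false;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == ' ' || c == '\n' || c == '\r') {
                // Remember one space, but only once real content exists,
                // which drops leading white space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is never written, which drops
        // trailing white space.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  -//W3C//DTD  XHTML\n1.0//EN ") + "]");
    }
}
```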
scanSurrogates
protected boolean scanSurrogates(XMLStringBuffer buf)
throws IOException,
org.apache.xerces.xni.XNIException
Scans surrogates and appends them to the specified buffer.
Note: This assumes the current char has already been
identified as a high surrogate.
buf
- The StringBuffer to append the read surrogates to.
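As a rough illustration of what accepting a surrogate pair involves, the sketch below validates a high and low surrogate with the standard java.lang.Character methods and appends them to a buffer. It works on two chars supplied by the caller, not on the scanner's input, it does not check the resulting code point against the XML Char production, and the names are hypothetical.

```java
// Hypothetical surrogate-pair helper built on java.lang.Character.
final class SurrogatePairSketch {
    // Returns the supplementary code point, or -1 if the two chars
    // do not form a valid high/low surrogate pair.
    static int combine(char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            return -1;
        }
        return Character.toCodePoint(high, low);
    }

    // Appends the pair to buf and returns true only if it is valid.
    static boolean appendPair(StringBuilder buf, char high, char low) {
        if (combine(high, low) == -1) {
            return false;
        }
        buf.append(high).append(low);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine('\uD83D', '\uDE00')));
    }
}
```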
scanXMLDeclOrTextDecl
protected void scanXMLDeclOrTextDecl(boolean scanningTextDecl,
String[] pseudoAttributeValues)
throws IOException,
org.apache.xerces.xni.XNIException
Scans an XML or text declaration.
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'")
| ('"' ('yes' | 'no') '"'))
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
scanningTextDecl
- True if a text declaration is to
be scanned instead of an XML
declaration.
pseudoAttributeValues
- An array of size 3 to return the version,
encoding and standalone pseudo attribute values
(in that order).
Note: This method uses fString; anything in it
at the time of calling is lost.
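The shape of the result, an array of size 3 holding version, encoding, and standalone in that order, can be illustrated with a regular-expression sketch over a declaration held in a String. This is a simplification for illustration, not how the scanner works: it does not enforce the order of the pseudo-attributes, the required ones, or the legal values, and the names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for the pseudo-attributes of an XML or text
// declaration that is already available as a String.
final class XmlDeclSketch {
    // S, a pseudo-attribute name, Eq, then a single- or double-quoted value.
    private static final Pattern PSEUDO_ATTR = Pattern.compile(
        "[ \\t\\r\\n]+(version|encoding|standalone)[ \\t\\r\\n]*=[ \\t\\r\\n]*"
        + "(\"([^\"]*)\"|'([^']*)')");

    // Returns { version, encoding, standalone }; absent entries are null.
    static String[] pseudoAttributes(String decl) {
        String[] values = new String[3];
        Matcher m = PSEUDO_ATTR.matcher(decl);
        while (m.find()) {
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            switch (m.group(1)) {
                case "version":  values[0] = value; break;
                case "encoding": values[1] = value; break;
                default:         values[2] = value; break; // standalone
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] v = pseudoAttributes(
            "<?xml version=\"1.0\" encoding='UTF-8' standalone=\"yes\"?>");
        System.out.println(v[0] + ", " + v[1] + ", " + v[2]);
    }
}
```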
setFeature
public void setFeature(String featureId,
boolean value)
throws org.apache.xerces.xni.parser.XMLConfigurationException
- setFeature in interface org.apache.xerces.xni.parser.XMLComponent
setProperty
public void setProperty(String propertyId,
Object value)
throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets the value of a property during parsing.
- setProperty in interface org.apache.xerces.xni.parser.XMLComponent
startEntity
public void startEntity(String name,
org.apache.xerces.xni.XMLResourceIdentifier identifier,
String encoding,
org.apache.xerces.xni.Augmentations augs)
throws org.apache.xerces.xni.XNIException
This method notifies of the start of an entity. The document entity
has the pseudo-name of "[xml]"; the DTD has the pseudo-name of "[dtd]";
parameter entity names start with '%'; and general entities are just
specified by their name.
name
- The name of the entity.
identifier
- The resource identifier.
encoding
- The auto-detected IANA encoding name of the entity
stream. This value will be null in those situations
where the entity encoding is not auto-detected (e.g.
internal entities or a document entity that is
parsed from a java.io.Reader).
augs
- Additional information that may include infoset augmentations.
org.apache.xerces.xni.XNIException
- Thrown by handler to signal an error.
versionSupported
protected boolean versionSupported(String version)