The MatchActionProcessor class provides AWK-like line by line filtering
of a text stream, pattern action pair association, and field splitting
based on a registered separator. However, the class can be used with
any compatible PatternMatcher/PatternCompiler implementations and
need not use the AWK matching classes in org.apache.oro.text.awk. In fact,
the default matcher and compiler used by the class are Perl5Matcher and
Perl5Compiler from org.apache.oro.text.regex.
To completely understand how to use MatchActionProcessor, you should first
look at MatchAction and MatchActionInfo.
A MatchActionProcessor is first initialized with
the desired PatternCompiler and PatternMatcher instances to use to compile
patterns and perform matches. Then, optionally, a field separator may
be registered with setFieldSeparator(). Finally, as many pattern action
pairs as desired are registered with addAction() before processing the
input with processMatches(). Pattern action pairs are processed in the
order they were registered.
The look of added actions can closely mirror that of AWK when anonymous
classes are used. Here's an example of how you might use
MatchActionProcessor to extract only the second column of a semicolon
delimited file:
import java.io.*;
import org.apache.oro.text.*;
import org.apache.oro.text.regex.*;

public final class semicolon {
    public static final void main(String[] args) {
        MatchActionProcessor processor = new MatchActionProcessor();

        try {
            processor.setFieldSeparator(";");
            // Using a null pattern means to perform the action for every line.
            processor.addAction(null, new MatchAction() {
                public void processMatch(MatchActionInfo info) {
                    // We assume the second column exists
                    info.output.println(info.fields.elementAt(1));
                }
            });
        } catch(MalformedPatternException e) {
            e.printStackTrace();
            System.exit(1);
        }

        try {
            processor.processMatches(System.in, System.out);
        } catch(IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}
You can redirect the following sample input to stdin to test the code:
1;Trenton;New Jersey
2;Annapolis;Maryland
3;Austin;Texas
4;Richmond;Virginia
5;Harrisburg;Pennsylvania
6;Honolulu;Hawaii
7;Santa Fe;New Mexico
addAction
public void addAction(String pattern)
throws MalformedPatternException
Binds a pattern to the default action. The default action is to simply
print the matched line to the output. If a pattern is null, the action
is performed for every line of input.
pattern
- The pattern to bind to an action.
MalformedPatternException
- If the pattern cannot be compiled.
addAction
public void addAction(String pattern,
int options)
throws MalformedPatternException
Binds a pattern to the default action, providing options to be
used to compile the pattern. The default action is to simply print
the matched line to the output. If a pattern is null, the action
is performed for every line of input.
pattern
- The pattern to bind to an action.
options
- The compilation options to use for the pattern.
MalformedPatternException
- If the pattern cannot be compiled.
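As a rough illustration of the default action, the sketch below models it with java.util.regex rather than the ORO classes, so it is a stand-in and not the library's implementation. The class DefaultActionSketch and its select() method are hypothetical names. Pattern.CASE_INSENSITIVE plays the role that a compiler option such as Perl5Compiler.CASE_INSENSITIVE_MASK would play when passed as the options argument of addAction().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of "bind a pattern to the default action".
// It uses java.util.regex as a stand-in for the ORO compiler and matcher.
final class DefaultActionSketch {

    // Returns the lines the default action would print: every line that
    // contains a match, or every line when the pattern is null.
    static List<String> select(String regex, int flags, List<String> lines) {
        Pattern pattern = (regex == null) ? null : Pattern.compile(regex, flags);
        List<String> printed = new ArrayList<>();
        for (String line : lines) {
            if (pattern == null || pattern.matcher(line).find()) {
                printed.add(line);
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1;Trenton;New Jersey", "3;Austin;Texas", "7;Santa Fe;New Mexico");
        // A case-insensitive option lets the lowercase pattern match "New".
        for (String line : select("new", Pattern.CASE_INSENSITIVE, lines)) {
            System.out.println(line);
        }
    }
}
```

Without the case-insensitive flag the same lowercase pattern selects nothing from this input, which is the kind of difference the options argument controls.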
addAction
public void addAction(String pattern,
int options,
MatchAction action)
throws MalformedPatternException
Registers a pattern action pair, providing options to be used to
compile the pattern. If a pattern is null, the action
is performed for every line of input.
pattern
- The pattern to bind to an action.
options
- The compilation options to use for the pattern.
action
- The action to associate with the pattern.
MalformedPatternException
- If the pattern cannot be compiled.
addAction
public void addAction(String pattern,
MatchAction action)
throws MalformedPatternException
Registers a pattern action pair. If a pattern is null, the action
is performed for every line of input.
pattern
- The pattern to bind to an action.
action
- The action to associate with the pattern.
MalformedPatternException
- If the pattern cannot be compiled.
processMatches
public void processMatches(InputStream input,
OutputStream output)
throws IOException
This method reads the provided input one line at a time using the
platform's default character encoding and, for every registered
pattern that is contained in the line, executes the associated
MatchAction's processMatch() method. If a field separator has been
defined with setFieldSeparator(), the
fields member of the MatchActionInfo instance passed to the
processMatch() method is set to a Vector of Strings containing
the split fields of the line. Otherwise the fields member is set
to null. If no match was performed to invoke the action (i.e.,
a null pattern was registered), then the match member is set
to null. Otherwise, the match member will contain the result of
the match.
The input stream, having been exhausted, is closed right before the
method terminates and the output stream is flushed.
input
- The input stream from which to read lines.
output
- Where to send output.
See also: MatchActionInfo
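The per-line behavior described for processMatches() can be modelled in a few lines of plain Java. The following is only a sketch under stated assumptions: it uses java.util.regex instead of the ORO matcher, and DispatchModel, its Action interface, and process() are hypothetical names that do not exist in the library. It shows registration order being preserved, fields being null unless a separator is set, and the match being null for an action registered with a null pattern.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the documented dispatch loop, not ORO's source.
final class DispatchModel {

    // Hypothetical callback, loosely modelled on MatchAction.processMatch().
    interface Action {
        void run(String line, List<String> fields, Matcher match, PrintWriter out);
    }

    private final List<Pattern> patterns = new ArrayList<>();
    private final List<Action> actions = new ArrayList<>();
    private Pattern separator; // null means lines are not split

    void setFieldSeparator(String regex) {
        separator = (regex == null) ? null : Pattern.compile(regex);
    }

    void addAction(String regex, Action action) {
        patterns.add(regex == null ? null : Pattern.compile(regex));
        actions.add(action);
    }

    void process(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            List<String> fields =
                (separator == null) ? null : Arrays.asList(separator.split(line));
            // Pairs are visited in the order they were registered.
            for (int i = 0; i < patterns.size(); i++) {
                Pattern pattern = patterns.get(i);
                if (pattern == null) {
                    // Null pattern: run for every line, with no match result.
                    actions.get(i).run(line, fields, null, out);
                } else {
                    Matcher matcher = pattern.matcher(line);
                    if (matcher.find()) { // "contained in the line"
                        actions.get(i).run(line, fields, matcher, out);
                    }
                }
            }
        }
        in.close();  // input is exhausted, so close it
        out.flush(); // output is flushed, not closed
    }

    static String demo() {
        DispatchModel model = new DispatchModel();
        model.setFieldSeparator(";");
        model.addAction("Texas", (line, fields, match, out) ->
            out.print("hit:" + match.group() + "\n"));
        model.addAction(null, (line, fields, match, out) ->
            out.print(fields.get(1) + "\n"));
        StringWriter sink = new StringWriter();
        try {
            model.process(
                new BufferedReader(new StringReader("3;Austin;Texas\n6;Honolulu;Hawaii\n")),
                new PrintWriter(sink));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In this model the first input line triggers both actions in registration order, while the second line triggers only the null-pattern action.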
processMatches
public void processMatches(InputStream input,
OutputStream output,
String encoding)
throws IOException
This method reads the provided input one line at a time and, for
every registered pattern that is contained in the line, executes
the associated MatchAction's processMatch() method. If a field
separator has been defined with setFieldSeparator(), the
fields member of the MatchActionInfo instance passed to the
processMatch() method is set to a Vector of Strings containing
the split fields of the line. Otherwise the fields member is set
to null. If no match was performed to invoke the action (i.e.,
a null pattern was registered), then the match member is set
to null. Otherwise, the match member will contain the result of
the match.
The input stream, having been exhausted, is closed right before the
method terminates and the output stream is flushed.
input
- The input stream from which to read lines.
output
- Where to send output.
encoding
- The character encoding of the InputStream source.
If you also want to define an output character encoding,
you should use processMatches(Reader,Writer)
and specify the encodings when creating the Reader and
Writer sources and sinks.
See also: MatchActionInfo
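To make the advice about output encodings concrete, here is a minimal sketch that builds a Reader and a Writer with explicit encodings using only standard JDK classes. EncodingSketch and its helper methods are hypothetical names, and the copy loop is merely a stand-in for the place where processMatches(Reader,Writer) would be called when the ORO library is on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how input and output encodings can be chosen
// independently before handing a Reader and Writer to the processor.
final class EncodingSketch {

    static Reader latin1Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.ISO_8859_1);
    }

    static Writer utf8Writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }

    // Decodes ISO-8859-1 input and re-encodes it as UTF-8.
    static byte[] demo() {
        byte[] latin1 = {(byte) 0xE9, '\n'}; // e-acute plus newline in ISO-8859-1
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = latin1Reader(new ByteArrayInputStream(latin1));
             Writer writer = utf8Writer(sink)) {
            // With ORO available, this is where one would call:
            //   processor.processMatches(reader, writer);
            // Stand-in so the sketch runs without the library: copy characters.
            int c;
            while ((c = reader.read()) != -1) {
                writer.write(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(demo().length + " bytes written as UTF-8");
    }
}
```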
processMatches
public void processMatches(Reader input,
Writer output)
throws IOException
This method reads the provided input one line at a time and, for
every registered pattern that is contained in the line, executes
the associated MatchAction's processMatch() method. If a field
separator has been defined with setFieldSeparator(), the
fields member of the MatchActionInfo instance passed to the
processMatch() method is set to a Vector of Strings containing
the split fields of the line. Otherwise the fields member is set
to null. If no match was performed to invoke the action (i.e.,
a null pattern was registered), then the match member is set
to null. Otherwise, the match member will contain the result of
the match.
The input stream, having been exhausted, is closed right before the
method terminates and the output stream is flushed.
input
- The input stream from which to read lines.
output
- Where to send output.
See also: MatchActionInfo
setFieldSeparator
public void setFieldSeparator(String separator)
throws MalformedPatternException
Sets the field separator to use when splitting a line into fields.
If the field separator is never set, or set to null, matched input
lines are not split into fields.
separator
- A regular expression defining the field separator.
MalformedPatternException
- If the separator cannot be compiled.
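Because the separator is a regular expression rather than a literal string, it can absorb surrounding whitespace, and metacharacters such as | need escaping. The sketch below illustrates this with java.util.regex as a stand-in for the ORO compiler. SeparatorSketch and split() are hypothetical names, and edge cases such as trailing empty fields may be handled differently by the library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of regex field separators using java.util.regex.
final class SeparatorSketch {

    // Splits a line on every match of the separator expression.
    static List<String> split(String separatorRegex, String line) {
        return Arrays.asList(Pattern.compile(separatorRegex).split(line));
    }

    public static void main(String[] args) {
        // Optional whitespace around the semicolon is treated as part of the separator.
        System.out.println(split("\\s*;\\s*", "7 ; Santa Fe ;New Mexico"));
        // A literal pipe must be escaped, since | means alternation in a regex.
        System.out.println(split("\\|", "a|b|c"));
    }
}
```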
setFieldSeparator
public void setFieldSeparator(String separator,
int options)
throws MalformedPatternException
Sets the field separator to use when splitting a line into fields.
If the field separator is never set, or set to null, matched input
lines are not split into fields.
separator
- A regular expression defining the field separator.
options
- The options to use when compiling the separator.
MalformedPatternException
- If the separator cannot be compiled.