ParseTreePatternMatcher

A tree pattern matching mechanism for ANTLR {@link ParseTree}s.

Patterns are strings of source input text with special tags representing token or rule references such as:

<p>{@code <ID> = <expr>;}</p>

Given a pattern start rule such as {@code statement}, this object constructs a {@link ParseTree} with placeholders for the {@code ID} and {@code expr} subtree. Then the {@link #match} routines can compare an actual {@link ParseTree} from a parse with this pattern. Tag {@code <ID>} matches any {@code ID} token and tag {@code <expr>} references the result of the {@code expr} rule (generally an instance of {@code ExprContext}).

Pattern {@code x = 0;} is similar and matches the same kind of statement, except that it requires the identifier to be {@code x} and the expression to be {@code 0}.

The {@link #matches} routines return {@code true} or {@code false} based upon a match for the tree rooted at the parameter sent in. The {@link #match} routines return a {@link ParseTreeMatch} object that contains the parse tree, the parse tree pattern, and a map from tag name to matched nodes (more below). When a subtree fails to match, the returned object has {@link ParseTreeMatch#mismatchedNode} set to the first tree node that did not match.

For efficiency, you can compile a tree pattern in string form to a {@link ParseTreePattern} object.

See {@code TestParseTreeMatcher} for many examples. {@link ParseTreePattern} has two static helper methods, {@link ParseTreePattern#findAll} and {@link ParseTreePattern#match}, that are easy to use but inefficient: each call creates a new {@link ParseTreePatternMatcher} object and has to compile the pattern in string form before using it.

The lexer and parser that you pass into the {@link ParseTreePatternMatcher} constructor are used to parse the pattern in string form. The lexer converts {@code <ID> = <expr>;} into a sequence of four tokens (assuming the lexer throws out whitespace or puts it on a hidden channel). Be aware that the input stream is reset for the lexer, but not for the parser; a {@link ParserInterpreter} is created to parse the input instead. Any user-defined fields you have put into the lexer might get changed when this mechanism asks it to scan the pattern string.

Normally a parser does not accept token {@code <expr>} as a valid {@code expr} but, from the parser passed in, we create a special version of the underlying grammar representation (an {@link ATN}) that allows imaginary tokens representing rules ({@code <expr>}) to match entire rules. We call these <em>bypass alternatives</em>.

Delimiters are {@code <} and {@code >}, with {@code \} as the escape string by default, but you can set them to whatever you want using {@link #setDelimiters}. To use a delimiter as literal text, you must escape both the start and stop strings: {@code \<} and {@code \>}.

Constructors

this
this(Lexer lexer, Parser parser)

Constructs a {@link ParseTreePatternMatcher} from a {@link Lexer} and a {@link Parser} object. The lexer input stream is altered for tokenizing the tree patterns. The parser is used as a convenient mechanism to get the grammar name, plus the token and rule names.

Members

Functions

compile
ParseTreePattern compile(string pattern, int patternRuleIndex)

For repeated use of a tree pattern, compile it to a {@link ParseTreePattern} using this method.

getLexer
Lexer getLexer()

Used to convert the tree pattern string into a series of tokens. The input stream is reset.

getParser
Parser getParser()

Used to collect the grammar file name, token names, and rule names needed to parse the pattern into a parse tree.

getRuleTagToken
RuleTagToken getRuleTagToken(ParseTree t)

Undocumented in source. Be warned that the author may not have intended to support it.

match
ParseTreeMatch match(ParseTree tree, string pattern, int patternRuleIndex)

Compare {@code pattern} matched as rule {@code patternRuleIndex} against {@code tree} and return a {@link ParseTreeMatch} object that contains the matched elements, or the node at which the match failed.

match
ParseTreeMatch match(ParseTree tree, ParseTreePattern pattern)

Compare {@code pattern} against {@code tree} and return a {@link ParseTreeMatch} object that contains the matched elements, or the node at which the match failed. Pass in a compiled pattern instead of a string representation of a tree pattern.

matchImpl
ParseTree matchImpl(ParseTree tree, ParseTree patternTree, ParseTree[][string] labels)

Recursively walk {@code tree} against {@code patternTree}, filling {@code match.}{@link ParseTreeMatch#labels labels}.
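
The recursive walk can be illustrated with a small, self-contained sketch in Java, the language of the ANTLR runtime these comments originate from. It is not the library's implementation: {@code Node}, its factory methods, and this {@code matchImpl} are hypothetical stand-ins for {@link ParseTree} and the real method, and a tag is modeled as a single placeholder node rather than as a rule node wrapping a tag token. The sketch assumes a return convention of {@code null} on success and the first mismatched node otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy illustration of recursive pattern matching; all names are hypothetical. */
class TreeMatchSketch {
    /** Simplified tree node, not the ANTLR ParseTree interface. */
    static final class Node {
        final String type;          // rule name or token type name
        final String text;          // token text; null for rule nodes and tags
        final boolean isTag;        // true for placeholders such as <ID> or <expr>
        final List<Node> children;

        private Node(String type, String text, boolean isTag, Node... children) {
            this.type = type;
            this.text = text;
            this.isTag = isTag;
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        static Node rule(String type, Node... children) { return new Node(type, null, false, children); }
        static Node token(String type, String text) { return new Node(type, text, false); }
        static Node tag(String type) { return new Node(type, null, true); }
    }

    /** Returns null on success, otherwise the first node of tree that did not match. */
    static Node matchImpl(Node tree, Node pattern, Map<String, List<Node>> labels) {
        if (pattern.isTag) {
            // A tag matches any node of the named type and records it under the tag name.
            if (!tree.type.equals(pattern.type)) return tree;
            labels.computeIfAbsent(pattern.type, k -> new ArrayList<>()).add(tree);
            return null;
        }
        if (pattern.text != null || tree.text != null) {
            // At least one side is a token: type and text must both agree.
            boolean same = tree.type.equals(pattern.type) && Objects.equals(tree.text, pattern.text);
            return same ? null : tree;
        }
        // Both are rule nodes: same rule, same number of children, children match pairwise.
        if (!tree.type.equals(pattern.type) || tree.children.size() != pattern.children.size()) {
            return tree;
        }
        for (int i = 0; i < tree.children.size(); i++) {
            Node mismatch = matchImpl(tree.children.get(i), pattern.children.get(i), labels);
            if (mismatch != null) return mismatch;
        }
        return null;
    }
}
```

In this sketch, matching the toy tree for {@code x = 0;} against a pattern tree built from {@code <ID> = <expr>;} records the {@code ID} token and the {@code expr} subtree in {@code labels}, while a pattern requiring identifier {@code y} returns the {@code x} token as the mismatched node.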

matches
bool matches(ParseTree tree, string pattern, int patternRuleIndex)

Does {@code pattern} matched as rule {@code patternRuleIndex} match {@code tree}?

matches
bool matches(ParseTree tree, ParseTreePattern pattern)

Does the compiled {@code pattern} match {@code tree}? Pass in a compiled pattern instead of a string representation of a tree pattern.

setDelimiters
void setDelimiters(string start, string stop, string escapeLeft)

Set the delimiters used for marking rule and token tags within concrete syntax used by the tree pattern parser.

split
Chunk[] split(string pattern)

Split {@code <ID> = <e:expr> ;} into 4 chunks for tokenizing by {@link #tokenize}.
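
As a rough illustration of how a pattern string breaks into tag and text chunks, here is a minimal, self-contained Java sketch. It is a simplified stand-in, not the library's {@code split}: the class {@code PatternSplitSketch}, the {@code TAG(...)} and {@code TEXT(...)} string forms, and the error messages are all assumptions made for the example, and a label such as {@code e:} is left inside the tag text rather than separated out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for splitting a pattern into chunks. */
class PatternSplitSketch {
    /**
     * Splits pattern into chunks, returned as "TAG(...)" or "TEXT(...)" strings.
     * Escaped delimiters (escape + start, escape + stop) are kept as literal text.
     */
    static List<String> split(String pattern, String start, String stop, String escape) {
        if (start.isEmpty() || stop.isEmpty()) {
            throw new IllegalArgumentException("start and stop delimiters must not be empty");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int p = 0;
        int n = pattern.length();
        while (p < n) {
            if (pattern.startsWith(escape + start, p)) {
                // Escaped start delimiter: emit the delimiter itself as text.
                text.append(start);
                p += escape.length() + start.length();
            } else if (pattern.startsWith(escape + stop, p)) {
                // Escaped stop delimiter: emit the delimiter itself as text.
                text.append(stop);
                p += escape.length() + stop.length();
            } else if (pattern.startsWith(start, p)) {
                // Unescaped start delimiter: everything up to the next stop is a tag.
                int close = pattern.indexOf(stop, p + start.length());
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated tag in pattern: " + pattern);
                }
                if (text.length() > 0) {
                    chunks.add("TEXT(" + text + ")");
                    text.setLength(0);
                }
                chunks.add("TAG(" + pattern.substring(p + start.length(), close) + ")");
                p = close + stop.length();
            } else if (pattern.startsWith(stop, p)) {
                throw new IllegalArgumentException("stop delimiter without start in pattern: " + pattern);
            } else {
                text.append(pattern.charAt(p));
                p++;
            }
        }
        if (text.length() > 0) {
            chunks.add("TEXT(" + text + ")");
        }
        return chunks;
    }
}
```

With the default delimiters, this sketch turns {@code <ID> = <e:expr> ;} into two tag chunks and two text chunks, and treats {@code \<} and {@code \>} as ordinary {@code <} and {@code >} characters inside text.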

tokenize
Token[] tokenize(string pattern)
Undocumented in source. Be warned that the author may not have intended to support it.

Variables

escape
string escape;

The escape string for the delimiters, {@code \} by default. Both delimiters must be escaped, e.g., {@code \<} and {@code \>}.

start
string start;

Undocumented in source.

stop
string stop;

Undocumented in source.

Meta