ParserATNSimulator

The embodiment of the adaptive LL(*), ALL(*), parsing strategy.

<p> The basic complexity of the adaptive strategy makes it harder to understand. We begin with ATN simulation to build paths in a DFA. Subsequent prediction requests go through the DFA first. If they reach a state without an edge for the current symbol, the algorithm fails over to the ATN simulation to complete the DFA path for the current input (until it finds a conflict state or uniquely predicting state).</p>
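
<p> The "DFA first, ATN simulation on a missing edge" flow can be pictured with a toy model. This is only a sketch under stated assumptions: the class {@code DfaFallbackSketch}, its {@code State} type, and the {@code slowSimulation} callback are hypothetical stand-ins, not part of this runtime, and the model ignores conflicts, predicates, and the error state.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Toy model only; every name here is hypothetical, not the runtime's API. */
class DfaFallbackSketch {
    /** A DFA state: a prediction (0 = undecided) plus lazily filled edges. */
    static final class State {
        final int prediction;
        final Map<Integer, State> edges = new HashMap<>();
        State(int prediction) { this.prediction = prediction; }
    }

    final State start = new State(0);
    /** Stands in for ATN simulation: computes the target for (state, symbol). */
    final BiFunction<State, Integer, State> slowSimulation;
    int slowCalls = 0;

    DfaFallbackSketch(BiFunction<State, Integer, State> slowSimulation) {
        this.slowSimulation = slowSimulation;
    }

    /** Follows cached edges; on a missing edge, simulates and records the edge. */
    int predict(int[] input) {
        State s = start;
        for (int symbol : input) {
            State target = s.edges.get(symbol);
            if (target == null) {
                slowCalls++;
                target = slowSimulation.apply(s, symbol);
                s.edges.put(symbol, target); // complete the DFA path for next time
            }
            s = target;
            if (s.prediction != 0) return s.prediction;
        }
        return 0; // input ended before a uniquely predicting state was reached
    }

    public static void main(String[] args) {
        // Assumption for the demo: symbols >= 10 decide alternative (symbol - 9).
        DfaFallbackSketch p =
            new DfaFallbackSketch((s, sym) -> new State(sym >= 10 ? sym - 9 : 0));
        int[] input = {1, 10};
        System.out.println(p.predict(input) + " after " + p.slowCalls + " slow steps");
        System.out.println(p.predict(input) + " after " + p.slowCalls + " slow steps");
    }
}
```

<p> In this model the second call with the same input is answered entirely from cached edges, so the slow-step counter does not grow.</p>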

<p> All of that is done without using the outer context because we want to create a DFA that is not dependent upon the rule invocation stack when we do a prediction. One DFA works in all contexts. We avoid using context not necessarily because it's slower, although it can be, but because of the DFA caching problem. The closure routine only considers the rule invocation stack created during prediction beginning in the decision rule. For example, if prediction occurs without invoking another rule's ATN, there are no context stacks in the configurations. When lack of context leads to a conflict, we don't know if it's an ambiguity or a weakness in the strong LL(*) parsing strategy (versus full LL(*)).</p>

<p> When SLL yields a configuration set with conflict, we rewind the input and retry the ATN simulation, this time using full outer context without adding to the DFA. Configuration context stacks will be the full invocation stacks from the start rule. If we get a conflict using full context, then we can definitively say we have a true ambiguity for that input sequence. If we don't get a conflict, it implies that the decision is sensitive to the outer context. (It is not context-sensitive in the sense of context-sensitive grammars.)</p>

<p> The next time we reach this DFA state with an SLL conflict, through DFA simulation, we will again retry the ATN simulation using full context mode. This is slow because we can't save the results and have to "interpret" the ATN each time we get that input.</p>

<p> <strong>CACHING FULL CONTEXT PREDICTIONS</strong></p>

<p> We could easily cache results mapping full context to predicted alternative, which would save a lot of time, but that doesn't work in the presence of predicates. The set of visible predicates from the ATN start state changes depending on the context, because closure can fall off the end of a rule. I tried to cache tuples (stack context, semantic context, predicted alt) but it was slower than interpreting and much more complicated. It also required a huge amount of memory. The goal is not to create the world's fastest parser anyway. I'd like to keep this algorithm simple. By launching multiple threads, we can improve the speed of parsing across a large number of files.</p>

<p> There is no strict ordering between the amount of input used by SLL vs LL, which makes it really hard to build a cache for full context. Let's say that we have input A B C that leads to an SLL conflict with full context X. That implies that using X we might only use A B but we could also use A B C D to resolve conflict. Input A B C D could predict alternative 1 in one position in the input and A B C E could predict alternative 2 in another position in input. The conflicting SLL configurations could still be non-unique in the full context prediction, which would lead us to requiring more input than the original A B C. To make a prediction cache work, we have to track the exact input used during the previous prediction. That amounts to a cache that maps X to a specific DFA for that context.</p>

<p> Something should be done for left-recursive expression predictions. They are likely LL(1) + pred eval. It is easier to do the whole "SLL unless error, then retry with full LL" thing that Sam does.</p>

<p> <strong>AVOIDING FULL CONTEXT PREDICTION</strong></p>

<p> We avoid doing a full context retry when the outer context is empty, when we did not dip into the outer context by falling off the end of the decision state rule, or when we force SLL mode.</p>

<p> As an example of the case where we do not dip into the outer context, consider super constructor calls versus function calls. One grammar might look like this:</p>

<pre> ctorBody : '{' superCall? stat* '}' ; </pre>

<p> Or, you might see something like</p>

<pre> stat : superCall ';' | expression ';' | ... ; </pre>

<p> In both cases I believe that no closure operations will dip into the outer context. In the first case ctorBody in the worst case will stop at the '}'. In the 2nd case it should stop at the ';'. Both cases should stay within the entry rule and not dip into the outer context.</p>

<p> <strong>PREDICATES</strong></p>

<p> Predicates are always evaluated if present, in both SLL and LL. SLL and LL simulation deal with predicates differently. SLL collects predicates as it performs closure operations, like ANTLR v3 did. It delays predicate evaluation until it reaches an accept state. This allows us to cache the SLL ATN simulation whereas, if we had evaluated predicates on-the-fly during closure, the DFA state configuration sets would be different and we couldn't build up a suitable DFA.</p>

<p> When building a DFA accept state during ATN simulation, we evaluate any predicates and return the sole semantically valid alternative. If there is more than 1 alternative, we report an ambiguity. If there are 0 alternatives, we throw an exception. Alternatives without predicates act like they have true predicates. The simple way to think about it is to strip away all alternatives with false predicates and choose the minimum alternative that remains.</p>

<p> When we start in the DFA and reach an accept state that's predicated, we test those and return the minimum semantically viable alternative. If no alternatives are viable, we throw an exception.</p>

<p> During full LL ATN simulation, closure always evaluates predicates on-the-fly. This is crucial to reducing the configuration set size during closure. Without this on-the-fly evaluation, it hits a landmine when parsing with the Java grammar, for example.</p>

<p> <strong>SHARING DFA</strong></p>

<p> All instances of the same parser share the same decision DFAs through a static field. Each instance gets its own ATN simulator but they share the same {@link #decisionToDFA} field. They also share a {@link PredictionContextCache} object that makes sure that all {@link PredictionContext} objects are shared among the DFA states. This makes a big size difference.</p>

<p> <strong>THREAD SAFETY</strong></p>

<p> The {@link ParserATNSimulator} locks on the {@link #decisionToDFA} field when it adds a new DFA object to that array. {@link #addDFAEdge} locks on the DFA for the current decision when setting the {@link DFAState#edges} field. {@link #addDFAState} locks on the DFA for the current decision when looking up a DFA state to see if it already exists. We must make sure that all requests to add DFA states that are equivalent result in the same shared DFA object. This is because lots of threads will be trying to update the DFA at once. The {@link #addDFAState} method also locks inside the DFA lock but this time on the shared context cache when it rebuilds the configurations' {@link PredictionContext} objects using cached subgraphs/nodes. No other locking occurs, even during DFA simulation. This is safe as long as we can guarantee that all threads referencing {@code s.edge[t]} get the same physical target {@link DFAState}, or {@code null}. Once into the DFA, the DFA simulation does not reference the {@link DFA#states} map. It follows the {@link DFAState#edges} field to new targets. The DFA simulator will either find {@link DFAState#edges} to be {@code null}, to be non-{@code null} and {@code dfa.edges[t]} null, or {@code dfa.edges[t]} to be non-null. The {@link #addDFAEdge} method could be racing to set the field but in either case the DFA simulator works; if it sees {@code null}, it simply requests ATN simulation. It could also race trying to get {@code dfa.edges[t]}, but either way it will work because it's not doing a test and set operation.</p>
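
<p> The requirement that equivalent states resolve to one shared object can be illustrated with a small canonicalizing map. This is a minimal sketch, not this class's implementation: {@code CanonicalStateSketch} and its {@code State} type are hypothetical, and a plain string stands in for the configuration set that defines state equality.</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "look up or add under a lock, return the stored instance". */
class CanonicalStateSketch {
    /** Stand-in for a DFA state; equality is defined by its configurations. */
    static final class State {
        final String configs; // assumption: a string models the configuration set
        State(String configs) { this.configs = configs; }
        @Override public boolean equals(Object o) {
            return o instanceof State && ((State) o).configs.equals(configs);
        }
        @Override public int hashCode() { return configs.hashCode(); }
    }

    private final Map<State, State> states = new HashMap<>();

    /** Returns the instance already stored for an equal state, else stores and returns d. */
    State addState(State d) {
        synchronized (states) { // lookup and insert happen as one atomic step
            State existing = states.get(d);
            if (existing != null) return existing;
            states.put(d, d);
            return d;
        }
    }

    public static void main(String[] args) {
        CanonicalStateSketch dfa = new CanonicalStateSketch();
        State first = dfa.addState(new State("alt 1 | alt 2"));
        State second = dfa.addState(new State("alt 1 | alt 2"));
        System.out.println(first == second); // same physical object
    }
}
```

<p> Because lookup and insertion share one critical section, two threads adding equal states cannot both insert; the later caller receives the earlier caller's instance.</p>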

<p> <strong>Starting with SLL then failing to combined SLL/LL (Two-Stage Parsing)</strong></p>

<p> Sam pointed out that if SLL does not give a syntax error, then there is no point in doing full LL, which is slower. We only have to try LL if we get a syntax error. For maximum speed, Sam starts the parser set to pure SLL mode with the {@link BailErrorStrategy}:</p>

<pre>
parser.{@link Parser#getInterpreter() getInterpreter()}.{@link #setPredictionMode setPredictionMode}{@code (}{@link PredictionMode#SLL}{@code )};
parser.{@link Parser#setErrorHandler setErrorHandler}(new {@link BailErrorStrategy}());
</pre>

<p> If it does not get a syntax error, then we're done. If it does get a syntax error, we need to retry with the combined SLL/LL strategy.</p>

<p> The reason this works is as follows. If there are no SLL conflicts, then the grammar is SLL (at least for that input set). If there is an SLL conflict, the full LL analysis must yield a set of viable alternatives which is a subset of the alternatives reported by SLL. If the LL set is a singleton, then the grammar is LL but not SLL. If the LL set is the same size as the SLL set, the decision is SLL. If the LL set has size &gt; 1, then that decision is truly ambiguous on the current input. If the LL set is smaller, then the SLL conflict resolution might choose an alternative that the full LL would rule out as a possibility based upon better context information. If that's the case, then the SLL parse will definitely get an error because the full LL analysis says it's not viable. If SLL conflict resolution chooses an alternative within the LL set, then both SLL and LL would choose the same alternative because they both choose the minimum of multiple conflicting alternatives.</p>

<p> Let's say we have a set of SLL conflicting alternatives {@code {1, 2, 3}} and a smaller LL set called <em>s</em>. If <em>s</em> is {@code {2, 3}}, then SLL parsing will get an error because SLL will pursue alternative 1. If <em>s</em> is {@code {1, 2}} or {@code {1, 3}} then both SLL and LL will choose the same alternative because alternative one is the minimum of either set. If <em>s</em> is {@code {2}} or {@code {3}} then SLL will get a syntax error. If <em>s</em> is {@code {1}} then SLL will succeed.</p>
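
<p> The cases above reduce to one check: SLL picks the minimum of its conflicting alternatives, and that pick survives only if it is also in the LL set. The following sketch encodes just that rule; {@code TwoStageSketch} and its method names are hypothetical and exist only for illustration.</p>

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of the SLL-versus-LL set argument. */
class TwoStageSketch {
    /** SLL resolves a conflict by taking the minimum conflicting alternative. */
    static int sllChoice(TreeSet<Integer> sllAlts) {
        return sllAlts.first();
    }

    /** True when the SLL pick is also viable under full LL, so SLL does not hit an error here. */
    static boolean sllChoiceIsViable(TreeSet<Integer> sllAlts, Set<Integer> llAlts) {
        return llAlts.contains(sllChoice(sllAlts));
    }

    public static void main(String[] args) {
        TreeSet<Integer> sll = new TreeSet<>(Arrays.asList(1, 2, 3));
        System.out.println(sllChoiceIsViable(sll, new TreeSet<>(Arrays.asList(2, 3)))); // false
        System.out.println(sllChoiceIsViable(sll, new TreeSet<>(Arrays.asList(1, 3)))); // true
    }
}
```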

<p> Of course, if the input is invalid, then we will get an error for sure in both SLL and LL parsing. Erroneous input will therefore require 2 passes over the input.</p>

class ParserATNSimulator : ATNSimulator , InterfaceParserATNSimulator {}

Constructors

this
this(ATN atn, DFA[] decisionToDFA, PredictionContextCache sharedContextCache)

@uml Testing only!

this
this(Parser parser, ATN atn, DFA[] decisionToDFA, PredictionContextCache sharedContextCache)
Undocumented in source.

Members

Functions

actionTransition
ATNConfig actionTransition(ATNConfig config, ActionTransition t)
Undocumented in source. Be warned that the author may not have intended to support it.
adaptivePredict
int adaptivePredict(TokenStream input, int decision, ParserRuleContext outerContext)
Undocumented in source. Be warned that the author may not have intended to support it.
addDFAEdge
DFAState addDFAEdge(DFA dfa, DFAState from, size_t t, DFAState to)

Add an edge to the DFA, if possible. This method calls {@link #addDFAState} to ensure the {@code to} state is present in the DFA. If {@code from} is {@code null}, or if {@code t} is outside the range of edges that can be represented in the DFA tables, this method returns without adding the edge to the DFA.

addDFAState
DFAState addDFAState(DFA dfa, DFAState D)

Add state {@code D} to the DFA if it is not already present, and return the actual instance stored in the DFA. If a state equivalent to {@code D} is already in the DFA, the existing state is returned. Otherwise this method returns {@code D} after adding it to the DFA.

applyPrecedenceFilter
ATNConfigSet applyPrecedenceFilter(ATNConfigSet configs)
Undocumented in source. Be warned that the author may not have intended to support it.
canDropLoopEntryEdgeInLeftRecursiveRule
bool canDropLoopEntryEdgeInLeftRecursiveRule(ATNConfig config)

TODO implementation missing

clearDFA
void clearDFA()

@uml @override

closureATN
void closureATN(ATNConfig config, ATNConfigSet configs, ATNConfig[] closureBusy, bool collectPredicates, bool fullCtx, bool treatEofAsEpsilon)
Undocumented in source. Be warned that the author may not have intended to support it.
closureCheckingStopState
void closureCheckingStopState(ATNConfig config, ATNConfigSet configs, ATNConfig[] closureBusy, bool collectPredicates, bool fullCtx, int depth, bool treatEofAsEpsilon)
Undocumented in source. Be warned that the author may not have intended to support it.
closure_
void closure_(ATNConfig config, ATNConfigSet configs, ATNConfig[] closureBusy, bool collectPredicates, bool fullCtx, int depth, bool treatEofAsEpsilon)

Do the actual work of walking epsilon edges

computeReachSet
ATNConfigSet computeReachSet(ATNConfigSet closure, int t, bool fullCtx)
Undocumented in source. Be warned that the author may not have intended to support it.
computeStartState
ATNConfigSet computeStartState(ATNState p, RuleContext ctx, bool fullCtx)
Undocumented in source. Be warned that the author may not have intended to support it.
computeTargetState
DFAState computeTargetState(DFA dfa, DFAState previousD, int t)

Compute a target state for an edge in the DFA, and attempt to add the computed state and corresponding edge to the DFA.

dumpDeadEndConfigs
void dumpDeadEndConfigs(NoViableAltException nvae)

Used for debugging in adaptivePredict around execATN, but I cut it out for clarity now that the algorithm works well. We can leave this "dead" code for a bit.

evalSemanticContext
BitSet evalSemanticContext(PredPrediction[] predPredictions, ParserRuleContext outerContext, bool complete)

Look through a list of predicate/alt pairs, returning alts for the pairs that win. A {@code NONE} predicate indicates an alt containing an unpredicated config which behaves as "always true." If !complete then we stop at the first predicate that evaluates to true. This includes pairs with null predicates.
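
The effect of the complete flag can be sketched as follows. This is an illustrative model under stated assumptions, not the method's actual body: EvalPredsSketch and Pair are hypothetical names, a BooleanSupplier stands in for a semantic context, and a null predicate models the NONE ("always true") case.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical model of evaluating predicate/alt pairs. */
class EvalPredsSketch {
    /** A predicate/alt pair; a null pred models the NONE (always true) predicate. */
    static final class Pair {
        final BooleanSupplier pred;
        final int alt;
        Pair(BooleanSupplier pred, int alt) { this.pred = pred; this.alt = alt; }
    }

    /** Collects the alts whose predicate holds; stops at the first winner unless complete. */
    static BitSet eval(List<Pair> pairs, boolean complete) {
        BitSet winners = new BitSet();
        for (Pair p : pairs) {
            boolean holds = p.pred == null || p.pred.getAsBoolean();
            if (holds) {
                winners.set(p.alt);
                if (!complete) break; // first true predicate is enough
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        List<Pair> pairs = Arrays.asList(
            new Pair(() -> false, 1), new Pair(() -> true, 2), new Pair(null, 3));
        System.out.println(eval(pairs, true));  // {2, 3}
        System.out.println(eval(pairs, false)); // {2}
    }
}
```

A caller that wants the minimum viable alternative can take nextSetBit(0) on the returned set; an empty set corresponds to the "no viable alternative" case.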

evalSemanticContext
bool evalSemanticContext(SemanticContext pred, ParserRuleContext parserCallStack, int alt, bool fullCtx)
Undocumented in source. Be warned that the author may not have intended to support it.
execATN
int execATN(DFA dfa, DFAState s0, TokenStream input, size_t startIndex, ParserRuleContext outerContext)

There are some key conditions we're looking for after computing a new set of ATN configs (proposed DFA state): <br>- if the set is empty, there is no viable alternative for current symbol <br>- does the state uniquely predict an alternative? <br>- does the state have a conflict that would prevent us from putting it on the work list? <br><br>We also have some key operations to do: <br>- add an edge from previous DFA state to potentially new DFA state, D, upon current symbol but only if adding to work list, which means in all cases except no viable alternative (and possibly non-greedy decisions?) <br>- collecting predicates and adding semantic context to DFA accept states <br>- adding rule context to context-sensitive DFA accept states <br>- consuming an input symbol <br>- reporting a conflict <br>- reporting an ambiguity <br>- reporting a context sensitivity <br>- reporting insufficient predicates <br><br>cover these cases: <br>- dead end <br>- single alt <br>- single alt + preds <br>- conflict <br>- conflict + preds

execATNWithFullContext
int execATNWithFullContext(DFA dfa, DFAState D, ATNConfigSet s0, TokenStream input, size_t startIndex, ParserRuleContext outerContext)

@uml comes back with reach.uniqueAlt set to a valid alt

getAltThatFinishedDecisionEntryRule
int getAltThatFinishedDecisionEntryRule(ATNConfigSet configs)
Undocumented in source. Be warned that the author may not have intended to support it.
getConflictingAlts
BitSet getConflictingAlts(ATNConfigSet configs)

Gets a {@link BitSet} containing the alternatives in {@code configs} which are part of one or more conflicting alternative subsets.

getConflictingAltsOrUniqueAlt
BitSet getConflictingAltsOrUniqueAlt(ATNConfigSet configs)

Sam pointed out a problem with the previous definition, v3, of ambiguous states. If we have another state associated with conflicting alternatives, we should keep going. For example, the following grammar

getEpsilonTarget
ATNConfig getEpsilonTarget(ATNConfig config, Transition t, bool collectPredicates, bool inContext, bool fullCtx, bool treatEofAsEpsilon)
Undocumented in source. Be warned that the author may not have intended to support it.
getExistingTargetState
DFAState getExistingTargetState(DFAState previousD, int t)

Get an existing target state for an edge in the DFA. If the target state for the edge has not yet been computed or is otherwise not available, this method returns {@code null}.

getLookaheadName
string getLookaheadName(TokenStream input)
Undocumented in source. Be warned that the author may not have intended to support it.
getParser
Parser getParser()
Undocumented in source. Be warned that the author may not have intended to support it.
getPredicatePredictions
PredPrediction[] getPredicatePredictions(BitSet ambigAlts, SemanticContext[] altToPred)
Undocumented in source. Be warned that the author may not have intended to support it.
getPredictionMode
PredictionModeConst getPredictionMode()
Undocumented in source. Be warned that the author may not have intended to support it.
getPredsForAmbigAlts
SemanticContext[] getPredsForAmbigAlts(BitSet ambigAlts, ATNConfigSet configs, int nalts)
Undocumented in source. Be warned that the author may not have intended to support it.
getReachableTarget
ATNState getReachableTarget(Transition trans, int ttype)
Undocumented in source. Be warned that the author may not have intended to support it.
getRuleName
string getRuleName(int index)
Undocumented in source. Be warned that the author may not have intended to support it.
getSynValidOrSemInvalidAltThatFinishedDecisionEntryRule
int getSynValidOrSemInvalidAltThatFinishedDecisionEntryRule(ATNConfigSet configs, ParserRuleContext outerContext)

This method is used to improve the localization of error messages by choosing an alternative rather than throwing a {@link NoViableAltException} in particular prediction scenarios where the {@link #ERROR} state was reached during ATN simulation.

getTokenName
string getTokenName(int t)
Undocumented in source. Be warned that the author may not have intended to support it.
noViableAlt
NoViableAltException noViableAlt(TokenStream input, ParserRuleContext outerContext, ATNConfigSet configs, size_t startIndex)
Undocumented in source. Be warned that the author may not have intended to support it.
precedenceTransition
ATNConfig precedenceTransition(ATNConfig config, PrecedencePredicateTransition pt, bool collectPredicates, bool inContext, bool fullCtx)
Undocumented in source. Be warned that the author may not have intended to support it.
predTransition
ATNConfig predTransition(ATNConfig config, PredicateTransition pt, bool collectPredicates, bool inContext, bool fullCtx)
Undocumented in source. Be warned that the author may not have intended to support it.
predicateDFAState
void predicateDFAState(DFAState dfaState, DecisionState decisionState)
Undocumented in source. Be warned that the author may not have intended to support it.
removeAllConfigsNotInRuleStopState
ATNConfigSet removeAllConfigsNotInRuleStopState(ATNConfigSet configs, bool lookToEndOfRule)

Return a configuration set containing only the configurations from {@code configs} which are in a {@link RuleStopState}. If all configurations in {@code configs} are already in a rule stop state, this method simply returns {@code configs}.

reportAmbiguity
void reportAmbiguity(DFA dfa, DFAState D, size_t startIndex, size_t stopIndex, bool exact, BitSet ambigAlts, ATNConfigSet configs)
Undocumented in source. Be warned that the author may not have intended to support it.
reportAttemptingFullContext
void reportAttemptingFullContext(DFA dfa, BitSet conflictingAlts, ATNConfigSet configs, size_t startIndex, size_t stopIndex)
Undocumented in source. Be warned that the author may not have intended to support it.
reportContextSensitivity
void reportContextSensitivity(DFA dfa, int prediction, ATNConfigSet configs, size_t startIndex, size_t stopIndex)
Undocumented in source. Be warned that the author may not have intended to support it.
reset
void reset()

@uml @override

ruleTransition
ATNConfig ruleTransition(ATNConfig config, RuleTransition t)
Undocumented in source. Be warned that the author may not have intended to support it.
setPredictionMode
void setPredictionMode(PredictionModeConst mode)
Undocumented in source. Be warned that the author may not have intended to support it.
splitAccordingToSemanticValidity
ATNConfigSetATNConfigSetPair splitAccordingToSemanticValidity(ATNConfigSet configs, ParserRuleContext outerContext)
Undocumented in source. Be warned that the author may not have intended to support it.

Static functions

getUniqueAlt
int getUniqueAlt(ATNConfigSet configs)
Undocumented in source. Be warned that the author may not have intended to support it.

Static variables

mergeCache
DoubleKeyMap!(PredictionContext, PredictionContext, PredictionContext) mergeCache;

Each prediction operation uses a cache for merge of prediction contexts. Don't keep around as it wastes huge amounts of memory. DoubleKeyMap isn't synchronized but we're ok since two threads shouldn't reuse same parser/atnsim object because it can only handle one input at a time. This maps graphs a and b to merged result c. (a,b)&rarr;c. We can avoid the merge if we ever see a and b again. Note that (b,a)&rarr;c should also be examined during cache lookup. @uml @__gshared
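
A two-key cache with the symmetric lookup described above might look like the sketch below. MergeCacheSketch is a hypothetical stand-in written for illustration; it is not the DoubleKeyMap type used by this class, and like that type it is deliberately unsynchronized.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-key cache: (a, b) maps to the merged result c. */
class MergeCacheSketch<K, V> {
    private final Map<K, Map<K, V>> data = new HashMap<>();

    /** Records that merging a and b produced merged. */
    void put(K a, K b, V merged) {
        data.computeIfAbsent(a, k -> new HashMap<>()).put(b, merged);
    }

    /** Looks up (a, b) first, then (b, a); returns null on a miss. */
    V get(K a, K b) {
        V hit = lookup(a, b);
        return hit != null ? hit : lookup(b, a);
    }

    private V lookup(K a, K b) {
        Map<K, V> inner = data.get(a);
        return inner == null ? null : inner.get(b);
    }

    public static void main(String[] args) {
        MergeCacheSketch<String, String> cache = new MergeCacheSketch<>();
        cache.put("a", "b", "c");
        System.out.println(cache.get("b", "a")); // c, found via the swapped key order
    }
}
```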

Variables

_dfa
DFA _dfa;
Undocumented in source.
_input
TokenStream _input;
Undocumented in source.
_outerContext
ParserRuleContext _outerContext;
Undocumented in source.
_startIndex
size_t _startIndex;
Undocumented in source.
decisionToDFA
DFA[] decisionToDFA;
Undocumented in source.
mode
PredictionModeConst mode;
Undocumented in source.
parser
Parser parser;
Undocumented in source.

Inherited Members

From ATNSimulator

SERIALIZED_VERSION
int SERIALIZED_VERSION;
Undocumented in source.
SERIALIZED_UUID
UUID SERIALIZED_UUID;

This is the current serialized UUID. Deprecated: use {@link ATNDeserializer#checkCondition(boolean)} instead.

ERROR
DFAState ERROR;

Must distinguish between missing edge and edge we know leads nowhere

atn
ATN atn;
Undocumented in source.
sharedContextCache
PredictionContextCache sharedContextCache;

The context cache maps all PredictionContext objects that are equals() to a single cached copy. This cache is shared across all contexts in all ATNConfigs in all DFA states. We rebuild each ATNConfigSet to use only cached nodes/graphs in addDFAState(). We don't want to fill this during closure() since there are lots of contexts that pop up but are not used ever again. It also greatly slows down closure().

reset
void reset()
Undocumented in source.
clearDFA
void clearDFA()

Clear the DFA cache used by the current instance. Since the DFA cache may be shared by multiple ATN simulators, this method may affect the performance (but not accuracy) of other parsers which are being used concurrently.

getSharedContextCache
PredictionContextCache getSharedContextCache()
Undocumented in source. Be warned that the author may not have intended to support it.
getCachedContext
PredictionContext getCachedContext(PredictionContext context)
Undocumented in source. Be warned that the author may not have intended to support it.

Meta