Computing (FOLDOC) dictionary
Jump to user comments
When a parser
cannot tell which alternative
E.g. given syntax
C -@# A | b c
A -@# d | b e
If you're parsing non-terminal C and the next token is 'b',
you don't know whether it's the first or second alternative of
C since they both can start with b.
To discover whether a grammar has a context clash:
For each non-terminal, N, with multiple alternatives, look at
the first symbol of each alternative's right-hand side, call
it s. If s is the empty string, then find the set FOLLOWER(N)
otherwise find the set FIRST*(s). If any of the sets for N's
alternatives intersect then there will be a context clash when
parsing N. If the next input symbol is one of those in the
intersection of two sets then you won't know which of the
FIRST(s) is the set of symbols with which s can start,
including s itself. If s is a non-terminal then FIRST(s) also
includes the first symbol of each alternative right-hand side
of FIRST which means keep applying FIRST to each element of
the result until the result doesn't change. I.e. start with
just the set R = s
, then for each non-terminal x in R, add
FIRST(x) to R. Keep doing this until nothing new is added.
(We are really only interested in the terminals in FIRST*(s)
but some definitions include the non-terminals).
FOLLOWER(N) is the set of symbols which can come after N in a
sentence. Find each occurrence of N on the right-hand side of
a rule, e.g.
M -@# ... | ... N ... | ...
If there is a symbol s immediately following N then add
FIRST*(s) to the result (again, we're only interested in the
terminal symbols in FIRST*(s)) if there is no symbol after N
in the alternative then add FOLLOWER(M) to the result (i.e. if
N can be the last symbol in an M then anything that can follow
M can also follow N).
If a grammar can generate the same sentence in multiple
different ways (with different parse tress) then it is
ambiguous. An ambiguity must start with a context clash (but
not all context clashes imply ambiguity). The context clash
occurs when trying to parse the first token of the phrase with
multiple parses - you will not be able to tell which
alternative to take. To see if a context clash is also a case
of ambiguity you would need to follow the alternatives
involved in each context clash to see if they can generate the
same complete sequence of tokens.