#Pitfalls ##Left Recursion An weakness shared by all recursive descent parsers is the inability to parse left-recursive rules. Consider the following rule: rule left_recursive left_recursive 'a' / 'a' end Logically it should match a list of 'a' characters. But it never consumes anything, because attempting to recognize `left_recursive` begins by attempting to recognize `left_recursive`, and so goes an infinite recursion. There's always a way to eliminate these types of structures from your grammar. There's a mechanistic transformation called _left factorization_ that can eliminate it, but it isn't always pretty, especially in combination with automatically constructed syntax trees. So far, I have found more thoughtful ways around the problem. For instance, in the interpreter example I interpret inherently left-recursive function application right recursively in syntax, then correct the directionality in my semantic interpretation. You may have to be clever. #Advanced Techniques Here are a few interesting problems I've encountered. I figure sharing them may give you insight into how these types of issues are addressed with the tools of parsing expressions. ##Matching a String rule string '"' ('\"' / !'"' .)* '"' end This expression says: Match a quote, then zero or more of, an escaped quote or any character but a quote, followed by a quote. Lookahead assertions are essential for these types of problems. ##Matching Nested Structures With Non-Unique Delimeters Say I want to parse a diabolical wiki syntax in which the following interpretations apply. ** *hello* ** --> hello * **hello** * --> hello rule strong '**' (em / !'*' . / '\*')+ '**' end rule em '**' (strong / !'*' . / '\*')+ '**' end Emphasized text is allowed within strong text by virtue of `em` being the first alternative. Since `em` will only successfully parse if a matching `*` is found, it is permitted, but other than that, no `*` characters are allowed unless they are escaped. ##Matching a Keyword But Not Words Prefixed Therewith Say I want to consider a given string a characters only when it occurs in isolation. Lets use the `end` keyword as an example. We don't want the prefix of `'enders_game'` to be considered a keyword. A naiive implementation might be the following. rule end_keyword 'end' &space end This says that `'end'` must be followed by a space, but this space is not consumed as part of the matching of `keyword`. This works in most cases, but is actually incorrect. What if `end` occurs at the end of the buffer? In that case, it occurs in isolation but will not match the above expression. What we really mean is that `'end'` cannot be followed by a _non-space_ character. rule end_keyword 'end' !(!' ' .) end In general, when the syntax gets tough, it helps to focus on what you really mean. A keyword is a character not followed by another character that isn't a space.