# Syntax Analysis or Parsing

Syntax Analysis is the second phase of the compiler after Lexical Analysis. It is also called as hierarchical analysis or Parsing.

Syntax Error Handling

The different type of errors which comes up during programming can be categorized into

– Lexical, like wrong spellings in an identifier, keyword, or operator

– Syntactic, for example arithmetic expression with unbalanced parenthesis

– Semantic, such as an operator applied to wrong operands

– Logical, such as infinity recursive call.

The error detection and recovery in a compiler happens around the syntax analysis phase.

The error handler in the parser has simple goals like,

– It should report the error clearly and accurately

– It should recover from one error quickly so that it is able to detect subsequent errors.

– It should not slow down the processing of correct programs.

Top down Parsing

In top down**,** a parse tree is constructed for an input string from root which means parse tree has been built in preorder(depth first). This also involves finding the left most derivation and at each step of a top down parse, the production to be applied is determined. It matches the terminal symbols in production body with input string. The top down Parsing further includes recursive descent parsing and backtracking, Recursive descent parsing, constructs a parse tree for the input string starting from roots and nodes are created in preorder.

**First and Follow**

First : For first(α) where α is any string of grammar symbols, to be the set of terminals that begin strings that are derived from α . If α → ε , then ε is also in FIRST (α).

Follow (A), For non-terminal A, to be the set of terminals α that can appear immediately to the right of A in some sentential form. i.e the set of terminals α such that there exists a derivation of the form S→ α Aaβ.

Consider grammar |
First |
Follow |

First (E) = First (T) |
||

= First (E’) |
||

E → TE’ | {(, id} | {) $} |

E’ → +TE’/ε | {+ ε} | {) $} |

T → FT’ | {(id} | {+) $} |

T’ → *FT’/ε | {*ε} | {+) $} |

F → (E)/id | {(id} | {*+) $} |

Example first and follow :

# Predictive Parser: Generalities

- In many cases, by carefully writing a grammar–eliminating left recursion from it and left factoring the resulting grammar–we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking.

Such parsers are called predictive parsers.

**Left Recursive Grammars I**

- A grammar is left recursive if it has a nonterminal A such that there is a derivation A → Aα, for some string α
- A left-recursive rule such as A → A α| β can be eliminated by replacing it by:

– A → β R where R is a new non-terminal

– R → α R | ε and ε is the empty string

Left-Recursive Grammars II

- The general procedure for removing direct left recursion–recursion that occurs in one rule–is the following:

– Group the A-rules as

A → Aα 1 |… | Aα m | β1 | β2 |…| βn

where none of the β’s begins with A

– Replace the original A-rules with

» A → β1A’ | β2 A’ | … | βn A’

» A’ → α 1 A’ | α 2 A’ | … | α m A’

Left-Recursive Grammars III

Here is an example of a (directly) left-recursive grammar

E → E + T | T

T → T * F | F

F → (E) | id

This grammar can be re-written as the following non left-recursive grammar:

E → T E’ E’ → + TE’ | ε

T → F T’ T’ → * FT’ | ε

F → (E) | id

Left-Factoring a Grammar I

- Left Recursion is not the only trait that disallows top-down parsing.
- Another is whether the parser can always choose the correct Right Hand Side on the basis of the next token of input, using only the first token generated by the leftmost nonterminal in the current derivation.

**Left-Factoring a Grammar II**

Here is the procedure used to left-factor a grammar:

- For each non-terminal A, find the longest prefix α common to two or more of its alternatives.
- Replace all the A productions:

A → α β1 | α β2 … | α βn | λ

(where λ represents all alternatives that do not begin with α )

Left-Factoring a Grammar III

- Here is an example of a common grammar that needs left factoring:

S → iEtS | iEtSeS | a

E → b

( i stands for “if”; t stands for “then”; and e stands for “else”)

Left factored, this grammar becomes

S → iEtSS’ | a

S’ → eS | ε

E → b

# LL(1) Grammar

- A grammar whose passing table has no multiply-defined entries is said to be LL(1).

**Parsing table**

**Constructing the Parsing Table I: First and Follow**

- First(α) is the set of terminals that begin the strings derived from α. Follow(A) is the set of terminals a that can appear to the right of A. First and Follow are used in the construction of the parsing table.
- Computing First:

– If X is a terminal, then First(X) is {X}

– If X → ε is a production, then add ? to First(X)

– If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in First(X) if for some i, a is in First(Yi) and ? is in all of First(Y1)…First(Yi-1)

**Constructing the Parsing Table II: First and Follow**

- Place $ in Follow(S), where S is the start symbol and $ is the input right endmarker.
- If there is a production A →α Bβ, then everything in First (β) except for ε is placed in Follow (B).
- If there is a production A →α B, or a production A →α Bβ where First(β) contains ε, then everything in Follow(A) is in Follow(B)

**Example:** E → TE’ E’ → +TE’ | ε

T → FT’ T’ → *FT’ | ε

F → (E) | id

First(E) = First(T) = First(F) = {(, id} First(E’) = {+, ε} First(T’) = {*, ε}

Follow(E) = Follow(E’) = {),$} Follow(F)={+,*,),$} Follow(T) = Follow(T’) = {+,),$}

** **

**Constructing the Parsing Table III**

Algorithm for constructing a predictive parsing table:

- For each production A → α of the grammar, do steps 2 and 3
- For each terminal a in First(α), add A → α to M[A, a]
- If ε is in First(α), add A → α to M[A, b] for each terminal b in Follow(A). If ε is in First(α), add A → α to M[A,b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A → α to M[A, $].
- Make each undefined entry of M be an error.

# Bottom up Parsing

It gives a string of terminals. In this method, the parse tree is built from leaves and working up towards the root. This is the reverse of right most derivation and used for type of grammar called LR. LR parsers are difficult to build by hand, hence we use automatic parser generators for LR grammars.

hift reduce Parsing

This is a type of bottom up parsing. It consists of stacks which hold grammar symbols,

Actions in shift reduce Parsesr → shift, reduce, accept, error.

For grammar E → E + T|F

T → T *F|F

F → (ε)|id

Stack INPUT Action

- $ $id*id
_{2}$ Shift - $id *id
^{1}_{2}$ reduce by F → id - $F *id
_{2}$ reduce by T → F - $T *id
_{2}$ shift - $T* *id
_{2}$ shift - $T*id
_{2}$ reduce by F → id - $T*F $ reduce by T → T*F
- *T $ reduce by E → T
- $ E $ Accept