# Lexical Analysis

This is the first phase of a compiler and this reads the input characters from left to right and creates a sequence of tokens. These tokens are used by the parser for syntax analysis.

The schematic representation is given below,

Table: Showing Token, Patterns and Lexeme

Regular Expression

Regular expression can be called as a notation which is used in describing all the languages that can be built from these operators applied to symbol of some alphabet.

The two rules which form the basics are,

- if Îµ is a regular expression , then L(Îµ) is {Îµ} where Îµ is the only sole member in empty string.
- If
*a*is a symbol Îµ, then regular expression L(a) = {a} that is the language with one string, length one and position one.

**Regular Definitions**

The regular definition is a type of regular expression used for notational convenience. If Îµ is an alphabet of basic symbols, then a regular definition is a sequence of the form,

d1 â†’ r1

d2 â†’ r2

â€¦

dn â†’ rn

where, each *di* is a new symbol, not in Îµ and not same as any other of the dâ€™s and

each *ri* is the regular expression over the alphabet Îµ U {d1,d2,d3â€¦â€¦di-1}

Recognition of tokens

The piece of code which is given as the input to lexical analyzer matches for the pattern and returns the respective tokens. For example,

*
Stmt *â†’ if expr then stmt

| Elsestmt

| Îµ

Expr â†’ term relop term

| term

Term â†’ id

| number

In this the terms if, then, else, relop, id and number are the names of the tokens as the lexical analyzer is concerned and these are the reserved words.

# Transition diagrams

Transition diagrams have a collection of nodes, edges, where nodes are called stated.

Finite automata

Finite automata can be further divided into,

- Nondeterministic finite automata (NFA) can have multiple paths and have Îµ labeled edges. These edges are called Îµ - transitions. Some of the characters can label 2 or more edges out of the same state. Example,
- Deterministic finite automata(DFA) , no edges are labeled with Îµ and each character can label at most one edge out of same state. DFA has one unique path.