Stages of compiler design

12.5.1.3 describe program compilation stages: lexical and syntactic analysis, code generation, and optimization​

Stages of compiler design

When a programmer uses a computer language (high level) to write a program the statements are called source code.

Source code is a human-readable text written in a specific programming language.

The compiler translates source code into low-level language. The compiled code is saved as an object file.

Stages in the compilation of a program:

Lexical analysis

Lexical analysis is the process of analyzing a stream of individual characters (normally arranged as lines), into a sequence of lexical tokens (tokenization of words and symbols) to feed into the parser that the compiler will understand. 

Source code is written by programmers using ASCII characters. During lexical analysis, the compiler breaks down this stream of ASCII characters into its parts, called "lexemes".

A lexeme is the smallest unit of language. Lexemes cannot be broken down further without losing meaning.

Each lexeme defines a token corresponding to it.
Tokens:

Tokens

Keywords

Reserved words that have specific meanings in a programming language.
Examples:

  • Control flow: if, else, while, for, switch, break
  • Data types: int, float, char, double, boolean
  • Function-related: return, void, function, def
  • Other: class, public, private, static, extends.

 

Identifiers

Names created by the user for variables, functions, classes, etc.
Examples:

  • Variable names: x, myVar, totalSum
  • Function names: computeArea(), isEven()
  • Class names: Person, Vehicle.

 

Literals

Constant values directly written in the source code.
Examples:

  • Integer literals: 42, 0, -100
  • Floating-point literals: 3.14, -0.001, 2.0
  • String literals: "Hello", 'A', "123"
  • Boolean literals: true, false
  • Character literals: 'a', 'Z'.

 

Operators

Symbols that perform operations on operands.
Examples:

  • Arithmetic operators: +, -, *, /, %
  • Assignment operators: =, +=, -=, *=
  • Comparison operators: ==, !=, <, >, <=, >=
  • Logical operators: &&, ||, !
  • Bitwise operators: &, |, ^, ~, <<, >>.

 

Delimiters / Punctuation

Symbols that define the structure and separation in the code.
Examples:

  • Parentheses: (, )
  • Braces: {, }
  • Brackets: [, ]
  • Semicolon/: ;
  • Comma: ,
  • Dot (Period): .
  • Colon: :

 

Comments

Text ignored by the compiler but used for code documentation.
Examples:

  • Single-line comment: // This is a comment
  • Multi-line comment: /* This is a multi-line comment */

 

Whitespace

Spaces, tabs, or newlines that separate tokens. Typically ignored by the lexer but can be significant in some languages (e.g., Python).
Examples:
(space), \n (newline), \t (tab)

Lexical analysis

  • Identifies lexemes within the source code;
  • White space and comments are removed;
  • Generates a stream of tokens;
  • Each token passed to the parser on request;
  • Creates a symbol table of 'names' used in the source code;
  • Limited error checking at this stage.

Video - Compilation - Part Two: Lexical Analysis

Syntactic analysis

This is alternatively known as parsing. This stage analyses the syntax of the statements to ensure they conform to the rules of grammar for the computer language in question.
It is roughly the equivalent of checking that some ordinary text is written in a natural language (e.g. English) is grammatically correct (without worrying about meaning).
The purpose of syntax analysis or parsing is to check that we have a valid sequence of tokens. Tokens are a valid sequence of symbols, keywords, identifiers, etc. 

Syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers. Lexical analyzers are responsible for the validity of a token supplied by the syntax analyzer.

Drawbacks of syntax analyzers:

  • it cannot determine if a token is valid,
  • it cannot determine if a token is declared before it is being used,
  • it cannot determine if a token is initialized before it is being used,
  • it cannot determine if an operation performed on a token type is valid or not.

Video - Compilation - Part Three: Syntax Analysis

 

Code optimization

Making the compile time as short as possible. Optimization is a program transformation technique, which tries to improve the code by making it consume fewer resources (i.e. CPU, Memory) and deliver high speed. The system programmer is responsible for optimization.

Optimizations provided by a compiler include:

  • Inlining small functions 
  • Code hoisting moves loop-invariant code out of loops.
  • Reduction in strength means code optimization obtained by the user of simpler machine instructions.
  • Dead store/code elimination removes values that never get used.
  • Eliminating common sub-expressions 
  • Loop unrolling avoids tests at every iteration of the loop.
  • Loop optimizations: Code motion, Induction variable elimination, and Reduction in strength.

Code generation

The code generated by the compiler is an object file.  In compiler design, object code is the code that the compiler creates after checking the program's structure and meaning, and optimizing it. This code is a machine-readable version of the original source code, allowing the computer’s CPU to run it directly. When the file runs the machine code is processed by the CPU. 

Minimum properties of low-level object code:​

  • It should carry the exact meaning of the source code.​
  • It should be efficient in terms of CPU usage and memory management.​

Using resources: Syntax Analysis 

Stages of compilation (Isaac CS)


Questions:

Give two examples of high-level languages. (Marks: 1)
  • eg Pascal, Python, etc.
A compiler is used to run them. What does it do? (Marks: 1)high-level
  • A compiler will translate a high level language into machine code and each program instruction translates into many machine code instructions.
What is the advantage of writing a program using Pascal or Python compared to writing the same program in assembly code? (Marks: 1)
  • The code can be compiled and distributed without the source code.

Exercises:

Ex. 1 Drag and Drop

Ex. 2 Fill the table

Ex. 3 "Order the stages of compilation"

Ex. 4 "Fill the gaps" (Author: Litvinova Olga - CS teacher of NIS Pavlodar)

Ex. 5 

 


Exam questions:

Why would a company not want to distribute source code when they sell a software package? (Marks: 2) 
  • Retention of source code ensures control over it is kept with the company or individual; (1)
  • software cannot be so easily reverse-engineered (taking design knowledge and reusing it); (1)
  • code cannot be modified. (1)
Категория: Programming languages | Добавил: bzfar77 (17.09.2020)
Просмотров: 10806 | Теги: compilation, object code, source code, syntactic analisys, code optimisation, Token, identifier, executable file, keyword, Operator, code genetation, lexical analisys | Рейтинг: 4.5/17
Всего комментариев: 0
avatar