banner
DIYgod

Hi, DIYgod

写代码是热爱,写到世界充满爱!
github
twitter
bilibili
telegram
email
steam
playstation
nintendo switch

Implementation of a complete compiler (Part 2) Syntax Analysis

1 Series Description

GitHub Address Source Code for Each Stage Collection of Explanations for Each Stage

2 Syntax Analysis Explanation

Syntax: The method of combining words to form phrases, clauses, or sentences.

After lexical analysis, we can already recognize the input text as individual words. The goal of this stage is to recognize these words as sentences and determine if the combination of words conforms to the syntax we have defined.

2.1 Defining Syntax with Grammar

Syntax analysis requires additional expressive power obtained through recursion. Obviously, regular expressions cannot meet our needs.

In fact, grammar can also be used to describe the structure of lexical words, but regular expressions are already sufficient for this purpose. In such cases, using regular expressions is more concise.

2.2 LR(1) Analysis Method

@%……¥&%#¥ Too complicated, don't want to explain.

In short, LR(1) is a very powerful analysis algorithm that can solve many shift-reduce conflicts. Most programming languages that describe their syntax using context-free grammars have an LR(1) grammar.

2.3 Generating Syntax Analyzer using Yacc

The algorithm for constructing LR(1) parsing table is simple enough to be automated by computers, and manual construction is very tedious and boring. Therefore, using Yacc is a wise decision.

Similar to Lex, the Yacc specification is divided into three parts:

%{
...
%}
...
%%
...

The first part is the same as Lex, including includes and declarations.

The second part defines the terminals received from lexical analysis, the start symbol, precedence, etc.

The third part defines the grammar and semantic actions. Only the grammar is defined in the syntax analysis stage, and the semantic actions are completed in the semantic analysis stage.

2.3.1 Conflicts

Yacc chooses shift over reduce to resolve shift-reduce conflicts, and chooses the rule that appears first in the grammar to resolve reduce-reduce conflicts.

2.3.2 Precedence Guidance

Defining precedence is to resolve ambiguity, which makes writing grammar much easier.

Yacc can include precedence guidance commands in the second part:

%left COMMA
%right PLUSASSIGN MINUSASSIGN TIMESASSIGN DIVIDEASSIGN ASSIGN
%left OR
%left AND
%left EQ NEQ
%left LE GE LT GT
%left PLUS MINUS
%left TIMES DIVIDE MOD
%right INC DEC NOT
%left LPAREN RPAREN LBRACK RBRACK

The precedence decreases from top to bottom, and "left" and "right" indicate whether the word is left-associative or right-associative.

3 Specific Implementation

The source code for the syntax analyzer and the improved lexical analyzer can be found at the beginning of the article.

At this stage, the compiler has developed into four modules:

  1. Error handling module (errormsg.c errormsg.h): Used to generate error messages with file names and line numbers.
  2. Common utility module (util.c util.h): Defines some commonly used functions.
  3. Lexical analysis module (simplec.lex): Performs lexical analysis using Lex.
  4. Syntax analysis module (simplec.yacc): Performs syntax analysis using Yacc.

The previous stage's token.h is no longer needed. Instead, Yacc will automatically generate a header file, y.tab.h, related to the words based on the word specifications in the grammar we write. parsetest.c is a driver program that will output "Parsing successful!" under normal circumstances.

Attachment: ANSI C grammar (Lex) ANSI C grammar (Yacc)

 

Syntax analysis Done.

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.