Data Types

 

The Symbol type is used to represent each terminal, special terminal and nonterminal in the Symbol Table.

type Symbol

Kind    is a Symbol-Type-Constant
Name    is a String
Match   is a Symbol reference     (a reference or pointer only; used for block comments)

end type

 

Each rule consists of a series of symbols, both terminals and nonterminals, and the single Head nonterminal that the rule defines. Once the tables are loaded, rules should be read-only.

type Rule

Head    is Symbol
Symbols is an array of Symbol

end type
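
As an illustration only, the Symbol and Rule types above could be sketched in Python as follows. The names SymbolKind, describe and the three kind values are assumptions made for this example; the actual Symbol-Type-Constant values are not listed here. A frozen dataclass with a tuple is one way to keep a rule read-only after loading.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum
from typing import Optional, Tuple


class SymbolKind(Enum):
    # Hypothetical stand-ins for the Symbol-Type-Constant values.
    NONTERMINAL = 0
    TERMINAL = 1
    SPECIAL = 2


@dataclass(eq=False)  # symbols are compared by identity, not by content
class Symbol:
    kind: SymbolKind
    name: str
    match: Optional["Symbol"] = None  # reference only, used for block comments


@dataclass(frozen=True)  # assigning to a field raises FrozenInstanceError
class Rule:
    head: Symbol
    symbols: Tuple[Symbol, ...]  # a tuple, so the symbol list cannot be edited


def describe(rule):
    """Render a rule as text, with nonterminals in angle brackets."""
    def show(s):
        return f"<{s.name}>" if s.kind is SymbolKind.NONTERMINAL else s.name
    return f"{show(rule.head)} ::= " + " ".join(show(s) for s in rule.symbols)


expr = Symbol(SymbolKind.NONTERMINAL, "Expr")
plus = Symbol(SymbolKind.TERMINAL, "+")
rule = Rule(head=expr, symbols=(expr, plus, expr))
print(describe(rule))
```

Note that only the rule itself is frozen in this sketch; the symbols it refers to remain ordinary objects.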

 

When the parsing engine has read enough tokens to conclude that a rule in the grammar is complete, the rule is 'reduced' and the result is passed to the developer. A 'reduction' contains the tokens that correspond to the symbols of the rule. Tokens can hold actual data read from the file (terminals), but they can also hold objects, such as earlier reductions. Because a reduction contains both the terminals of a rule and its nonterminals (reductions made earlier), the parser engine builds a "parse tree" that breaks the source text down along the grammar's rules.

type Reduction

Parent-Rule is a Rule
Tokens      is an array of Token

end type

 

The Token type extends the Symbol type and is used to store parsed data. Unlike symbols, which represent a category of terminals or nonterminals, tokens represent individual instances of those symbols. The information read from the source stream is stored in the "Data" property, which the developer can modify freely.

type Token

Parent-Symbol  is a Symbol
Data           is any object

end type
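
The relationship between Token and Reduction can be sketched in Python as below. This is a minimal illustration, not the engine's implementation: Symbol is reduced to a name and a flag, and the helper leaves is a hypothetical function added for the example. It assumes that the Data of a nonterminal token holds the Reduction that produced it, which is how the parse tree is formed.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(eq=False)
class Symbol:  # simplified for this example
    name: str
    is_terminal: bool


@dataclass(frozen=True)
class Rule:
    head: Symbol
    symbols: tuple


@dataclass
class Token:
    parent_symbol: Symbol
    data: Any  # source text for a terminal, a Reduction for a nonterminal


@dataclass
class Reduction:
    parent_rule: Rule
    tokens: List[Token]


def leaves(token):
    """Collect the source text under a token, left to right."""
    if isinstance(token.data, Reduction):
        out = []
        for child in token.data.tokens:
            out.extend(leaves(child))
        return out
    return [token.data]


number = Symbol("Number", True)
plus = Symbol("+", True)
expr = Symbol("Expr", False)

expr_from_number = Rule(expr, (number,))
expr_from_sum = Rule(expr, (expr, plus, expr))

# Parse tree for the source text "1 + 2".
left = Token(expr, Reduction(expr_from_number, [Token(number, "1")]))
right = Token(expr, Reduction(expr_from_number, [Token(number, "2")]))
root = Token(expr, Reduction(expr_from_sum, [left, Token(plus, "+"), right]))

print(leaves(root))
```

Walking the tree from the root visits every reduction made during the parse, with the terminals at the leaves.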

 

LALR PARSER:
Each state represents a point in the parse process where a number of tokens have been read from the source and rules are in different states of completion. Each time a token is read, it is compared to the LALR-State's actions and the appropriate action is taken.

type LALR-Action

Entry   is a Symbol
Action  is an Action-Type-Constant
Target  is either a Rule or LALR-State

end type

 

type LALR-State

Actions is an array of LALR-Action

end type
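
The lookup described above, comparing a token's symbol against the actions of the current state, might look like the following Python sketch. The four ActionType values and the find_action helper are assumptions for illustration; the actual Action-Type-Constant values are not listed here. A missing action is treated as a syntax error by the caller.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class ActionType(Enum):
    # Hypothetical stand-ins for the Action-Type-Constant values.
    SHIFT = 1
    REDUCE = 2
    GOTO = 3
    ACCEPT = 4


@dataclass(eq=False)
class Symbol:  # simplified for this example
    name: str


@dataclass(frozen=True)
class Rule:
    head: Symbol
    symbols: tuple


@dataclass(eq=False)
class LALRAction:
    entry: Symbol
    action: ActionType
    target: Union[Rule, "LALRState", None]  # a Rule for REDUCE, a state for SHIFT/GOTO


@dataclass(eq=False)
class LALRState:
    actions: List[LALRAction] = field(default_factory=list)


def find_action(state: LALRState, symbol: Symbol) -> Optional[LALRAction]:
    """Return the action whose entry is `symbol`, or None if there is none."""
    for act in state.actions:
        if act.entry is symbol:
            return act
    return None


number, plus, eof, expr = Symbol("Number"), Symbol("+"), Symbol("EOF"), Symbol("Expr")
expr_from_number = Rule(expr, (number,))

s1 = LALRState([LALRAction(eof, ActionType.REDUCE, expr_from_number)])
s0 = LALRState([LALRAction(number, ActionType.SHIFT, s1)])

act = find_action(s0, number)
print(act.action.name)
```

A linear scan keeps the sketch close to the array-based type definition; an engine could just as well index the actions by symbol.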

 

DFA TOKENIZER:
A DFA is commonly represented with a graph. The term "graph" is used quite loosely by other scientific fields, where it often refers to a plotted mathematical function or a graphical representation of data. In computer science terms, however, a "graph" is simply a collection of nodes connected by edges. In the types below, each DFA-State is a node and each DFA-Edge is an edge leading to another state.

type DFA-State

Accept-Symbol is a Symbol
Edges         is an array of DFA-Edge

end type

 

type DFA-Edge

Character-Set  is a set of Unicode characters
Target-State   is a DFA-State

end type
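
To show how the two DFA types fit together, here is a small Python sketch that walks the graph one character at a time and remembers the last accepting state it passed through. The longest_match function is a hypothetical helper, and the sketch assumes that a state with no Accept-Symbol is a non-accepting state.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(eq=False)
class Symbol:  # simplified for this example
    name: str


@dataclass(eq=False)
class DFAEdge:
    character_set: FrozenSet[str]
    target_state: "DFAState"


@dataclass(eq=False)
class DFAState:
    accept_symbol: Optional[Symbol] = None  # assumed None when not accepting
    edges: List[DFAEdge] = field(default_factory=list)


def longest_match(start, text, pos=0):
    """Return (symbol, lexeme) for the longest accepted prefix of text[pos:], or None."""
    state, last, i = start, None, pos
    while i < len(text):
        # Follow the first edge whose character set contains the next character.
        nxt = next((e.target_state for e in state.edges
                    if text[i] in e.character_set), None)
        if nxt is None:
            break
        state = nxt
        i += 1
        if state.accept_symbol is not None:
            last = (state.accept_symbol, text[pos:i])
    return last


# Two-node graph that accepts one or more digits as a Number.
digits = frozenset("0123456789")
number = Symbol("Number")
in_number = DFAState(accept_symbol=number)
in_number.edges.append(DFAEdge(digits, in_number))  # loop back on further digits
start = DFAState(edges=[DFAEdge(digits, in_number)])

symbol, lexeme = longest_match(start, "123+4")
print(symbol.name, lexeme)
```

The matched text would typically become the Data of a new Token whose Parent-Symbol is the accepted symbol.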