
Parser

The parser comes after the lexer in the compiler chain. It reads the list of tokens produced by the lexer and attempts to assemble them into an Abstract Syntax Tree (AST) based on the specified AST schema. Starting at the top-level File schema, it attempts, for each schema, to build each of that schema's defined "forms" in turn until it finds a match. The complete tree is then passed on to the next compiler stage.
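The "try each form in turn until one matches" behaviour can be sketched as below. This is a minimal illustration, not the compiler's actual code: the `SCHEMAS` table, the `(kind, text)` token shape, and the simplified Namespace schema (a single Identifier) are all assumptions made for the example, and every item is treated as appearing exactly once.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text); a hypothetical token shape

# Hypothetical, simplified schema table: each schema maps to an ordered list
# of forms, and each form is a list of item names that must each appear once.
SCHEMAS = {
    "Import": [
        ["import", "Namespace", ";"],
        ["import", "Namespace", "as", "Identifier", ";"],
    ],
    "Namespace": [["Identifier"]],  # simplified for the sketch
}


def parse(name: str, tokens: List[Token], pos: int) -> Optional[tuple]:
    """Try to build `name` at tokens[pos]; return (node, next_pos) or None."""
    if name not in SCHEMAS:
        # Leaf item: match either the token kind or its exact text.
        if pos >= len(tokens):
            return None
        kind, text = tokens[pos]
        if name == kind or name == text:
            return (name, text), pos + 1
        return None

    # Try each form in order; the first form whose items all match wins.
    for form in SCHEMAS[name]:
        children, cur = [], pos
        for item in form:
            result = parse(item, tokens, cur)
            if result is None:
                break  # this form failed, fall through to the next one
            node, cur = result
            children.append(node)
        else:
            return (name, children), cur
    return None


tokens = [
    ("Identifier", "import"),
    ("Identifier", "foo"),
    ("Identifier", "as"),
    ("Identifier", "bar"),
    ("Symbol", ";"),
]
node, end = parse("Import", tokens, 0)
```

Here the first Import form fails when it meets `as` where it expects `;`, so the second form is tried from the same starting position and matches all five tokens.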

Each schema is divided into a series of forms. Each form contains one or more items. Each item follows a spec that states which type index the node should have, the multiplicity of the item (mandatory, optional, repeated one or more times, or repeated zero or more times), and whether it must match specific data (for example, a particular symbol).
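One way to picture the schema, form, and item layering is the following sketch. The class and field names (`Schema`, `Form`, `Item`, `node_type`, `match`) are hypothetical and chosen only for illustration; the real compiler may represent the spec quite differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Multiplicity(Enum):
    ONCE = ""            # mandatory, appears exactly once
    OPTIONAL = "?"       # zero or one occurrence
    ZERO_OR_MORE = "*"   # any number of contiguous occurrences
    ONE_OR_MORE = "+"    # at least one occurrence


@dataclass
class Item:
    node_type: str                                # type the node should have
    multiplicity: Multiplicity = Multiplicity.ONCE
    match: Optional[str] = None                   # exact data to match, if any


@dataclass
class Form:
    items: List[Item]


@dataclass
class Schema:
    name: str
    forms: List[Form]


# File: Import* DeclDef*
file_schema = Schema("File", [
    Form([
        Item("Import", Multiplicity.ZERO_OR_MORE),
        Item("DeclDef", Multiplicity.ZERO_OR_MORE),
    ]),
])

# ArrayType: [ Expression? ]
array_type = Schema("ArrayType", [
    Form([
        Item("Symbol", match="["),
        Item("Expression", Multiplicity.OPTIONAL),
        Item("Symbol", match="]"),
    ]),
])
```

The ArrayType example shows all three parts of an item spec at once: a node type, an optional multiplicity, and an exact data match for the bracket symbols.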

In each form's notation, content in bold specifies an exact match, e.g. = matches that symbol specifically rather than any Symbol token. An item may be suffixed with an asterisk *, a plus +, or a question mark ? to denote multiplicity. No suffix means the item must appear exactly once. An asterisk means the item may appear zero or more times contiguously. A plus means the item must appear at least once but may be repeated. A question mark means the item is optional: it does not have to be present, but if it is, it is not repeated.
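The four multiplicities can be illustrated with a small matcher over a list of token type names. This is a sketch under assumptions: `match_item` is a hypothetical helper, tokens are reduced to plain type-name strings, and repetition is matched greedily, which the description above does not confirm.

```python
from typing import List, Optional


def match_item(tokens: List[str], pos: int, wanted: str, suffix: str = "") -> Optional[int]:
    """Return the position after the item, or None if the item does not match."""
    # Count how many contiguous tokens of the wanted type start at pos.
    available = 0
    while pos + available < len(tokens) and tokens[pos + available] == wanted:
        available += 1

    if suffix == "":   # exactly once
        return pos + 1 if available >= 1 else None
    if suffix == "?":  # optional, consumed at most once
        return pos + min(available, 1)
    if suffix == "*":  # zero or more, contiguous
        return pos + available
    if suffix == "+":  # one or more, contiguous
        return pos + available if available >= 1 else None
    raise ValueError(f"unknown multiplicity suffix: {suffix!r}")


tokens = ["DocComment", "DocComment", "Modifier", "Identifier"]
after_comments = match_item(tokens, 0, "DocComment", "*")   # consumes both
after_modifiers = match_item(tokens, after_comments, "Modifier", "+")
```

Note that `?` and `*` never fail: when nothing matches they return the starting position unchanged, whereas no suffix and `+` return `None`.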

Once the parser has moved through the list of input tokens and built the AST, most expressions will cascade directly down through each expression type until they reach the type of expression they actually are. This is a consequence of the way the syntax spec is written. As a post-processing step in parsing, such parts of the AST are "merged". For instance, an expression that cascades directly down to an AdditionExpression is merged so that the AdditionExpression becomes the child of the top Expression, replacing the chain of single-child parents. This makes the AST not only more readable but also more semantically accurate: a lone AdditionExpression has nothing to do with, say, an AssignExpression, yet without this merging there would be an AssignExpression node with a single child.
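The merging step might look roughly like the sketch below. The `Node` class and `merge` function are hypothetical, and the rule used here (skip any node whose type name ends in "Expression" and which has exactly one child) is an assumption about which wrappers get collapsed, not a statement of the compiler's actual rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)


def merge(node: Node) -> Node:
    """Return a copy of the tree with single-child expression wrappers removed."""
    merged = []
    for child in node.children:
        # Assumption: only *Expression nodes with exactly one child are skipped.
        while len(child.children) == 1 and child.type.endswith("Expression"):
            child = child.children[0]
        merged.append(merge(child))
    return Node(node.type, merged)


# A shortened cascade: Expression -> AssignExpression -> PipeExpression -> AdditionExpression
addition = Node("AdditionExpression", [
    Node("ConcatExpression", [Node("IntLit")]),
    Node("AdditionExpressionSuffix", [
        Node("Symbol"),
        Node("ConcatExpression", [Node("IntLit")]),
    ]),
])
tree = Node("Expression", [
    Node("AssignExpression", [Node("PipeExpression", [addition])]),
])
merged = merge(tree)
```

After merging, the top Expression has the AdditionExpression as its direct child, and the single-child ConcatExpression wrappers around each literal are gone as well.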

Bottom-level tokens from the lexer do not have a schema, as they are not composed of any nodes. These tokens are the following:

Symbol, Separator, IntLit, FloatLit, ByteLit, BoolLit, CharLit, StringLit, HexstrLit, VarstrLit, VarstrLitStart, VarstrLitMiddle, VarstrLitEnd, DocComment, Identifier

Top Level Nodes

File
    Import* DeclDef*

Import
    import Namespace ;
    import Namespace as Identifier ;
    import static QualifiedIdentifier ;
    import static QualifiedIdentifier as Identifier ;

DeclDef
    ClassDecl
    EnumDecl
    AttrDecl
    MethodDecl

Declarations and Definitions

ClassDecl
    DocComment* Modifier* class Identifier TypeArgsDef? Inherits? DeclCondition? ; MemberDeclDef*
    DocComment* Modifier* interface Identifier TypeArgsDef? Inherits? DeclCondition? ; MemberDeclDef*
    DocComment* Modifier* entrypoint Identifier TypeArgsDef? Inherits? DeclCondition? ; MemberDeclDef*

Enums

EnumDecl
    DocComment* Modifier* enum Identifier : Type EnumMembersDef
    DocComment* Modifier* enum Identifier EnumMembersDef

EnumMembersDef
    { FirstEnumMember* EnumMember }

FirstEnumMember
    EnumMember ,

EnumMember
    DocComment* Identifier = Expression
    DocComment* Identifier

Class Members

MemberDeclDef
    AttrDecl
    MethodDecl
    DependencyDecl
    CastDecl
    OperatorDecl
    ExposeDecl

AttrDecl
    DocComment* Attr DeclCondition? = Expression ;
    DocComment* Attr DeclCondition? ;

Attr
    Modifier* Type Identifier

MethodDecl
    DocComment* Method BaseMethod? DeclCondition? Body
    DocComment* Method BaseMethod? DeclCondition? ;

Method
    Modifier* Type Identifier MethodParamsDef

BaseMethod
    : base MethodParams

DependencyDecl
    DocComment* Modifier* dependency Type as Identifier ;

CastDecl
    DocComment* ( Type ) this Body

OperatorDecl
    DocComment* Modifier* Type op Symbol MethodParamsDef? Body
    DocComment* Modifier* Type op read MethodParamsDef? Body
    DocComment* Modifier* Type op write MethodParamsDef? Body
    DocComment* Modifier* Type op index MethodParamsDef? Body
    DocComment* Modifier* Type op invoke MethodParamsDef? Body

ExposeDecl
    DocComment* Modifier* expose PrimaryExpression as Identifier ;

Declaration Modifiers

Modifier
    body
    const
    flat
    piped
    private
    public
    ref
    restricted
    static

DeclCondition
    where ( Expression )

Method Parameters

MethodParamsDef
    ( FirstMethodParamDef OtherMethodParamDef* )
    ( )

FirstMethodParamDef
    Separator? MethodParamDef

OtherMethodParamDef
    Separator MethodParamDef
    , MethodParamDef

MethodParamDef
    MethodParamDecl = Expression
    MethodParamDecl

MethodParamDecl
    PrimaryExpression

MethodParams
    ( FirstMethodParam OtherMethodParam* )
    ( )

FirstMethodParam
    Separator? MethodParam

OtherMethodParam
    Separator MethodParam
    , MethodParam

MethodParam
    Expression

Statements and Expressions

Body
    { Statement* }
    Statement

Statement
    AttrDecl
    MethodDecl
    Expression ;
    BlockMethod

BlockMethod
    QualifiedIdentifier MethodParams? Body

Expression
    IfErrExpression

IfErrExpression
    AssignExpression IfErrExpressionSuffix*

IfErrExpressionSuffix
    #? AssignExpression

AssignExpression
    PrimaryExpression AssignExpressionSuffix* AssignmentCondition?
    PipeExpression

AssignmentCondition
    if MethodParams

AssignExpressionSuffix
    = PipeExpression
    += PipeExpression
    -= PipeExpression
    *= PipeExpression
    /= PipeExpression
    %= PipeExpression
    |= PipeExpression
    ^= PipeExpression
    ~= PipeExpression
    *~= PipeExpression

PipeExpression
    TernaryIfExpression PipeExpressionSuffix*

PipeExpressionSuffix
    | TernaryIfExpression

TernaryIfExpression
    OrExpression TernaryIfExpressionSuffix*

TernaryIfExpressionSuffix
    ?? Expression ## OrExpression

OrExpression
    XorExpression OrExpressionSuffix*

OrExpressionSuffix
    || XorExpression

XorExpression
    AndExpression XorExpressionSuffix*

XorExpressionSuffix
    >< AndExpression

AndExpression
    EqualityExpression AndExpressionSuffix*

AndExpressionSuffix
    && EqualityExpression

EqualityExpression
    ComparisonExpression EqualityExpressionSuffix*

EqualityExpressionSuffix
    == ComparisonExpression
    != ComparisonExpression

ComparisonExpression
    RangeExpression ComparisonExpressionSuffix*

ComparisonExpressionSuffix
    > RangeExpression
    >= RangeExpression
    <= RangeExpression
    < RangeExpression
    has RangeExpression
    is RangeExpression

RangeExpression
    AdditionExpression RangeExpressionSuffix*

RangeExpressionSuffix
    .. AdditionExpression

AdditionExpression
    ConcatExpression AdditionExpressionSuffix*

AdditionExpressionSuffix
    + ConcatExpression
    - ConcatExpression

ConcatExpression
    TimesConcatExpression ConcatExpressionSuffix*

ConcatExpressionSuffix
    ~ TimesConcatExpression

TimesConcatExpression
    MultiplicationExpression TimesConcatExpressionSuffix*

TimesConcatExpressionSuffix
    *~ MultiplicationExpression

MultiplicationExpression
    ExponentialExpression MultiplicationExpressionSuffix*

MultiplicationExpressionSuffix
    * ExponentialExpression
    / ExponentialExpression
    % ExponentialExpression

ExponentialExpression
    CastExpression ExponentialExpressionSuffix*

ExponentialExpressionSuffix
    ^ CastExpression

CastExpression
    CastExpressionPrefix* UnaryExpression

CastExpressionPrefix
    ( Type )

UnaryExpression
    UnaryExpressionPrefix* PostfixExpression

UnaryExpressionPrefix
    ++
    --
    +
    -
    !

PostfixExpression
    PrimaryExpression PostfixExpressionSuffix*

PostfixExpressionSuffix
    . Type
    ++
    --
    MethodParams Body?

PrimaryExpression
    Type
    IntLit
    FloatLit
    ByteLit
    BoolLit
    CharLit
    StringLit
    HexstrLit
    VarString
    ClassLiteral
    ObjectLiteral
    ArrayLiteral
    NewExpression
    Attr
    Block
    MethodLiteral
    ( Expression )

NewExpression
    new Type MethodParams Body?

Complex Literals

VarString
    VarstrLitStart Expression VarStringMiddle* VarstrLitEnd
    VarstrLit

VarStringMiddle
    VarstrLitMiddle Expression

ClassLiteral
    { MemberDeclDef* }

ObjectLiteral
    { FirstMemberAssign* MemberAssign }
    { }

FirstMemberAssign
    MemberAssign ,

MemberAssign
    AssignExpression

ArrayLiteral
    [ FirstArrayMember* ArrayMember ]
    [ ]

FirstArrayMember
    ArrayMember ,

ArrayMember
    ..? Expression ..?

Block
    { Statement* }

MethodLiteral
    Type MethodParamsDef Body

Types

Type
    QualifiedIdentifier TypeArgs? ArrayType*

ArrayType
    [ Expression? ]

TypeArgsDef
    < FirstTypeArgDef* TypeArgDef >

FirstTypeArgDef
    TypeArgDef ,

TypeArgDef
    Identifier

TypeArgs
    < FirstTypeArg* TypeArg >

FirstTypeArg
    TypeArg ,

TypeArg
    Type

General Nodes

QualifiedIdentifier
    Qualification* Identifier

Namespace
    Qualification* Identifier

Qualification
    Identifier .
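As an illustration of how the QualifiedIdentifier form reads a dotted name, here is a hand-written sketch. It is not how the schema-driven parser works internally: `parse_qualified_identifier` is a hypothetical helper, tokens are reduced to plain strings, Python's `str.isidentifier` stands in for the Identifier token test, and the names in the example input are made up.

```python
from typing import List, Optional, Tuple


def parse_qualified_identifier(tokens: List[str], pos: int = 0) -> Optional[Tuple[dict, int]]:
    """Match `Qualification* Identifier`, where a Qualification is `Identifier .`"""
    qualifications = []
    # A Qualification needs an identifier immediately followed by a "." symbol.
    while (
        pos + 1 < len(tokens)
        and tokens[pos].isidentifier()
        and tokens[pos + 1] == "."
    ):
        qualifications.append(tokens[pos])
        pos += 2

    # The form ends with one mandatory Identifier.
    if pos < len(tokens) and tokens[pos].isidentifier():
        node = {"qualifications": qualifications, "identifier": tokens[pos]}
        return node, pos + 1
    return None


result = parse_qualified_identifier(["std", ".", "io", ".", "File", ";"])
```

The loop stops at `File` because it is not followed by a dot, so `File` is taken as the final Identifier and the trailing `;` is left for the enclosing form.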