ANTLR Parse Grammatik -> Tree Grammatik

Unsere letzte Aufgabe für unsere Compiler-Theorie-Klasse ist die Erstellung eines Compilers für eine kleine Java-Untergruppe (nicht MiniJava). Unser Prof gab uns die Möglichkeit, die Werkzeuge zu benutzen, die wir uns wünschen, und nach langem Stöbern entschied ich mich für ANTLR. Ich habe es geschafft, den Scanner und den Parser in Betrieb zu nehmen, und der Parser gibt einen AST aus. Ich stehe jetzt fest und versuche eine Baumgrammatikdatei zum kompilieren zu bekommen. Ich verstehe, dass die Grundidee darin besteht, die Grammatikregeln aus dem Parser zu kopieren und den größten Teil des Codes zu eliminieren, wobei die Umschreibungsregeln bestehen bleiben, aber anscheinend nicht kompiliert werden müssen (offendingToken error). Bin ich auf dem richtigen Weg? Fehle ich etwas Triviales?ANTLR Parse Grammatik -> Tree Grammatik

Baum Grammatik:

tree grammar J0_SemanticAnalysis; 

options { 
    language = Java; 
    tokenVocab = J0_Parser; 
    ASTLabelType = CommonTree; 
} 

@header 
{ 
    package ritterre.a4; 
    import java.util.Map; 
    import java.util.HashMap; 
} 

@members 
{ 

} 

walk 
    : compilationunit 
    ; 

compilationunit 
    : ^(UNIT importdeclaration* classdeclaration*) 
    ; 

importdeclaration 
    : ^(IMP_DEC IDENTIFIER+) 
    ; 

classdeclaration 
    : ^(CLASS IDENTIFIER ^(EXTENDS IDENTIFIER)? fielddeclaration* methoddeclaration*) 
    ; 

fielddeclaration 
    : ^(FIELD_DEC IDENTIFIER type visibility? STATIC?) 
    ; 

methoddeclaration 
    : ^(METHOD_DEC IDENTIFIER type visibility? STATIC? ^(PARAMS parameter+)? body) 
    ; 

visibility 
    : PRIVATE 
    | PUBLIC 
    ; 

parameter 
    : ^(PARAM IDENTIFIER type) 
    ; 

body 
    : ^(BODY ^(DECLARATIONS localdeclaration*) ^(STATEMENTS statement*)) 
    ; 

localdeclaration 
    : ^(DECLARATION type IDENTIFIER) 
    ; 

statement 
    : assignment  
    | ifstatement  
    | whilestatement 
    | returnstatement 
    | callstatement 
    | printstatement 
    | block 
    ; 

assignment 
    : ^(ASSIGN IDENTIFIER+ expression? expression) 
    ; 

ifstatement 
    : ^(IF relation statement ^(ELSE statement)?) 
    ; 

whilestatement 
    : ^(WHILE relation statement) 
    ; 

returnstatement 
    : ^(RETURN expression?) 
    ; 

callstatement 
    : ^(CALL IDENTIFIER+ expression+) 
    ; 

printstatement 
    : ^(PRINT expression) 
    ; 

block 
    : ^(STATEMENTS statement*) 
    ; 

relation 
// : expression (LTHAN | GTHAN | EQEQ | NEQ)^ expression 
    : ^(LTHAN expression expression) 
    | ^(GTHAN expression expression) 
    | ^(EQEQ expression expression) 
    | ^(NEQ expression expression) 
    ; 

expression 
// : (PLUS | MINUS)? term ((PLUS | MINUS)^ term)* 
    : ^(PLUS term term) 
    | ^(MINUS term term) 
    ; 

term 
// : factor ((MULT | DIV)^ factor)* 
    : ^(MULT factor factor) 
    | ^(DIV factor factor) 
    ; 

factor 
    : NUMBER 
    | IDENTIFIER (DOT IDENTIFIER | LBRAC expression RBRAC)? 
    | NULL 
    | NEW IDENTIFIER LPAREN RPAREN 
    | NEW (INT | IDENTIFIER) (LBRAC RBRAC)? 
    ; 

type 
    : (INT | IDENTIFIER) (LBRAC RBRAC)? 
    | VOID 
    ;

Parser Grammatik:

parser grammar J0_Parser; 

options 
{ 
    output = AST;    // Output an AST 
    tokenVocab = J0_Scanner; // Pull Tokens from Scanner 
    //greedy = true; // forcing this throughout?! success!! 
    //cannot force greedy true throughout. bad things happen and the parser doesnt build 
} 

tokens 
{ 
    UNIT; 
    IMP_DEC; 
    FIELD_DEC; 
    METHOD_DEC; 
    PARAMS; 
    PARAM; 
    BODY; 
    DECLARATIONS; 
    STATEMENTS; 
    DECLARATION; 
    ASSIGN; 
    CALL; 
} 

@header { package ritterre.a4; } 

// J0 - Extended Specification - EBNF 
parse 
    : compilationunit EOF -> compilationunit 
    ; 

compilationunit 
    : importdeclaration* classdeclaration* 
    -> ^(UNIT importdeclaration* classdeclaration*) 
    ; 

importdeclaration 
    : IMPORT IDENTIFIER (DOT IDENTIFIER)* SCOLON 
    -> ^(IMP_DEC IDENTIFIER+) 
    ; 

classdeclaration 
    : (PUBLIC)? CLASS n=IDENTIFIER (EXTENDS e=IDENTIFIER)? LBRAK (fielddeclaration|methoddeclaration)* RBRAK 
    -> ^(CLASS $n ^(EXTENDS $e)? fielddeclaration* methoddeclaration*) 
    ; 

fielddeclaration 
    : visibility? STATIC? type IDENTIFIER SCOLON 
    -> ^(FIELD_DEC IDENTIFIER type visibility? STATIC?) 
    ; 

methoddeclaration 
    : visibility? STATIC? type IDENTIFIER LPAREN (parameter (COMMA parameter)*)? RPAREN body 
    -> ^(METHOD_DEC IDENTIFIER type visibility? STATIC? ^(PARAMS parameter+)? body) 
    ; 

visibility 
    : PRIVATE 
    | PUBLIC 
    ; 

parameter 
    : type IDENTIFIER 
    -> ^(PARAM IDENTIFIER type) 
    ; 

body 
    : LBRAK localdeclaration* statement* RBRAK 
    -> ^(BODY ^(DECLARATIONS localdeclaration*) ^(STATEMENTS statement*)) 
    ; 

localdeclaration 
    : type IDENTIFIER SCOLON 
    -> ^(DECLARATION type IDENTIFIER) 
    ; 

statement 
    : assignment 
    | ifstatement 
    | whilestatement 
    | returnstatement 
    | callstatement 
    | printstatement 
    | block 
    ; 

assignment 
    : IDENTIFIER (DOT IDENTIFIER | LBRAC a=expression RBRAC)? EQ b=expression SCOLON 
    -> ^(ASSIGN IDENTIFIER+ $a? $b) 
    ; 

ifstatement 
    : IF LPAREN relation RPAREN statement (options {greedy=true;} : ELSE statement)? 
    -> ^(IF relation statement ^(ELSE statement)?) 
    ; 

whilestatement 
    : WHILE LPAREN relation RPAREN statement 
    -> ^(WHILE relation statement) 
    ; 

returnstatement 
    : RETURN expression? SCOLON 
    -> ^(RETURN expression?) 
    ; 

callstatement 
    : IDENTIFIER (DOT IDENTIFIER)? LPAREN (expression (COMMA expression)*)? RPAREN SCOLON 
    -> ^(CALL IDENTIFIER+ expression+) 
    ; 

printstatement 
    : PRINT LPAREN expression RPAREN SCOLON 
    -> ^(PRINT expression) 
    ; 

block 
    : LBRAK statement* RBRAK 
    -> ^(STATEMENTS statement*) 
    ; 

relation 
    : expression (LTHAN | GTHAN | EQEQ | NEQ)^ expression 
    ; 

expression 
    : (PLUS | MINUS)? term ((PLUS | MINUS)^ term)* 
    ; 

term 
    : factor ((MULT | DIV)^ factor)* 
    ; 

factor 
    : NUMBER 
    | IDENTIFIER (DOT IDENTIFIER | LBRAC expression RBRAC)? 
    | NULL 
    | NEW IDENTIFIER LPAREN RPAREN 
    | NEW (INT | IDENTIFIER) (LBRAC RBRAC)? 
    ; 

type 
    : (INT | IDENTIFIER) (LBRAC RBRAC)? 
    | VOID 
    ;

Quelle

2012-04-14 Eric

Wenn Sie auch die gesamte Fehlermeldung posten, können wir Ihnen weiterhelfen :) – huon

Leider gibt das ANTLR-Plugin für Eclipse anscheinend keine Zeilennummern für seine Fehlermeldungen aus. Alles, was ich bekomme, ist sehr kryptisch: java.lang.NoSuchFieldError: offendingToken. – Eric

Haben Sie versucht, Bereiche zu löschen/auszukommentieren, um einzugrenzen, was das Problem verursacht? – huon

Das Problem ist, dass in Ihrem Baum Grammatik, haben Sie folgende Möglichkeiten (3 mal glaube ich):

classdeclaration 
    : ^(CLASS ... ^(EXTENDS IDENTIFIER)? ...) 
    ;

der ^(EXTENDS IDENTIFIER)? Teil ist falsch: Sie müssen den Baum um Klammern umwickeln, und erst dann machen i t optional:

classdeclaration 
    : ^(CLASS ... (^(EXTENDS IDENTIFIER))? ...) 
    ;

Allerdings wäre es ein bisschen zu einfach, wenn das alles wäre, oder? :)

Wenn Sie das oben erwähnte Problem beheben, wird sich ANTLR beschweren, dass die Baumgrammatik mehrdeutig ist, wenn Sie versuchen, einen Baumspaziergänger aus Ihrer Baumgrammatik zu generieren. ANTLR die folgende auf Sie werfen:

error(211): J0_SemanticAnalysis.g:61:26: [fatal] rule assignment has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

Es beschwert sich über die assignment Regel in der Grammatik:

assignment 
    : ^(ASSIGN IDENTIFIER+ expression? expression) 
    ;

seit ANTLR ist ein LL Parser-Generator , analysiert er Token von links nach rechts. Daher macht der optionale Ausdruck in IDENTIFIER+ expression? expression die Grammatik mehrdeutig. Dieses Problem beheben, indem Sie den ? zur letzten expression bewegen:

assignment 
    : ^(ASSIGN IDENTIFIER+ expression expression?) 
    ;

lassen Sie sich nicht die beiden letzten Buchstaben im Namen ANT LR Sie verleiten, sie stehen für L anguage R ecognition, nicht die Klasse der Parser es generiert!

Quelle

2012-04-14 14:23:46

Vielen Dank! Alles baut jetzt, und ich kann zur nächsten Phase übergehen! – Eric

Gern geschehen @Eric. –

Antwort

Verwandte Themen