Google

"http://www.w3.org/TR/html4/loose.dtd"> >

Chapter 14
Code generation

This chapter shows the code generated by TPG. It is not necessary to read it to understand how TPG works. This chapter has been written mostly the curious readers.

14.1 Inheritance

TPG parsers can inherit from other Python classes (see 7.2). See figure 14.1 for the generated code.



Figure 14.1: Inheritance example


Grammar

Generated code





 
parser MyParser(Base1, Base2):
 
class MyParser(tpg.base.ToyParser,Base1,Base2):



14.2 Lexer

The figure 14.2 shows token precedence (see 6.2). Tokens are declared in the order of appearance except from inline tokens that are declared before predefined tokens.



Figure 14.2: Lexer example


Grammar

Generated code





 
parser Foo:  
 
 
 
 
 
    token integer: '\d+' int ;  
    token arrow: '->' ;  
    separator spaces: '\s+' ;  
 
 
    S ->  
 
        '\('  
        integer  
        arrow  
        '\)'  
        ;
 
class Foo(tpg.base.ToyParser,):  
 
    def _init_scanner(self):  
        self._lexer = tpg.base._Scanner(  
            tpg.base._TokenDef(r"_tok_1", r"\("),  
            tpg.base._TokenDef(r"_tok_2", r"\)"),  
            tpg.base._TokenDef(r"integer", r"\d+", int, 0),  
            tpg.base._TokenDef(r"arrow", r"->", None, 0),  
            tpg.base._TokenDef(r"spaces", r"\s+", None, 1),  
        )  
 
    def S(self,):  
        """ S -> '\(' integer arrow '\)' """  
        self._eat('_tok_1') # \(  
        self._eat('integer')  
        self._eat('arrow')  
        self._eat('_tok_2') # \)



14.3 Parser

14.3.1 Grammar rules

Grammar rules (see 7.3) are used to define what a symbol is composed of. A rule is translated into a method of the parser class (see figure 14.3). The attributes of the symbol are the parameters of the methods. The docstring of the method is the grammar rule.



Figure 14.3: Rule declaration example


Grammar

Generated code





 
parser Foo:  
 
    Symbol1 -> ;  
 
 
    Symbol2<arg1, arg2, arg3> -> ;  
 
 
    Symbol3/ret_val -> ;  
 
 
 
    Symbol4<arg1, arg2, arg3>/ret_val -> ;
 
class Foo(tpg.base.ToyParser,):  
 
    def Symbol1(self,):  
        """ Symbol1 ->  """  
 
    def Symbol2(self,arg1,arg2,arg3):  
        """ Symbol2 ->  """  
 
    def Symbol3(self,):  
        """ Symbol3 ->  """  
        return ret_val  
 
    def Symbol4(self,arg1,arg2,arg3):  
        """ Symbol4 ->  """  
        return ret_val



14.3.2 Symbols

Terminal symbols

Terminal symbols (see 6.2) are recognized by calling the _eat method with the name of the token to match (see figure 14.4). Terminal symbols can return the token text in a string. If the current token is not the expected token, _eat raises a TPGWrongMatch exception. This exception will be catched either by an outer choice point to try another choice or by TPG to turn this exception into a ParserError exception.



Figure 14.4: Terminal symbol matching example


Grammar

Generated code





 
parser Foo:  
 
 
 
 
    token predef: 'bar' ;  
 
 
    S ->  
 
        'inline'  
        predef  
        'inline'/s1  
        predef/s2  
        ;
 
class Foo(tpg.base.ToyParser,):  
 
    def _init_scanner(self):  
        self._lexer = tpg.base._Scanner(  
            tpg.base._TokenDef(r"inline", r"inline"),  
            tpg.base._TokenDef(r"predef", r"bar", None, 0),  
        )  
 
    def S(self,):  
        """ S -> 'inline' predef 'inline' predef """  
        self._eat('inline')  
        self._eat('predef')  
        s1 = self._eat('inline')  
        s2 = self._eat('predef')



Non terminal symbols

Non terminal symbols (see 7.5) are recognized by calling their rules (see figure 14.5). Non terminal symbols can have attributes, a return value or both.



Figure 14.5: Non terminal symbol matching example


Grammar

Generated code





 
parser Foo:  
 
    S ->  
 
        NTerm1  
        NTerm2<arg1, arg2>  
        NTerm3/ret_val  
        NTerm4<arg1, arg2>/ret_val  
        ;
 
class Foo(tpg.base.ToyParser,):  
 
    def S(self,):  
        """ S -> NTerm1 NTerm2 NTerm3 NTerm4 """  
        self.NTerm1()  
        self.NTerm2(arg1,arg2)  
        ret_val = self.NTerm3()  
        ret_val = self.NTerm4(arg1,arg2)



14.3.3 Sequences

The token number is updated by the _eat method when called so a sequence (see 7.6) in a rule is translated into a sequence of statements in Python (see figure 14.6).



Figure 14.6: Sequence of expressions example


Grammar

Generated code





 
parser Foo:  
 
    S -> A B C ;
 
class Foo(tpg.base.ToyParser,):  
 
    def S(self,):  
        """ S -> A B C """  
        self.A()  
        self.B()  
        self.C()



14.3.4 Cut

The cut mechanism (see 7.7) is implemented as a shortcut to the TPGWrongMatch exception. When the sequence following a cut fails, i.e. when it raises a TPGWrongMatch exception, TPG turns this exception into a ParserError exception to immediately abort parsing (see figure 14.7).



Figure 14.7: Cut example


Grammar

Generated code





 
parser Foo:  
 
    S ->  
 
 
 
        A1 !  
 
            B1  
            C1  
 
 
    |  
 
        A2 !  
 
            B2  
            C2  
    ;
 
class Foo(tpg.base.ToyParser,):  
 
    def S(self,):  
        """ S -> A1 B1 C1 | A2 B2 C2 """  
        __p1 = self._cur_token  
        try:  
            self.A1()  
            try:  
                self.B1()  
                self.C1()  
            except self.TPGWrongMatch, e:  
                self.ParserError(e.last)  
        except self.TPGWrongMatch:  
            self._cur_token = __p1  
            self.A2()  
            try:  
                self.B2()  
                self.C2()  
            except self.TPGWrongMatch, e:  
                self.ParserError(e.last)



14.3.5 Alternatives

Alternatives (see 7.8) are tried in the order they are declared. Before trying the first branch, TPG saves the current token number. If the first choice fails, the token number is restored before trying the second branch. When a branch fails, the _eat method raises a TPGWrongMatch exception which is catched by the alternative structure. This algorithm is very simple to implement but isn’t very efficient. This is how the computation of any prediction table is avoided.



Figure 14.8: Alternative in expressions example


Grammar

Generated code





 
parser Foo:  
 
    S -> A | B | C | D ;
 
class Foo(tpg.base.ToyParser,):  
 
    def S(self,):  
        """ S -> A | B | C | D """  
        __p1 = self._cur_token  
        try:  
            try:  
                self.A()  
            except self.TPGWrongMatch:  
                self._cur_token = __p1  
                self.B()  
        except self.TPGWrongMatch:  
            self._cur_token = __p1  
            try:  
                self.C()  
            except self.TPGWrongMatch:  
                self._cur_token = __p1  
                self.D()



14.3.6 Repetitions

Repetitions (see 7.9) are implemented in a similar way to alternatives. The TPGWrongMatch tells TPG when to go out of the loop. See figures 14.9 and 14.10 for repetition examples.



Figure 14.9: Repetition examples: builtin ?, * and +


Grammar

Generated code





 
parser Repetitions:  
 
    ZERO_or_ONE ->  
 
 
 
        A ?  
        ;  
 
 
    ZERO_or_MORE ->  
 
 
 
 
        A *  
        ;  
 
 
 
 
    ONE_or_MORE ->  
 
 
 
 
 
        A +  
        ;
 
class Repetitions(tpg.base.ToyParser,):  
 
    def ZERO_or_ONE(self,):  
        """ ZERO_or_ONE -> A? """  
        __p1 = self._cur_token  
        try:  
            self.A()  
        except self.TPGWrongMatch:  
            self._cur_token = __p1  
 
    def ZERO_or_MORE(self,):  
        """ ZERO_or_MORE -> A* """  
        __p1 = self._cur_token  
        while 1:  
            try:  
                self.A()  
                __p1 = self._cur_token  
            except self.TPGWrongMatch:  
                self._cur_token = __p1  
                break  
 
    def ONE_or_MORE(self,):  
        """ ONE_or_MORE -> A+ """  
        __p1 = self._cur_token  
        __n1 = 0  
        while 1:  
            try:  
                self.A()  
                __n1 += 1  
                __p1 = self._cur_token  
            except self.TPGWrongMatch:  
                if __n1 >= 1:  
                    self._cur_token = __p1  
                    break  
                else:  
                    self.WrongMatch()





Figure 14.10: Repetition examples: user defined {m,n}


Grammar

Generated code





 
parser Repetitions:  
 
    USER_DEFINED ->  
 
 
 
 
 
        A{2,5}  
        ;
 
class Repetitions(tpg.base.ToyParser,):  
 
    def USER_DEFINED(self,):  
        """ USER_DEFINED -> A{2,5} """  
        __p1 = self._cur_token  
        __n1 = 0  
        while __n1<5:  
            try:  
                self.A()  
                __n1 += 1  
                __p1 = self._cur_token  
            except self.TPGWrongMatch:  
                if __n1 >= 2:  
                    self._cur_token = __p1  
                    break  
                else:  
                    self.WrongMatch()



14.3.7 Abstract syntax trees

Abstract syntax trees (see 7.11.1) are simply Python objects. The figure 14.11 shows the instanciation of a node. The figure 14.12 shows the update with the add method.



Figure 14.11: AST instanciation example


Grammar

Generated code





 
{{  
    class Couple:  
        def __init__(self, a, b):  
            self.a = a  
            self.b = b  
}}  
parser Foo:  
    COUPLE1/c ->  
 
        c=Couple<a,b>  
        ;  
 
    COUPLE2/Couple<a,b> ->  
 
        ;
 
 
class Couple:  
    def __init__(self, a, b):  
        self.a = a  
        self.b = b  
class Foo(tpg.base.ToyParser,):  
 
    def COUPLE1(self,):  
        """ COUPLE1 ->  """  
        c = Couple(a,b)  
        return c  
 
    def COUPLE2(self,):  
        """ COUPLE2 ->  """  
        return Couple(a,b)





Figure 14.12: AST update example


Grammar

Generated code





 
{{  
    class List(list):  
        add = list.append  
}}  
parser Foo:  
    LIST/l ->  
 
        l = List<>  
        ITEM/a  
        l-a  
        ;
 
 
class List(list):  
    add = list.append  
class Foo(tpg.base.ToyParser,):  
 
    def LIST(self,):  
        """ LIST -> ITEM """  
        l = List()  
        a = self.ITEM()  
        l.add(a)  
        return l



14.3.8 Text extraction

Text can be extracted (see 7.11.2) from the input string (including separators). The prefix @ operator puts a mark on the current token. The infix .. operator extracts the text between two marks.

The figure 14.13 shows how this extraction works.



Figure 14.13: Text extraction


Grammar

Generated code





 
parser Foo:  
 
    S ->  
 
        A  
        @x          # put a mark 'x'  
        B  
        C  
        @y          # put a mark 'y'  
        t = x..y    # extract from 'x' to 'y'  
        ;
 
class Foo(tpg.base.ToyParser,):  
 
    def S(self,):  
        """ S -> A B C """  
        self.A()  
        x = self._mark()  
        self.B()  
        self.C()  
        y = self._mark()  
        t = self._extract(x,y)



14.3.9 Python objects

TPG has an adapted syntax for some Python expressions (see 7.11.3).

The figure 14.14 shows this implementation.



Figure 14.14: Python object in TPG


Grammar

Generated code





 
parser Foo:  
 
    Bar ->  
 
        x = y  
        x = "string"  
        x = <y>  
        x = <y, z>  
        x = {{ x + y }}  
        x = y.z  
        x = y<a,b>  
        x = z<>  
        x = lst[1]  
        x = lst[2:3]  
        x = lst[:3]  
        x = lst[2:]  
        x = lst[:]  
        ;
 
class Foo(tpg.base.ToyParser,):  
 
    def Bar(self,):  
        """ Bar ->  """  
        x = y  
        x = r"string"  
        x = (y, )  
        x = (y, z, )  
        x = x + y  
        x = y.z  
        x = y(a,b)  
        x = z()  
        x = lst[1]  
        x = lst[2:3]  
        x = lst[:3]  
        x = lst[2:]  
        x = lst[:]