EverET.org

好记性不如烂笔头

用LL(1)递归下降语法器构造一个计算器

| Comments

LL(1)

何为LL(1)?通俗来说就是向前看一个词法单元的自顶向下解析器。两个L都代表left-to-right,第一个L表示解析器按“从左到右”的顺序解析输入内容;第二个L表示下降解析时也是按“从左到右”的顺序遍历子节点。而(1)表示它使用一个向前看 词法单元。

我们从一个简单的计算器来看看递归下降的语法器如何构造。

对于 2 + 3 * 5 的抽象语法树如下:

我们可以使用如下文法表示计算表达式:

1
2
3
4
5
# expr ::= expr addop term | term
# term ::= term mulop factor | factor
# factor ::= number | ( expr )
# addop ::= + | -
# mulop ::= * | /

例子

我们用Python构造一个递归下降实现的计算器。

python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
#!/usr/bin/env python
# author:  Hua Liang [ Stupid ET ]
# email:   et@everet.org
# website: http://EverET.org
#

# Grammar:
#     expr    ::= expr addop term | term
#     term    ::= term mulop factor | factor
#     factor  ::= number | ( expr )
#     addop   ::= + | -
#     mulop   ::= * | /

import tokenize, StringIO

tokens = None
cur_tok = None

def scan(text):
    g = tokenize.generate_tokens(
        StringIO.StringIO(text).readline)
    return ((v[0], v[1]) for v in g)

def get_token():
    global tokens, cur_tok
    cur_tok = tokens.next()
    #print cur_tok
    return cur_tok

def match(type, val = ''):
    global tokens, cur_tok
    t, v = cur_tok
    if t == type or t == tokenize.OP and v == val:
        get_token()
    else:
        raise

def expr():
    global cur_tok
    tmp = term()
    t, v = cur_tok
    while v == '+' or v == '-':
        match(tokenize.OP)
        rhs = term()
        e = str(tmp) + str(v) + str(rhs)
        tmp = eval(e)
        print e, '=', tmp
        t, v = cur_tok
    return tmp

def term():
    global cur_tok
    tmp = factor()
    t, v = cur_tok
    while v == '*' or v == '/':
        match(tokenize.OP)
        rhs = factor()
        e = str(tmp) + str(v) + str(rhs)
        tmp = eval(e)
        print e, '=', tmp
        t, v = cur_tok
    return tmp

def factor():
    global cur_tok
    t, v = cur_tok
    if t == tokenize.NUMBER:
        match(tokenize.NUMBER)
        return int(v)
    elif v == '(':
        match(tokenize.OP, '(')
        tmp = expr()
        match(tokenize.OP, ')')
        return tmp
    else:
        raise

if __name__ == '__main__':
    text = '12 + 2 * ( 5 + 6 )'
    tokens = scan(text)
    get_token()
    res = expr()
    print text, '=', res

对于12 + 2 * ( 5 + 6 ),运行结果如下:

源码请见:https://github.com/cedricporter/et-python/blob/master/compilers/func-cal/main.py

参考

[1] Language Implementation Patterns

[2] Compiling Little Languages in Python

[3] Python使用spark模块构造计算器

本文链接: http://everet.org/calculator-by-recursive-descent-parser.html

您可能也喜欢

Comments