0%

用LL(1)递归下降语法器构造一个计算器

LL(1)

何为LL(1)?通俗来说就是向前看一个词法单元的自顶向下解析器。两个L都代表left-to-right,第一个L表示解析器按“从左到右”的顺序解析输入内容;第二个L表示下降解析时也是按“从左到右”的顺序遍历子节点。而(1)表示它使用一个向前看 词法单元。

我们从一个简单的计算器来看看递归下降的语法器如何构造。

对于 2 + 3 * 5 的抽象语法树如下:

我们可以使用如下文法表示计算表达式:

1
2
3
4
5
# expr ::= expr addop term | term
# term ::= term mulop factor | factor
# factor ::= number | ( expr )
# addop ::= + | -
# mulop ::= * | /

例子

我们用Python构造一个递归下降实现的计算器。

python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86

#!/usr/bin/env python
# author: Hua Liang [ Stupid ET ]
# email: et@everet.org
# website: http://EverET.org
#

# Grammar:
# expr ::= expr addop term | term
# term ::= term mulop factor | factor
# factor ::= number | ( expr )
# addop ::= + | -
# mulop ::= * | /

import tokenize, StringIO

tokens = None
cur_tok = None

def scan(text):
g = tokenize.generate_tokens(
StringIO.StringIO(text).readline)
return ((v[0], v[1]) for v in g)

def get_token():
global tokens, cur_tok
cur_tok = tokens.next()
#print cur_tok
return cur_tok

def match(type, val = ''):
global tokens, cur_tok
t, v = cur_tok
if t == type or t == tokenize.OP and v == val:
get_token()
else:
raise

def expr():
global cur_tok
tmp = term()
t, v = cur_tok
while v == '+' or v == '-':
match(tokenize.OP)
rhs = term()
e = str(tmp) + str(v) + str(rhs)
tmp = eval(e)
print e, '=', tmp
t, v = cur_tok
return tmp

def term():
global cur_tok
tmp = factor()
t, v = cur_tok
while v == '*' or v == '/':
match(tokenize.OP)
rhs = factor()
e = str(tmp) + str(v) + str(rhs)
tmp = eval(e)
print e, '=', tmp
t, v = cur_tok
return tmp

def factor():
global cur_tok
t, v = cur_tok
if t == tokenize.NUMBER:
match(tokenize.NUMBER)
return int(v)
elif v == '(':
match(tokenize.OP, '(')
tmp = expr()
match(tokenize.OP, ')')
return tmp
else:
raise

if __name__ == '__main__':
text = '12 + 2 * ( 5 + 6 )'
tokens = scan(text)
get_token()
res = expr()
print text, '=', res


对于12 + 2 * ( 5 + 6 ),运行结果如下:

源码请见:https://github.com/cedricporter/et-python/blob/master/compilers/func-cal/main.py

参考

[1] Language Implementation Patterns

[2] Compiling Little Languages in Python

[3] Python使用spark模块构造计算器