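Goal: build an assembler that translates programs written in Hack assembly language into binary code the Hack hardware platform can execute.
Analysis: at its core this chapter is text processing. A given .asm text file is mapped, according to fixed rules, to a .hack file of binary code.
Let us first look at the .asm file. A line can be one of the following:
1) An instruction: either an A-instruction or a C-instruction.
2) Constants and symbols: constants are easy to handle, but user-defined symbols also need memory allocated for them.
3) A comment: everything from "//" onward is treated as a comment and ignored.
4) A blank line: ignored.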
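The four cases above can be sketched as a small classifier. This is only an illustration: the function name `classify` and its return labels are hypothetical and are not part of the API given in the book.

```python
def classify(raw_line):
    """Classify one line of a .asm file (hypothetical helper, for illustration)."""
    # Drop a trailing comment, then surrounding whitespace and the newline.
    text = raw_line.split('//', 1)[0].strip()
    if not text:
        # Nothing left: the line was either a pure comment or blank.
        return 'comment' if '//' in raw_line else 'blank'
    if text.startswith('('):
        return 'label'            # pseudo-command such as (LOOP)
    if text.startswith('@'):
        return 'a_instruction'    # constant or symbol, e.g. @21 or @i
    return 'c_instruction'        # e.g. D=M+1 or 0;JMP


for sample in ['// setup\n', '\n', '@i  // counter\n', '(LOOP)\n', 'D=M+1\n']:
    print(repr(sample), '->', classify(sample))
```

Stripping the comment before looking at the first character means that an instruction followed by a trailing comment is still recognized as an instruction.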
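To lower the difficulty, the book first asks for a symbol-free version. This is a good way of thinking in general: reduce a complex problem to a simpler one you already know how to solve.
The book also gives the API of each module. Since this is text processing, Python is a natural choice of language.
Implementation:
Parser module: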
<span style="font-size:12px;">def hasMoreCommand(line):
    # readline() returns an empty string only at end of file
    if not line:
        return 0
    return 1

def advance(fl):
    # return the next line
    line = fl.readline()
    return line

def clear(line):
    index = line.find('//')
    if index >= 0:  # remove the comment
        line = line[0:index]
    return line.strip()  # strip surrounding whitespace and the newline

def commandType(line):
    if line.find('(') > -1:
        return 'L_COMMAND'
    elif line.find('@') > -1:
        return 'A_COMMAND'
    else:
        return 'C_COMMAND'

def symbol(line):
    line = line.strip(' ()@\n')
    return line

def dest(line):
    index = line.find('=')
    if index < 0:  # no '=': a jump command such as "0;JMP" has no dest
        return 'null'
    return line[0:index].strip()

def comp(line):
    index = line.find(';')
    if index > 0:  # drop the jump part
        line = line[0:index]
    idx = line.find('=')
    return line[idx + 1:].strip()

def jump(line):
    # Code.jump relies on this function
    index = line.find(';')
    if index < 0:  # no ';': the command has no jump part
        return 'null'
    return line[index + 1:].strip()</span>
The Code module translates a C-instruction and returns the binary codes of its comp, dest, and jump fields.
<span style="font-size:12px;">import Parser

def dest(line):
    destDict = {'null':'000', 'M':'001', 'D':'010', 'MD':'011',
                'A':'100', 'AM':'101', 'AD':'110', 'AMD':'111'}
    return destDict[Parser.dest(line)]

def comp(line):
    # 7 bits each: the a-bit followed by c1..c6
    compDict = {'0':'0101010', '1':'0111111', '-1':'0111010', 'D':'0001100',
                'A':'0110000', '!D':'0001101', '!A':'0110001', '-D':'0001111',
                '-A':'0110011', 'D+1':'0011111', 'A+1':'0110111', 'D-1':'0001110',
                'A-1':'0110010', 'D+A':'0000010', 'D-A':'0010011', 'A-D':'0000111',
                'D&A':'0000000', 'D|A':'0010101',
                'M':'1110000', '!M':'1110001', '-M':'1110011', 'M+1':'1110111',
                'M-1':'1110010', 'D+M':'1000010', 'D-M':'1010011', 'M-D':'1000111',
                'D&M':'1000000', 'D|M':'1010101'}
    return compDict[Parser.comp(line)]

def jump(line):
    jumpDict = {'null':'000', 'JGT':'001', 'JEQ':'010', 'JGE':'011',
                'JLT':'100', 'JNE':'101', 'JLE':'110', 'JMP':'111'}
    return jumpDict[Parser.jump(line)]</span>
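As a worked example of how the three fields combine, here is a minimal self-contained sketch. The function name `encode_c` is hypothetical, and the `COMP` table is deliberately a small subset of the full comp table, kept short for illustration.

```python
# Subset of the comp table, enough for the two examples below.
COMP = {'0': '0101010', 'D': '0001100', 'M': '1110000',
        'M+1': '1110111', 'D+A': '0000010'}
DEST = {'null': '000', 'M': '001', 'D': '010', 'MD': '011',
        'A': '100', 'AM': '101', 'AD': '110', 'AMD': '111'}
JUMP = {'null': '000', 'JGT': '001', 'JEQ': '010', 'JGE': '011',
        'JLT': '100', 'JNE': '101', 'JLE': '110', 'JMP': '111'}


def encode_c(instruction):
    """Encode a C-instruction of the form dest=comp;jump (hypothetical helper)."""
    dest, jump = 'null', 'null'
    rest = instruction.replace(' ', '')
    if '=' in rest:                      # optional dest part
        dest, rest = rest.split('=', 1)
    if ';' in rest:                      # optional jump part
        rest, jump = rest.split(';', 1)
    # Layout: 111 + comp (7 bits) + dest (3 bits) + jump (3 bits)
    return '111' + COMP[rest] + DEST[dest] + JUMP[jump]


print(encode_c('D=M+1'))   # 1111110111010000
print(encode_c('0;JMP'))   # 1110101010000111
```

Both dest and jump are optional, so each defaults to 'null' and is only overwritten when its separator is present.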
The main program below implements the symbol-free assembler. It writes its output to prog.hack.
<span style="font-size:12px;">import sys
import Parser
import Code

fileName = sys.argv[1]
rfile = open(fileName, 'r')
wfile = open('prog.hack', 'w')

line = rfile.readline()
flag = Parser.hasMoreCommand(line)
while flag:
    line = Parser.clear(line)  # remove the comment, whitespace and '\n'
    if line:  # skip blank lines and comment-only lines
        cType = Parser.commandType(line)
        if cType == 'L_COMMAND':
            pass  # labels are not expected in the symbol-free version; emit nothing
        elif cType == 'A_COMMAND':
            AD = Parser.symbol(line)
            AB = bin(int(AD))[2:]
            AString = AB.zfill(15)
            wfile.write('0' + AString + '\n')
        elif cType == 'C_COMMAND':
            destString = Code.dest(line)
            compString = Code.comp(line)
            jumpString = Code.jump(line)
            wfile.write("111" + compString + destString + jumpString + "\n")
    line = Parser.advance(rfile)
    flag = Parser.hasMoreCommand(line)

rfile.close()
wfile.close()</span>
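The A-instruction branch of the loop above comes down to zero-padding a binary number. A minimal sketch of the same conversion using Python's format specification follows; the name `encode_a` and the range check are additions made for illustration.

```python
def encode_a(instruction):
    """Encode a numeric A-instruction such as '@21' (hypothetical helper)."""
    value = int(instruction.lstrip('@'))
    # An A-instruction carries a 15-bit value; the leading bit is always 0.
    if not 0 <= value <= 32767:
        raise ValueError('constant out of range: %d' % value)
    return format(value, '016b')   # 16 binary digits, zero-padded


print(encode_a('@21'))   # 0000000000010101
```

This gives the same result as `'0' + bin(n)[2:].zfill(15)` for any value in range.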
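Next comes the version with symbol support.
The problem: an assembly program is allowed to use a label before that label is defined.
To solve this, the text has to be read twice. On the first pass, each time a pseudo-command (a label declaration such as (LOOP)) is encountered, an entry mapping that symbol to the address of the next instruction is added to the symbol table. On the second pass, each time a symbolic A-instruction is encountered, the symbol is looked up in the symbol table. If it is not there, a new entry is added that maps the symbol to the next available RAM address. These addresses start at 16.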
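The two passes can be sketched on a tiny program. This is a simplified illustration, not the full assembler: it assumes the lines are already stripped of comments and whitespace, the function name `resolve_symbols` is hypothetical, and it only rewrites symbols to numbers instead of emitting binary.

```python
# Predefined symbols of the Hack platform.
PREDEFINED = {'SP': 0, 'LCL': 1, 'ARG': 2, 'THIS': 3, 'THAT': 4,
              'SCREEN': 16384, 'KBD': 24576}
PREDEFINED.update({'R%d' % n: n for n in range(16)})


def resolve_symbols(lines):
    """Replace every symbolic @name with a numeric address (hypothetical helper)."""
    table = dict(PREDEFINED)

    # Pass 1: record each label with the address of the next real instruction.
    address = 0
    for text in lines:
        if text.startswith('('):
            table[text.strip('()')] = address
        else:
            address += 1

    # Pass 2: unknown symbols are variables, allocated from RAM address 16 upward.
    next_ram = 16
    resolved = []
    for text in lines:
        if text.startswith('('):
            continue                      # labels produce no output
        if text.startswith('@') and not text[1:].isdigit():
            name = text[1:]
            if name not in table:
                table[name] = next_ram
                next_ram += 1
            text = '@' + str(table[name])
        resolved.append(text)
    return resolved


program = ['@i', 'M=1', '(LOOP)', '@LOOP', '0;JMP', '@END', '0;JMP', '(END)']
print(resolve_symbols(program))
# ['@16', 'M=1', '@2', '0;JMP', '@6', '0;JMP']
```

Note that `@END` is used before `(END)` is declared. Because pass 1 has already recorded END at instruction address 6, pass 2 does not mistake it for a variable.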
Everything else is much the same as in the symbol-free version. The only addition is the SymbolTable module.
SymbolTable:
<span style="font-size:12px;">def Constructor():
    # predefined symbols of the Hack platform
    symbolDict = {'SP':0, 'LCL':1, 'ARG':2, 'THIS':3, 'THAT':4,
                  'R0':0, 'R1':1, 'R2':2, 'R3':3, 'R4':4, 'R5':5, 'R6':6, 'R7':7,
                  'R8':8, 'R9':9, 'R10':10, 'R11':11, 'R12':12, 'R13':13,
                  'R14':14, 'R15':15, 'SCREEN':16384, 'KBD':24576}
    return symbolDict

def addEntry(symbolDict, symbol, address):
    symbolDict[symbol] = address

def contains(symbolDict, symbol):
    # 'in' works in both Python 2 and 3; dict.has_key() was removed in Python 3
    return symbol in symbolDict

def GetAddress(symbolDict, symbol):
    return symbolDict[symbol]</span>
assembler:
<span style="font-size:12px;">import sys
import Parser
import Code
import SymbolTable

fileName = sys.argv[1]
rfile = open(fileName, 'r')
wfile = open('prog.hack', 'w')
smtbl = SymbolTable.Constructor()

line = rfile.readline()
flag = Parser.hasMoreCommand(line)
counter = 0  # address of the next instruction
while flag:  # first pass: fill the symbol table with labels
    line = Parser.clear(line)
    if line:  # skip blank lines and comment-only lines
        cType = Parser.commandType(line)
        if cType == 'L_COMMAND':
            sym = Parser.symbol(line)
            SymbolTable.addEntry(smtbl, sym, counter)
        else:
            counter += 1
    line = Parser.advance(rfile)  # read the next line
    flag = Parser.hasMoreCommand(line)  # check whether it exists
rfile.close()

rfile = open(fileName)
line = rfile.readline()
flag = Parser.hasMoreCommand(line)
j = 16  # next free RAM address for variables
while flag:  # second pass: generate code
    line = Parser.clear(line)  # remove the comment, whitespace and '\n'
    if line:  # skip blank lines and comment-only lines
        cType = Parser.commandType(line)
        if cType == 'L_COMMAND':
            pass  # labels were handled in the first pass
        elif cType == 'A_COMMAND':
            Acmd = Parser.symbol(line)
            if Acmd.isdigit():  # Acmd is a number
                bn = bin(int(Acmd))[2:]
            elif SymbolTable.contains(smtbl, Acmd):  # Acmd is a known symbol
                dec = SymbolTable.GetAddress(smtbl, Acmd)
                bn = bin(dec)[2:]
            else:  # Acmd is a new symbol: allocate a RAM address for it
                bn = bin(j)[2:]
                SymbolTable.addEntry(smtbl, Acmd, j)
                j += 1
            AString = bn.zfill(15)
            wfile.write('0' + AString + '\n')
        elif cType == 'C_COMMAND':
            destString = Code.dest(line)
            compString = Code.comp(line)
            jumpString = Code.jump(line)
            wfile.write("111" + compString + destString + jumpString + "\n")
    line = Parser.advance(rfile)
    flag = Parser.hasMoreCommand(line)

rfile.close()
wfile.close()</span>
With that, our assembler is complete!
Original post: http://blog.csdn.net/u013573789/article/details/44892389