
 
DUE: Wednesday, October 11, 2006

Compiling Arithmetic Expressions

Overview: For this project, you will write a simple compiler that translates infix arithmetic expressions into an imaginary assembly language. Assembly language is "a human-readable notation for the machine language used to control a specific computer architecture" [wikipedia]. The instructions of the imaginary assembly language you will generate have the form:

add a b ==> c
mul b 3 ==> d
There are four assembly instructions, one for each basic arithmetic operation: mul, div, add, and sub. Each instruction combines its two operands and stores the result in the variable named after ==>.
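The column layout used in the sample session can be reproduced with Python format specifications. This is a minimal sketch, assuming column widths of 6, 9, and 9 characters (inferred from the sample output, not required by the assignment); `OPCODES` and `format_instruction` are hypothetical names.

```python
# Map each infix operator to its imaginary-assembly mnemonic.
OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}


def format_instruction(op: str, left: str, right: str, dest: str) -> str:
    """Build one output line, e.g. the 'add a b ==> var000' sample line.

    ASSUMPTION: the 6/9/9 column widths are inferred from the sample only.
    """
    mnemonic = OPCODES[op]  # raises KeyError for anything but + - * /
    return f"{mnemonic:<6}{left:<9}{right:<9}==>   {dest}"


# Prints the first two lines of the sample output.
print(format_instruction("+", "a", "b", "var000"))
print(format_instruction("*", "56", "34", "var001"))
```

Operands of nine or more characters leave no gap before the next column, so widen the fields if longer names are expected.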

Your final program should read an expression from standard input (the keyboard) and print the compiled result to standard output. In the sample session below, the text after the prompt "Enter an expression:" is the user's input:

Enter an expression: a + b - 56 * 34 / b3_3f
add   a        b        ==>   var000
mul   56       34       ==>   var001
div   var001   b3_3f    ==>   var002
sub   var000   var002   ==>   var003

As you can see, the compiler breaks the infix arithmetic expression into single arithmetic steps, storing each result in a newly generated variable name as it goes. I will provide you with a simple function to generate the variable names.
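The instructor's name-generating function is not reproduced on this page. As a stand-in for experimenting, here is a hypothetical Python `make_var_namer` that only mimics the `var000`, `var001`, ... names seen in the sample; wrapping the counter in a closure lets each compilation start again at `var000`.

```python
import itertools
from typing import Callable


def make_var_namer(prefix: str = "var") -> Callable[[], str]:
    """Return a function that yields var000, var001, var002, ... per call.

    HYPOTHETICAL stand-in for the provided name generator.
    """
    counter = itertools.count()

    def new_var() -> str:
        # Zero-pad to three digits; after var999 the names simply grow wider.
        return f"{prefix}{next(counter):03d}"

    return new_var


new_var = make_var_namer()
print(new_var(), new_var())  # prints: var000 var001
```

If a user's expression itself contains a name such as `var000`, it will collide with a generated name; rejecting such names or choosing a different prefix are two ways to handle that.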

Algorithm

(This problem is taken from Objects, Abstraction, Data Structures, and Design using Java by Koffman and Wolfgang.)

Assume that the tokens (operators and operands) in the input string are separated by spaces. You will use two stacks in this algorithm: an operand stack and an operator stack. Your program should read each token and process it as follows:
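Because tokens are space-separated, Python's `str.split()` with no arguments is enough to tokenize a line: it splits on runs of whitespace and ignores leading and trailing blanks. The helper name `tokenize` is hypothetical. The sketch also shows that an unspaced `a+b` stays a single token, which the classification step should then reject.

```python
def tokenize(line: str) -> list[str]:
    """Split an expression into tokens, assuming whitespace separates them.

    str.split() with no argument splits on runs of any whitespace and
    drops leading/trailing blanks, so extra spaces are harmless.
    """
    return line.split()


print(tokenize("a + b - 56 * 34 / b3_3f"))
# prints: ['a', '+', 'b', '-', '56', '*', '34', '/', 'b3_3f']
print(tokenize("  a+b  "))
# prints: ['a+b']  (one token, later rejected as neither operand nor operator)
```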

If the token is neither an operand nor an operator, display a helpful error message and terminate the program. If it is an operand, push it onto the operand stack.

If it is an operator, compare its precedence with that of the operator on top of the operator stack. If the current operator has higher precedence than the one on top of the stack (or if the stack is empty), push it onto the operator stack. Otherwise, as long as the current operator has the same or lower precedence than the one on top of the operator stack, the operator on top of the stack must be evaluated next. This is done by popping that operator off the operator stack, popping a pair of operands from the operand stack (the first one popped is the right-hand operand, since it was pushed last), and writing a new line in the output table. The variable selected to hold the result should then be pushed onto the operand stack. Continue this process until the operator on top of the stack has lower precedence than the current operator, or until the stack is empty. At that point, push the current operator onto the operator stack and examine the next token in the input.
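The per-token rules above can be sketched in Python as follows. This is not a required design: the names `PRECEDENCE`, `OPERAND_RE`, and `process_token` are hypothetical, the definition of an operand (a name of letters, digits, and underscores that does not start with a digit, or an unsigned integer) is an assumption consistent with the sample input, and errors are raised as `ValueError` so that the caller can print a helpful message and exit. Instead of printing, the sketch records each generated instruction as an `(op, left, right, dest)` tuple.

```python
import re

# Higher number = binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
# ASSUMPTION: an operand is an identifier (letter or underscore first)
# or an unsigned integer literal, matching names like b3_3f and 56.
OPERAND_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+")


def process_token(tok, operators, operands, out, new_var):
    """Apply the per-token rules; append (op, left, right, dest) to out."""
    if OPERAND_RE.fullmatch(tok):
        operands.append(tok)
    elif tok in PRECEDENCE:
        # Evaluate stacked operators of the same or higher precedence first.
        while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
            op = operators.pop()
            if len(operands) < 2:
                raise ValueError(f"operator {op!r} is missing an operand")
            right = operands.pop()  # pushed last, so it is the RIGHT operand
            left = operands.pop()
            dest = new_var()
            out.append((op, left, right, dest))
            operands.append(dest)  # the result becomes an operand
        operators.append(tok)
    else:
        raise ValueError(
            f"invalid token {tok!r}: expected a name, an integer, or + - * /"
        )


# Demo on the sample expression, with a throwaway name generator.
names = (f"var{i:03d}" for i in range(1000))
ops, vals, out = [], [], []
for tok in "a + b - 56 * 34 / b3_3f".split():
    process_token(tok, ops, vals, out, lambda: next(names))
print(out)   # [('+', 'a', 'b', 'var000'), ('*', '56', '34', 'var001')]
print(ops)   # ['-', '/']
print(vals)  # ['var000', 'var001', 'b3_3f']
```

After the last token, `-` and `/` are still on the operator stack and `var000`, `var001`, `b3_3f` on the operand stack; these are exactly the operators the end-of-input step must still evaluate.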

When the end of the input is reached, pop each remaining operator, along with its pair of operands, and output a line for it. Remember to push the result variable onto the operand stack after each line of output is generated.
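The end-of-input step can be sketched the same way. `drain` is a hypothetical helper, and the starting stacks in the demo are the state the token-by-token rules leave behind for the sample expression; the two tuples it produces correspond to the `div` and `sub` lines of the sample output. The final check (an assumption about how strict to be) rejects inputs such as `a b` (two operands, no operator) or an empty line.

```python
def drain(operators, operands, new_var):
    """End of input: evaluate every operator left on the stack, top first.

    Returns a list of (op, left, right, dest) tuples. Afterwards the operand
    stack holds a single name: the variable with the final result.
    """
    out = []
    while operators:
        op = operators.pop()
        if len(operands) < 2:
            raise ValueError(f"operator {op!r} is missing an operand")
        right = operands.pop()  # pushed last, so it is the RIGHT operand
        left = operands.pop()
        dest = new_var()
        out.append((op, left, right, dest))
        operands.append(dest)  # push the result, as the handout requires
    if len(operands) != 1:
        raise ValueError("malformed expression: expected exactly one result")
    return out


# State left after processing "a + b - 56 * 34 / b3_3f" token by token.
names = iter(["var002", "var003"])
for line in drain(["-", "/"], ["var000", "var001", "b3_3f"], lambda: next(names)):
    print(line)
# ('/', 'var001', 'b3_3f', 'var002')
# ('-', 'var000', 'var002', 'var003')
```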

Notes

Don't make this problem harder than it is. Read the algorithm above and make sure you understand it well enough to be able to carry it out step by step with paper and pencil. The completed program should probably not be longer than this web page. In implementing the algorithm, pay attention to the following programmatic details:

Extra Credit

For extra credit, extend the algorithm above in some way; for example, support input expressions with parentheses, or a better tokenizing function that doesn't need whitespace between tokens. If you do try adding features to your implementation, document them in a README file and submit that with your source code.
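For the tokenizing extension, one possible approach is to match a regular expression repeatedly from the current position. This is a hypothetical sketch (the names `TOKEN_RE` and `tokenize_no_spaces` are made up): it recognizes identifiers, unsigned integers, the four operators, and parentheses, skips whitespace, and reports the first character it cannot match.

```python
import re

# Identifiers, unsigned integers, or one of the single-character symbols.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[-+*/()]")


def tokenize_no_spaces(text: str) -> list[str]:
    """Tokenize an expression whether or not spaces separate the tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is optional and ignored
            continue
        m = TOKEN_RE.match(text, pos)  # anchored at pos
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


print(tokenize_no_spaces("a+b-56*34/b3_3f"))
# prints: ['a', '+', 'b', '-', '56', '*', '34', '/', 'b3_3f']
```

One design consequence: `56x` is split into `56` and `x` rather than rejected, so a stricter version might check that operands and operators alternate. Parentheses are only tokenized here; compiling them is a separate extension.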