Now that we have seen all the elements of the compiler, let us examine an example of compiled code to see how things fit together. We will compile the declaration of a recursive factorial function by passing as first argument to compile the result of applying parse to a string representation of the program (here using back quotes `$\ldots$`, which work like single and double quotation marks but allow the string to span multiple lines):
compile(parse(`
function factorial(n) {
    return n === 1
           ? 1
           : factorial(n - 1) * n;
}
`),
        "val",
        "next");
We have specified that the value of the declaration should be placed in the val register. We don't care what the compiled code does after executing the declaration, so our choice of "next" as the linkage descriptor is arbitrary.
The function compile determines that it was given a function declaration, so it transforms it to a constant declaration and then calls compile_declaration. This compiles code to compute the value to be assigned (targeted to val), followed by code to install the declaration, followed by code to put the value of the declaration (which is the value undefined) into the target register, followed finally by the linkage code. The env register is preserved around the computation of the value, because it is needed in order to install the declaration. Because the linkage is "next", there is no linkage code in this case. The skeleton of the compiled code is thus
$\langle{}$save env if modified by code to compute value$\rangle$
$\langle{}$compilation of declaration value, target val, linkage "next"$\rangle$
$\langle{}$restore env if saved above$\rangle$
perform(list(op("assign_symbol_value"),
             constant("factorial"),
             reg("val"),
             reg("env"))),
assign("val", constant(undefined))
The expression that is compiled to produce the value for the name factorial is a lambda expression whose value is the function that computes factorials. The function compile handles this by calling compile_lambda_expression, which compiles the function body, labels it as a new entry point, and generates the instruction that will combine the function body at the new entry point with the runtime environment and assign the result to val. The sequence then skips around the compiled function code, which is inserted at this point. The function code itself begins by extending the function's declaration environment by a frame that binds the parameter n to the function argument. Then comes the actual function body. Since this code for the value of the name doesn't modify the env register, the optional save and restore shown above aren't generated. (The function code at entry1 isn't executed at this point, so its use of env is irrelevant.) Therefore, the skeleton for the compiled code becomes
assign("val", list(op("make_compiled_function"),
                   label("entry1"),
                   reg("env"))),
go_to(label("after_lambda2")),
"entry1",
assign("env", list(op("compiled_function_env"), reg("fun"))),
assign("env", list(op("extend_environment"),
                   constant(list("n")),
                   reg("argl"),
                   reg("env"))),
$\langle{}$compilation of function body$\rangle$
"after_lambda2",
perform(list(op("assign_symbol_value"),
             constant("factorial"),
             reg("val"),
             reg("env"))),
assign("val", constant(undefined))
A function body is always compiled (by compile_lambda_body) with target val and linkage "next". The body in this case consists of a single return statement:[1]
return n === 1
       ? 1
       : factorial(n - 1) * n;
The function compile_return_statement generates code to revert the stack using the marker and to restore the continue register, and then compiles the return expression with target val and linkage "return", because its value is to be returned from the function. The return expression is a conditional expression, for which compile_conditional generates code that first computes the predicate (targeted to val), then checks the result and branches around the true branch if the predicate is false. Registers env and continue are preserved around the predicate code, since they may be needed for the rest of the conditional expression. The true and false branches are both compiled with target val and linkage "return". (That is, the value of the conditional, which is the value computed by either of its branches, is the value of the function.)
revert_stack_to_marker(),
restore("continue"),
$\langle{}$save continue, env if modified by predicate and needed by branches$\rangle$
$\langle{}$compilation of predicate, target val, linkage "next"$\rangle$
$\langle{}$restore continue, env if saved above$\rangle$
test(list(op("is_falsy"), reg("val"))),
branch(label("false_branch4")),
"true_branch3",
$\langle{}$compilation of true branch, target val, linkage "return"$\rangle$
"false_branch4",
$\langle{}$compilation of false branch, target val, linkage "return"$\rangle$
"after_cond5",
The predicate n === 1 is a function application (after transformation of the operator combination). The compiled code looks up the value of the function expression (the symbol "===") and places it in fun. It then assembles the arguments 1 and the value of n into argl. Then it tests whether fun contains a primitive or a compound function, and dispatches to a primitive branch or a compound branch accordingly. Both branches resume at the after_call label. The compound branch must set up continue to jump past the primitive branch, and must push a marker to the stack to match the revert operation in the compiled return statement of the function. The requirements to preserve registers around the evaluation of the function and argument expressions don't result in any saving of registers, because in this case those evaluations don't modify the registers in question.
assign("fun", list(op("lookup_symbol_value"),
                   constant("==="), reg("env"))),
assign("val", constant(1)),
assign("argl", list(op("list"), reg("val"))),
assign("val", list(op("lookup_symbol_value"),
                   constant("n"), reg("env"))),
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch6")),
"compiled_branch7",
assign("continue", label("after_call8")),
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch6",
assign("val", list(op("apply_primitive_function"),
                   reg("fun"),
                   reg("argl"))),
"after_call8",
The true branch, which is the constant 1, compiles (with target val and linkage "return") to
assign("val", constant(1)),
go_to(reg("continue")),
The code for the false branch is another function call, where the function is the value of the symbol "*", and the arguments are n and the result of another function call (a call to factorial). Each of these calls sets up fun and argl and its own primitive and compound branches. Figure 5.22 shows the complete compilation of the declaration of the factorial function. Notice that the possible save and restore of continue and env around the predicate, shown above, are in fact generated, because these registers are modified by the function call in the predicate and needed for the function call and the "return" linkage in the branches.
// construct the function and skip over the code for the function body
assign("val", list(op("make_compiled_function"),
                   label("entry1"), reg("env"))),
go_to(label("after_lambda2")),
"entry1", // calls to $\texttt{factorial}$ will enter here
assign("env", list(op("compiled_function_env"), reg("fun"))),
assign("env", list(op("extend_environment"), constant(list("n")),
                   reg("argl"), reg("env"))),
// begin actual function body
revert_stack_to_marker(), // starts with a return statement
restore("continue"),
save("continue"), // preserve registers across predicate
save("env"),
// compute $\texttt{n === 1}$
assign("fun", list(op("lookup_symbol_value"), constant("==="), reg("env"))),
assign("val", constant(1)),
assign("argl", list(op("list"), reg("val"))),
assign("val", list(op("lookup_symbol_value"), constant("n"), reg("env"))),
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch6")),
"compiled_branch7",
assign("continue", label("after_call8")),
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch6",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
"after_call8", // $\texttt{val}$ now contains result of $\texttt{n === 1}$
restore("env"),
restore("continue"),
test(list(op("is_falsy"), reg("val"))),
branch(label("false_branch4")),
"true_branch3", // return 1
assign("val", constant(1)),
go_to(reg("continue")),
"false_branch4",
// compute and return $\texttt{factorial(n - 1) * n}$
assign("fun", list(op("lookup_symbol_value"), constant("*"), reg("env"))),
save("continue"),
save("fun"), // save $\texttt{*}$ function
assign("val", list(op("lookup_symbol_value"), constant("n"), reg("env"))),
assign("argl", list(op("list"), reg("val"))),
save("argl"), // save partial argument list for $\texttt{*}$
// compute $\texttt{factorial(n - 1)}$ which is the other argument for $\texttt{*}$
assign("fun", list(op("lookup_symbol_value"),
                   constant("factorial"), reg("env"))),
save("fun"), // save $\texttt{factorial}$ function
Figure 5.22 Compilation of the declaration of the factorial function (continued below).
// compute $\texttt{n - 1}$ which is the argument for $\texttt{factorial}$
assign("fun", list(op("lookup_symbol_value"), constant("-"), reg("env"))),
assign("val", constant(1)),
assign("argl", list(op("list"), reg("val"))),
assign("val", list(op("lookup_symbol_value"), constant("n"), reg("env"))),
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch10")),
"compiled_branch11",
assign("continue", label("after_call12")),
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch10",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
"after_call12", // $\texttt{val}$ now contains result of $\texttt{n - 1}$
assign("argl", list(op("list"), reg("val"))),
restore("fun"), // restore $\texttt{factorial}$
// apply $\texttt{factorial}$
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch14")),
"compiled_branch15",
assign("continue", label("after_call16")),
save("continue"), // set up for compiled function:
push_marker_to_stack(), // return in function will restore stack
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch14",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
"after_call16", // $\texttt{val}$ now contains result of $\texttt{factorial(n - 1)}$
restore("argl"), // restore partial argument list for $\texttt{*}$
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
restore("fun"), // restore $\texttt{*}$
restore("continue"),
// apply $\texttt{*}$ and return its value
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch18")),
"compiled_branch19", // note that a compound function here is called tail-recursively
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch18",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
go_to(reg("continue")),
"after_call20",
"after_cond5",
"after_lambda2",
// assign the function to the name $\texttt{factorial}$
perform(list(op("assign_symbol_value"),
             constant("factorial"), reg("val"), reg("env"))),
assign("val", constant(undefined))
Exercise 5.35
Consider the following declaration of a factorial function, which is slightly different from the one given above:
function factorial_alt(n) {
    return n === 1
           ? 1
           : n * factorial_alt(n - 1);
}
Compile this function and compare the resulting code with that produced for factorial. Explain any differences you find. Does either program execute more efficiently than the other?
Exercise 5.36
Compile the iterative factorial function:
function factorial(n) {
    function iter(product, counter) {
        return counter > n
               ? product
               : iter(product * counter, counter + 1);
    }
    return iter(1, 1);
}
Annotate the resulting code, showing the essential difference between the code for the iterative and recursive versions of factorial that makes one process build up stack space and the other run in constant stack space.
Exercise 5.37
What program was compiled to produce the code shown in figure 5.24?
Exercise 5.38
What order of evaluation does our compiler produce for the arguments of an application? Is it left-to-right (as mandated by the ECMAScript specification), right-to-left, or some other order? Where in the compiler is this order determined? Modify the compiler so that it produces some other order of evaluation. (See the discussion of order of evaluation for the explicit-control evaluator in section 5.4.1.) How does changing the order of argument evaluation affect the efficiency of the code that constructs the argument list?
Exercise 5.39
One way to understand the compiler's preserving mechanism for optimizing stack usage is to see what extra operations would be generated if we did not use this idea. Modify preserving so that it always generates the save and restore operations. Compile some simple expressions and identify the unnecessary stack operations that are generated. Compare the code to that generated with the preserving mechanism intact.
Exercise 5.40
Our compiler is clever about avoiding unnecessary stack operations, but it is not clever at all when it comes to compiling calls to the primitive functions of the language in terms of the primitive operations supplied by the machine. For example, consider how much code is compiled to compute a + 1. The code sets up an argument list in argl, puts the primitive addition function (which it finds by looking up the symbol "+" in the environment) into fun, and tests whether the function is primitive or compound. The compiler always generates code to perform the test, as well as code for primitive and compound branches (only one of which will be executed). We have not shown the part of the controller that implements primitives, but we presume that these instructions make use of primitive arithmetic operations in the machine's data paths. Consider how much less code would be generated if the compiler could open-code primitives—that is, if it could generate code to directly use these primitive machine operations. The expression a + 1 might be compiled into something as simple as[2]
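The original instruction sequence is not reproduced in this text. As a rough sketch in the book's instruction notation (assuming, loosely, that the value of a is directly available to the machine's addition operation; the exact instruction in the original text may differ):

assign("val", list(op("+"), reg("a"), constant(1)))

Compare this single instruction with the long lookup, argument-list, and dispatch sequence that the general function-application code generates for the same expression.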
In this exercise we will extend our compiler to support open coding of selected primitives. Special-purpose code will be generated for calls to these primitive functions instead of the general function-application code. In order to support this, we will augment our machine with special argument registers arg1 and arg2. The primitive arithmetic operations of the machine will take their inputs from arg1 and arg2. The results may be put into val, arg1, or arg2.
The compiler must be able to recognize the application of an open-coded primitive in the source program. We will augment the dispatch in the compile function to recognize the names of these primitives in addition to the syntactic forms it currently recognizes. For each syntactic form our compiler has a code generator. In this exercise we will construct a family of code generators for the open-coded primitives.
The open-coded primitives, unlike the syntactic forms, all need their argument expressions evaluated. Write a code generator spread_arguments for use by all the open-coding code generators. The function spread_arguments should take a list of argument expressions and compile the given argument expressions targeted to successive argument registers. Note that an argument expression may contain a call to an open-coded primitive, so argument registers will have to be preserved during argument-expression evaluation.
The JavaScript operators ===, *, -, and +, among others, are implemented in the register machine as primitive functions and are referred to in the global environment with the symbols "===", "*", "-", and "+". In JavaScript, it is not possible to redeclare these names, because they do not meet the syntactic restrictions for names. This means that it is safe to open-code them.
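That these names resist redeclaration can be checked directly; in this small sketch, eval is used only to attempt the ill-formed declaration:

```javascript
// Operator names such as "===" are not syntactically valid JavaScript
// identifiers, so a program cannot shadow them with its own declarations.
let rejected = false;
try {
    eval('const === = 42;');  // attempt to redeclare "==="
} catch (e) {
    rejected = e instanceof SyntaxError;
}
console.log(rejected);  // true: the declaration is rejected at parse time
```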
For each of the primitive functions ===, *, -, and +, write a code generator that takes an application with a function expression that names that function, together with a target and a linkage descriptor, and produces code to spread the arguments into the registers and then perform the operation targeted to the given target with the given linkage. Make compile dispatch to these code generators. Try your new compiler on the factorial example. Compare the resulting code with the result produced without open coding.
[1] Because of the append_return_undefined in compile_lambda_body, the body actually consists of a sequence with two return statements. However, the dead-code check in compile_sequence will stop after the compilation of the first return statement, so the body effectively consists of only a single return statement.
[2] We have used the same symbol + here to denote both the source-language function and the machine operation. In general there will not be a one-to-one correspondence between primitives of the source language and primitives of the machine.