Now that we have seen all the elements of the compiler, let us examine an example of compiled code to see how things fit together. We will compile the declaration of a recursive factorial function by passing as first argument to compile the result of applying parse to a string representation of the program (here using back quotes `$\ldots$`, which work like single and double quotation marks but allow the string to span multiple lines):
compile(parse(`
function factorial(n) {
    return n === 1
           ? 1
           : factorial(n - 1) * n;
}
`),
        "val",
        "next");
We have specified that the value of the declaration should be placed in the val register. We don't care what the compiled code does after executing the declaration, so our choice of "next" as the linkage descriptor is arbitrary.
The function compile determines that it was given a function declaration, so it transforms it to a constant declaration and then calls compile_declaration. This compiles code to compute the value to be assigned (targeted to val), followed by code to install the declaration, followed by code to put the value of the declaration (which is the value undefined) into the target register, followed finally by the linkage code. The env register is preserved around the computation of the value, because it is needed in order to install the declaration. Because the linkage is "next", there is no linkage code in this case. The skeleton of the compiled code is thus
$\langle{}$save env if modified by code to compute value$\rangle$
$\langle{}$compilation of declaration value, target val, linkage "next"$\rangle$
$\langle{}$restore env if saved above$\rangle$
perform(list(op("assign_symbol_value"),
             constant("factorial"),
             reg("val"),
             reg("env"))),
assign("val", constant(undefined))
The expression that is compiled to produce the value for the name factorial is a lambda expression whose value is the function that computes factorials. The function compile handles this by calling compile_lambda_expression, which compiles the function body, labels it as a new entry point, and generates the instruction that will combine the function body at the new entry point with the runtime environment and assign the result to val. The sequence then skips around the compiled function code, which is inserted at this point. The function code itself begins by extending the function's declaration environment by a frame that binds the parameter n to the function argument. Then comes the actual function body. Since this code for the value of the name doesn't modify the env register, the optional save and restore shown above aren't generated. (The function code at entry1 isn't executed at this point, so its use of env is irrelevant.) Therefore, the skeleton for the compiled code becomes
assign("val", list(op("make_compiled_function"),
                   label("entry1"),
                   reg("env"))),
go_to(label("after_lambda2")),
"entry1",
assign("env", list(op("compiled_function_env"), reg("fun"))),
assign("env", list(op("extend_environment"),
                   constant(list("n")),
                   reg("argl"),
                   reg("env"))),
$\langle{}$compilation of function body$\rangle$
"after_lambda2",
perform(list(op("assign_symbol_value"),
             constant("factorial"),
             reg("val"),
             reg("env"))),
assign("val", constant(undefined))
A function body is always compiled (by compile_lambda_body) with target val and linkage "next". The body in this case consists of a single return statement:[1]
return n === 1
       ? 1
       : factorial(n - 1) * n;
The function compile_return_statement generates code to revert the stack using the marker and to restore the continue register, and then compiles the return expression with target val and linkage "return", because its value is to be returned from the function. The return expression is a conditional expression, for which compile_conditional generates code that first computes the predicate (targeted to val), then checks the result and branches around the true branch if the predicate is false. Registers env and continue are preserved around the predicate code, since they may be needed for the rest of the conditional expression. The true and false branches are both compiled with target val and linkage "return". (That is, the value of the conditional, which is the value computed by either of its branches, is the value of the function.)
revert_stack_to_marker(),
restore("continue"),
$\langle{}$save continue, env if modified by predicate and needed by branches$\rangle$
$\langle{}$compilation of predicate, target val, linkage "next"$\rangle$
$\langle{}$restore continue, env if saved above$\rangle$
test(list(op("is_falsy"), reg("val"))),
branch(label("false_branch4")),
"true_branch3",
$\langle{}$compilation of true branch, target val, linkage "return"$\rangle$
"false_branch4",
$\langle{}$compilation of false branch, target val, linkage "return"$\rangle$
"after_cond5",
The predicate n === 1 is a function application (after transformation of the operator combination). The compiled code looks up the value of the function expression (the symbol "===") and places it in fun. It then assembles the arguments 1 and the value of n into argl. Then it tests whether fun contains a primitive or a compound function, and dispatches to a primitive branch or a compound branch accordingly. Both branches resume at the after_call label. The compound branch must set up continue to jump past the primitive branch, and must push a marker to the stack to match the revert operation in the compiled return statement of the function. The requirements to preserve registers around the evaluation of the function and argument expressions don't result in any saving of registers, because in this case those evaluations don't modify the registers in question.
assign("fun", list(op("lookup_symbol_value"),
                   constant("==="), reg("env"))),
assign("val", constant(1)),
assign("argl", list(op("list"), reg("val"))),
assign("val", list(op("lookup_symbol_value"),
                   constant("n"), reg("env"))),
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch6")),
"compiled_branch7",
assign("continue", label("after_call8")),
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch6",
assign("val", list(op("apply_primitive_function"),
                   reg("fun"),
                   reg("argl"))),
"after_call8",
The true branch, which is the constant 1, compiles (with target val and linkage "return") to
assign("val", constant(1)),
go_to(reg("continue")),
The code for the false branch is another function call, where the function is the value of the symbol "*", and the arguments are n and the result of another function call (a call to factorial). Each of these calls sets up fun and argl and its own primitive and compound branches. Figure 5.22 shows the complete compilation of the declaration of the factorial function. Notice that the possible save and restore of continue and env around the predicate, shown above, are in fact generated, because these registers are modified by the function call in the predicate and needed for the function call and the "return" linkage in the branches.
// construct the function and skip over the code for the function body
assign("val", list(op("make_compiled_function"),
                   label("entry1"), reg("env"))),
go_to(label("after_lambda2")),
"entry1", // calls to $\texttt{factorial}$ will enter here
assign("env", list(op("compiled_function_env"), reg("fun"))),
assign("env", list(op("extend_environment"), constant(list("n")),
                   reg("argl"), reg("env"))),
// begin actual function body
revert_stack_to_marker(), // starts with a return statement
restore("continue"),
save("continue"), // preserve registers across predicate
save("env"),
// compute $\texttt{n === 1}$
assign("fun", list(op("lookup_symbol_value"), constant("==="), reg("env"))),
assign("val", constant(1)),
assign("argl", list(op("list"), reg("val"))),
assign("val", list(op("lookup_symbol_value"), constant("n"), reg("env"))),
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch6")),
"compiled_branch7",
assign("continue", label("after_call8")),
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch6",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
"after_call8", // $\texttt{val}$ now contains result of $\texttt{n === 1}$
restore("env"),
restore("continue"),
test(list(op("is_falsy"), reg("val"))),
branch(label("false_branch4")),
"true_branch3", // return 1
assign("val", constant(1)),
go_to(reg("continue")),
"false_branch4",
// compute and return $\texttt{factorial(n - 1) * n}$
assign("fun", list(op("lookup_symbol_value"), constant("*"), reg("env"))),
save("continue"),
save("fun"), // save $\texttt{*}$ function
assign("val", list(op("lookup_symbol_value"), constant("n"), reg("env"))),
assign("argl", list(op("list"), reg("val"))),
save("argl"), // save partial argument list for $\texttt{*}$
// compute $\texttt{factorial(n - 1)}$ which is the other argument for $\texttt{*}$
assign("fun", list(op("lookup_symbol_value"),
                   constant("factorial"), reg("env"))),
save("fun"), // save $\texttt{factorial}$ function
Figure 5.22 Compilation of the declaration of the factorial function (continued below).
// compute $\texttt{n - 1}$ which is the argument for $\texttt{factorial}$
assign("fun", list(op("lookup_symbol_value"), constant("-"), reg("env"))),
assign("val", constant(1)),
assign("argl", list(op("list"), reg("val"))),
assign("val", list(op("lookup_symbol_value"), constant("n"), reg("env"))),
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch10")),
"compiled_branch11",
assign("continue", label("after_call12")),
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch10",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
"after_call12", // $\texttt{val}$ now contains result of $\texttt{n - 1}$
assign("argl", list(op("list"), reg("val"))),
restore("fun"), // restore $\texttt{factorial}$
// apply $\texttt{factorial}$
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch14")),
"compiled_branch15",
assign("continue", label("after_call16")),
save("continue"), // set up for compiled function:
push_marker_to_stack(), // return in function will restore stack
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch14",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
"after_call16", // $\texttt{val}$ now contains result of $\texttt{factorial(n - 1)}$
restore("argl"), // restore partial argument list for $\texttt{*}$
assign("argl", list(op("pair"), reg("val"), reg("argl"))),
restore("fun"), // restore $\texttt{*}$
restore("continue"),
// apply $\texttt{*}$ and return its value
test(list(op("is_primitive_function"), reg("fun"))),
branch(label("primitive_branch18")),
"compiled_branch19", // note that a compound function here is called tail-recursively
save("continue"),
push_marker_to_stack(),
assign("val", list(op("compiled_function_entry"), reg("fun"))),
go_to(reg("val")),
"primitive_branch18",
assign("val", list(op("apply_primitive_function"), reg("fun"), reg("argl"))),
go_to(reg("continue")),
"after_call20",
"after_cond5",
"after_lambda2",
// assign the function to the name $\texttt{factorial}$
perform(list(op("assign_symbol_value"),
             constant("factorial"), reg("val"), reg("env"))),
assign("val", constant(undefined))
Exercise 5.35
Consider the following declaration of a factorial function, which is slightly different from the one given above:
function factorial_alt(n) {
    return n === 1
           ? 1
           : n * factorial_alt(n - 1);
}
Compile this function and compare the resulting code with that produced for factorial. Explain any differences you find. Does either program execute more efficiently than the other?
Exercise 5.36
Compile the iterative factorial function:
function factorial(n) {
    function iter(product, counter) {
        return counter > n
               ? product
               : iter(product * counter, counter + 1);
    }
    return iter(1, 1);
}
Annotate the resulting code, showing the essential difference between the code for the iterative and recursive versions of factorial that makes one process build up stack space and the other run in constant stack space.
Exercise 5.37
What program was compiled to produce the code shown in figure 5.24?
Exercise 5.38
What order of evaluation does our compiler produce for the arguments of an application? Is it left-to-right (as mandated by the ECMAScript specification), right-to-left, or some other order? Where in the compiler is this order determined? Modify the compiler so that it produces some other order of evaluation. (See the discussion of order of evaluation for the explicit-control evaluator in section 5.4.1.) How does changing the order of argument evaluation affect the efficiency of the code that constructs the argument list?
Exercise 5.39
One way to understand the compiler's preserving mechanism for optimizing stack usage is to see what extra operations would be generated if we did not use this idea. Modify preserving so that it always generates the save and restore operations. Compile some simple expressions and identify the unnecessary stack operations that are generated. Compare the code to that generated with the preserving mechanism intact.
Exercise 5.40
Our compiler is clever about avoiding unnecessary stack operations, but it is not clever at all when it comes to compiling calls to the primitive functions of the language in terms of the primitive operations supplied by the machine. For example, consider how much code is compiled to compute a + 1. The code sets up an argument list in argl, puts the primitive addition function (which it finds by looking up the symbol "+" in the environment) into fun, and tests whether the function is primitive or compound. The compiler always generates code to perform the test, as well as code for primitive and compound branches (only one of which will be executed). We have not shown the part of the controller that implements primitives, but we presume that these instructions make use of primitive arithmetic operations in the machine's data paths. Consider how much less code would be generated if the compiler could open-code primitives—that is, if it could generate code to directly use these primitive machine operations. The expression a + 1 might be compiled into something as simple as[2]
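The original instruction sequence is not reproduced in this text. As a rough sketch in the book's instruction notation (assuming, loosely, that the value of a is directly available to the machine's addition operation; the exact instruction in the original text may differ):

assign("val", list(op("+"), reg("a"), constant(1)))

Compare this single instruction with the long lookup, argument-list, and dispatch sequence that the general function-application code generates for the same expression.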
In this exercise we will extend our compiler to support open coding of selected primitives. Special-purpose code will be generated for calls to these primitive functions instead of the general function-application code. In order to support this, we will augment our machine with special argument registers arg1 and arg2. The primitive arithmetic operations of the machine will take their inputs from arg1 and arg2. The results may be put into val, arg1, or arg2.
The compiler must be able to recognize the application of an open-coded primitive in the source program. We will augment the dispatch in the compile function to recognize the names of these primitives in addition to the syntactic forms it currently recognizes. For each syntactic form our compiler has a code generator. In this exercise we will construct a family of code generators for the open-coded primitives.
The open-coded primitives, unlike the syntactic forms, all need their argument expressions evaluated. Write a code generator spread_arguments for use by all the open-coding code generators. The function spread_arguments should take a list of argument expressions and compile the given argument expressions targeted to successive argument registers. Note that an argument expression may contain a call to an open-coded primitive, so argument registers will have to be preserved during argument-expression evaluation.
The JavaScript operators ===, *, -, and +, among others, are implemented in the register machine as primitive functions and are referred to in the global environment with the symbols "===", "*", "-", and "+". In JavaScript, it is not possible to redeclare these names, because they do not meet the syntactic restrictions for names. This means that it is safe to open-code them.
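That these names resist redeclaration can be checked directly; in this small sketch, eval is used only to attempt the ill-formed declaration:

```javascript
// Operator names such as "===" are not syntactically valid JavaScript
// identifiers, so a program cannot shadow them with its own declarations.
let rejected = false;
try {
    eval('const === = 42;');  // attempt to redeclare "==="
} catch (e) {
    rejected = e instanceof SyntaxError;
}
console.log(rejected);  // true: the declaration is rejected at parse time
```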
For each of the primitive functions ===, *, -, and +, write a code generator that takes an application with a function expression that names that function, together with a target and a linkage descriptor, and produces code to spread the arguments into the registers and then perform the operation targeted to the given target with the given linkage. Make compile dispatch to these code generators. Try your new compiler on the factorial example. Compare the resulting code with the result produced without open coding.
[1] Because of the append_return_undefined in compile_lambda_body, the body actually consists of a sequence with two return statements. However, the dead-code check in compile_sequence will stop after the compilation of the first return statement, so the body effectively consists of only a single return statement.
[2] We have used the same symbol + here to denote both the source-language function and the machine operation. In general there will not be a one-to-one correspondence between primitives of the source language and primitives of the machine.