One of the most common optimizations performed by compilers is the
optimization of
name
lookup. Our compiler, as we have implemented it so far, generates code that
uses the
lookup_symbol_value
operation of the evaluator machine.
This searches for a
name
by comparing
it
with each
name
that is currently bound, working frame
by frame outward through the runtime environment. This search can be
expensive if the frames are deeply nested or if there are many
names.
For example, consider the problem of looking up the value
of x while evaluating the expression
x * y * z
in an application of the
function of five arguments
that is returned by
((x, y) =>
(a, b, c, d, e) =>
((y, z) => x * y * z)(a * b * x, c + d + x))(3, 4)
Each time
lookup_symbol_value
searches for x, it must determine that
the symbol
"x" is not
equal to
"y" or
"z" (in the first frame), nor to
"a",
"b",
"c",
"d", or
"e" (in the second frame).
Because our language is
lexically scoped, the runtime environment for any
component
will have a
structure that parallels the lexical structure of the program in which
the
component
appears.
Thus, the compiler can know, when it analyzes the
above expression,
that each time the
function
is applied the
binding for
x
in
x * y * z
will be found two frames out from the
current frame and will be the first
binding
in that frame.
We can exploit this fact by inventing a new kind of
name-lookup
operation,
lexical_address_lookup,
that takes as arguments an environment and a
lexical address that
consists of two numbers: a frame number, which specifies how many
frames to pass over, and a displacement number, which specifies
how many
bindings
to pass over in that frame.
The operation
lexical_address_lookup
will produce the value of the
name
stored at that lexical address
relative to the current environment. If we add the
lexical_address_lookup
operation to our machine, we can make the compiler generate code that
references
names
using this operation, rather than
lookup_symbol_value.
Similarly, our compiled code can use a new
lexical_address_assign
operation instead of
assign_symbol_value.
With lexical addressing, there is no need to include any
symbolic references to names in the object code,
and frames do not need to include symbols at run time.
In order to generate such code, the compiler must be able to determine
the lexical address of a
name
it is about to compile a reference
to. The lexical address of a
name
in a program depends on where
one is in the code. For example, in the following program, the
address of x in expression
$e_1$ is (2,0)—two frames back
and the first
name
in the frame. At that point
y is at
address (0,0) and c is at address (1,2).
In expression
$e_2$,
x is at (1,0),
y is at (1,1), and
c is at (0,2).
((x, y) =>
(a, b, c, d, e) =>
((y, z) => $e_1$)($e_2$, c + d + x))(3, 4);
One way for the compiler to produce code that uses lexical addressing
is to maintain a data structure called a
compile-time environment. This keeps track of which
bindings
will be at which
positions in which frames in the runtime environment when a
particular
name-access
operation is executed. The compile-time
environment is a list of frames, each containing a list of
symbols.
There will be no values associated with the symbols,
since values are not computed at compile time.
(Exercise 5.46
will change this, as an optimization for constants.)
The compile-time
environment becomes an additional argument to
compile and is
passed along to each code generator. The top-level call to
compile uses
a compile-time-environment that includes the names of all primitive
functions and primitive values.
When
the body of a lambda expression
is compiled,
compile_lambda_body
extends the compile-time environment by a frame containing the
function's
parameters, so that the
body is compiled with that extended environment.
Similarly, when
the body of a block
is compiled,
compile_block
extends the compile-time environment by a frame containing the
scanned-out local names of the body.
At each point in the compilation,
compile_name
and
compile_assignment_declaration
use the compile-time
environment in order to generate the appropriate lexical addresses.
Exercises 5.41
through 5.44 describe how to
complete this sketch of the lexical-addressing strategy in order to
incorporate lexical lookup into the compiler.
Exercises 5.45
and 5.46
describe other uses for the compile-time environment.
Exercise 5.41
Write a
function
lexical_address_lookup
that implements the new lookup operation. It should take two
arguments—a lexical address and a runtime environment—and
return the value of the
name
stored at the specified lexical address.
The function
lexical_address_lookup
should signal an error if the value
of the name
is the
string "*unassigned*".
Also write a
function
lexical_address_assign
that implements the operation that changes the value
of the name
at a specified lexical address.
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.42
Modify the compiler to maintain the
compile-time environment as
described above. That is, add a compile-time-environment argument to
compile and the various code generators, and
extend it in
compile_lambda_body and
compile_block.
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.43
Write a
function
find_symbol
that takes as arguments a
symbol
and a
compile-time environment and
returns the lexical address of the
symbol
with respect to that
environment. For example, in the program fragment that is shown above, the
compile-time environment during the compilation of expression
$e_1$ is
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.44
Using
find_symbol
from exercise 5.43,
rewrite
compile_assignment_declaration
and
compile_name
to output lexical-address instructions.
In cases where find_symbol
returns "not found"
(that is, where the name is not in the compile-time environment),
you should report a compile-time error.
Test the modified compiler on a few simple cases, such as the nested
lambda
combination at the beginning of this section.
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.45
In JavaScript, an attempt to assign a new value to a name that is declared
as a
constant leads to an error.
Exercise 4.11 shows how to
detect such errors at run time. With the techniques presented in this
section, we can detect attempts to assign a new value to a constant
at compile time. For this purpose, extend the functions
compile_lambda_body and
compile_block
to record in the compile-time environment whether a name is declared as a variable (using
let or as a parameter), or as
a constant (using
const
or
function).
Modify compile_assignment
to report an appropriate error when it detects an
assignment to a constant.
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.
Exercise 5.46
Knowledge about constants at compile time opens the door to many
optimizations that allow us to generate more efficient object code. In
addition to the extension of the
compile-time environment in
exercise 5.45 to indicate names
declared as constants, we may store the
value of a constant if it is known at compile time, or other information
that can help us optimize the code.
A constant declaration such as const$name$=$literal$; allows us
to replace all occurrences of $name$ within the scope of
the declaration by $literal$ so that $name$
doesn't have to be looked up in the runtime environment. This optimization is
called constant propagation. Use an extended compile-time
environment to store literal constants, and modify
compile_name to use the stored
constant in the generated assign
instruction instead of the
lookup_symbol_value operation.
Function declaration is a derived component that expands to
constant declaration. Let us assume that the names of primitive functions
in the global environment are also considered constants.
If we further extend our compile-time
environment to keep track of which names refer to compiled
functions and which ones to primitive functions, we can move
the test that checks whether a function is compiled or primitive
from run time to compile time. This makes the object code more
efficient because it replaces a test that must be performed once per
function application in the generated code by one that is performed
by the compiler. Using such an extended compile-time environment,
modify compile_function_call
so that if it can be determined at
compile time whether the called function is compiled or primitive,
only the instructions in the
compiled_branch or the
primitive_branch
are generated.
Replacing constant names with their literal values as in part (a)
paves the way for another optimization, namely replacing
applications of primitive functions to literal values with the
compile-time computed result. This optimization, called
constant folding, replaces expressions such as
40 + 2 by
42 by performing the addition
in the compiler. Extend the compiler to perform constant folding for
arithmetic operations on numbers and for string concatenation.
There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.