5.5.6 Lexical Addressing - SICP Comparison Edition

One of the most common optimizations performed by compilers is the optimization of name lookup. Our compiler, as we have implemented it so far, generates code that uses the lookup_symbol_value operation of the evaluator machine. This searches for a name by comparing it with each name that is currently bound, working frame by frame outward through the runtime environment. This search can be expensive if the frames are deeply nested or if there are many names. For example, consider the problem of looking up the value of x while evaluating the expression x * y * z in an application of the function of five arguments that is returned by

((x, y) =>
   (a, b, c, d, e) =>
     ((y, z) => x * y * z)(a * b * x, c + d + x))(3, 4)

Each time lookup_symbol_value searches for x, it must determine that the symbol "x" is not equal to "y" or "z" (in the first frame), nor to "a", "b", "c", "d", or "e" (in the second frame). Because our language is lexically scoped, the runtime environment for any component will have a structure that parallels the lexical structure of the program in which the component appears. Thus, the compiler can know, when it analyzes the above expression, that each time the function is applied the binding for x in x * y * z will be found two frames out from the current frame and will be the first binding in that frame.

We can exploit this fact by inventing a new kind of name-lookup operation, lexical_address_lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many bindings to pass over in that frame. The operation lexical_address_lookup will produce the value of the name stored at that lexical address relative to the current environment. If we add the lexical_address_lookup operation to our machine, we can make the compiler generate code that references names using this operation, rather than lookup_symbol_value. Similarly, our compiled code can use a new lexical_address_assign operation instead of assign_symbol_value. With lexical addressing, there is no need to include any symbolic references to names in the object code, and frames do not need to include symbols at run time.

In order to generate such code, the compiler must be able to determine the lexical address of a name it is about to compile a reference to. The lexical address of a name in a program depends on where one is in the code. For example, in the following program, the address of x in expression $e_1$ is (2,0)—two frames back and the first name in the frame. At that point y is at address (0,0) and c is at address (1,2). In expression $e_2$, x is at (1,0), y is at (1,1), and c is at (0,2). ((x, y) => (a, b, c, d, e) => ((y, z) => $e_1$)($e_2$, c + d + x))(3, 4);

One way for the compiler to produce code that uses lexical addressing is to maintain a data structure called a compile-time environment. This keeps track of which bindings will be at which positions in which frames in the runtime environment when a particular name-access operation is executed. The compile-time environment is a list of frames, each containing a list of symbols. There will be no values associated with the symbols, since values are not computed at compile time. (Exercise 5.46 will change this, as an optimization for constants.) The compile-time environment becomes an additional argument to compile and is passed along to each code generator. The top-level call to compile uses a compile-time-environment that includes the names of all primitive functions and primitive values. When the body of a lambda expression is compiled, compile_lambda_body extends the compile-time environment by a frame containing the function's parameters, so that the body is compiled with that extended environment. Similarly, when the body of a block is compiled, compile_block extends the compile-time environment by a frame containing the scanned-out local names of the body. At each point in the compilation, compile_name and compile_assignment_declaration use the compile-time environment in order to generate the appropriate lexical addresses.

Exercises 5.41 through 5.44 describe how to complete this sketch of the lexical-addressing strategy in order to incorporate lexical lookup into the compiler. Exercises 5.45 and 5.46 describe other uses for the compile-time environment.

Exercise 5.41 Write a function lexical_address_lookup that implements the new lookup operation. It should take two arguments—a lexical address and a runtime environment—and return the value of the name stored at the specified lexical address. The function lexical_address_lookup should signal an error if the value of the name is the string "*unassigned*". Also write a function lexical_address_assign that implements the operation that changes the value of the name at a specified lexical address.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 5.42 Modify the compiler to maintain the compile-time environment as described above. That is, add a compile-time-environment argument to compile and the various code generators, and extend it in compile_lambda_body and compile_block.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 5.43 Write a function find_symbol that takes as arguments a symbol and a compile-time environment and returns the lexical address of the symbol with respect to that environment. For example, in the program fragment that is shown above, the compile-time environment during the compilation of expression $e_1$ is

list(list("y", "z"),
     list("a", "b", "c", "d", "e"),
     list("x", "y"))

The function find_symbol should produce

find_symbol("c", list(list("y", "z"),
                      list("a", "b", "c", "d", "e"),
                      list("x", "y")));

list(1, 2)

find_symbol("x", list(list("y", "z"),
                      list("a", "b", "c", "d", "e"),
                      list("x", "y")));

list(2, 0)

find_symbol("w", list(list("y", "z"),
                      list("a", "b", "c", "d", "e"),
                      list("x", "y")));

"not found"

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 5.44 Using find_symbol from exercise 5.43, rewrite compile_assignment_declaration and compile_name to output lexical-address instructions. In cases where find_symbol returns "not found" (that is, where the name is not in the compile-time environment), you should report a compile-time error. Test the modified compiler on a few simple cases, such as the nested lambda combination at the beginning of this section.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 5.45 In JavaScript, an attempt to assign a new value to a name that is declared as a constant leads to an error. Exercise 4.11 shows how to detect such errors at run time. With the techniques presented in this section, we can detect attempts to assign a new value to a constant at compile time. For this purpose, extend the functions compile_lambda_body and compile_block to record in the compile-time environment whether a name is declared as a variable (using let or as a parameter), or as a constant (using const or function). Modify compile_assignment to report an appropriate error when it detects an assignment to a constant.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 5.46 Knowledge about constants at compile time opens the door to many optimizations that allow us to generate more efficient object code. In addition to the extension of the compile-time environment in exercise 5.45 to indicate names declared as constants, we may store the value of a constant if it is known at compile time, or other information that can help us optimize the code.

A constant declaration such as const $name$ = $literal$; allows us to replace all occurrences of $name$ within the scope of the declaration by $literal$ so that $name$ doesn't have to be looked up in the runtime environment. This optimization is called constant propagation. Use an extended compile-time environment to store literal constants, and modify compile_name to use the stored constant in the generated assign instruction instead of the lookup_symbol_value operation.
Function declaration is a derived component that expands to constant declaration. Let us assume that the names of primitive functions in the global environment are also considered constants. If we further extend our compile-time environment to keep track of which names refer to compiled functions and which ones to primitive functions, we can move the test that checks whether a function is compiled or primitive from run time to compile time. This makes the object code more efficient because it replaces a test that must be performed once per function application in the generated code by one that is performed by the compiler. Using such an extended compile-time environment, modify compile_function_call so that if it can be determined at compile time whether the called function is compiled or primitive, only the instructions in the compiled_branch or the primitive_branch are generated.
Replacing constant names with their literal values as in part (a) paves the way for another optimization, namely replacing applications of primitive functions to literal values with the compile-time computed result. This optimization, called constant folding, replaces expressions such as 40 + 2 by 42 by performing the addition in the compiler. Extend the compiler to perform constant folding for arithmetic operations on numbers and for string concatenation.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.