In section 4.1.7 we modified our original metacircular interpreter to separate analysis from execution. We analyzed each component to produce an execution function that took an environment as argument and performed the required operations. In our compiler, we will do essentially the same analysis. Instead of producing execution functions, however, we will generate sequences of instructions to be run by our register machine.
The function compile is the top-level dispatch in the compiler. It corresponds to the evaluate function of section 4.1.1, the analyze function of section 4.1.7, and the eval_dispatch entry point of the explicit-control-evaluator in section 5.4.1. The compiler, like the interpreters, uses the component-syntax functions defined in section 4.1.2.[1] The function compile performs a case analysis on the syntactic type of the component to be compiled. For each type of component, it dispatches to a specialized code generator:
function compile(component, target, linkage) {
return is_literal(component)
? compile_literal(component, target, linkage)
: is_name(component)
? compile_name(component, target, linkage)
: is_application(component)
? compile_application(component, target, linkage)
: is_operator_combination(component)
? compile(operator_combination_to_application(component),
target, linkage)
: is_conditional(component)
? compile_conditional(component, target, linkage)
: is_lambda_expression(component)
? compile_lambda_expression(component, target, linkage)
: is_sequence(component)
? compile_sequence(sequence_statements(component),
target, linkage)
: is_block(component)
? compile_block(component, target, linkage)
: is_return_statement(component)
? compile_return_statement(component, target, linkage)
: is_function_declaration(component)
? compile(function_decl_to_constant_decl(component),
target, linkage)
: is_declaration(component)
? compile_declaration(component, target, linkage)
: is_assignment(component)
? compile_assignment(component, target, linkage)
: error(component, "unknown component type -- compile");
}The function compile and the code generators that it calls take two arguments in addition to the component to compile. There is a target, which specifies the register in which the compiled code is to return the value of the component. There is also a linkage descriptor, which describes how the code resulting from the compilation of the component should proceed when it has finished its execution. The linkage descriptor can require the code to do one of the following three things:
For example, compiling the literal 5 with a target of the val register and a linkage of "next" should produce the instruction
assign("val", constant(5))
assign("val", constant(5)),
go_to(reg("continue"))Each code generator returns an instruction sequence containing the object code it has generated for the component. Code generation for a compound component is accomplished by combining the output from simpler code generators for subcomponents, just as evaluation of a compound component is accomplished by evaluating the subcomponents.
The simplest method for combining instruction sequences is a function called append_instruction_sequences, which takes as arguments two instruction sequences that are to be executed sequentially. It appends them and returns the combined sequence. That is, if $seq_1$ and $seq_2$ are sequences of instructions, then evaluating append_instruction_sequences($seq$$_1$, $seq$$_2$) produces the sequence $seq$$_1$ $seq$$_2$
Whenever registers might need to be saved, the compiler's code generators use preserving, which is a more subtle method for combining instruction sequences. The function preserving takes three arguments: a set of registers and two instruction sequences that are to be executed sequentially. It appends the sequences in such a way that the contents of each register in the set is preserved over the execution of the first sequence, if this is needed for the execution of the second sequence. That is, if the first sequence modifies the register and the second sequence actually needs the register's original contents, then preserving wraps a save and a restore of the register around the first sequence before appending the sequences. Otherwise, preserving simply returns the appended instruction sequences. Thus, for example, preserving(list($reg$$_1$, $reg$$_2$), $seq$$_1$, $seq$$_2$) produces one of the following four sequences of instructions, depending on how $seq$$_1$ and $seq$$_2$ use $reg$$_1$ and $reg$$_2$:
By using preserving to combine instruction sequences the compiler avoids unnecessary stack operations. This also isolates the details of whether or not to generate save and restore instructions within the preserving function, separating them from the concerns that arise in writing each of the individual code generators. In fact no save or restore instructions are explicitly produced by the code generators, except that the code for calling a function saves continue and the code for returning from a function restores it: These corresponding save and restore instructions are explicitly generated by different calls to compile, not as a matched pair by preserving (as we will see in section 5.5.3).
In principle, we could represent an instruction sequence simply as a list of instructions. The function append_instruction_sequences could then combine instruction sequences by performing an ordinary list append. However, preserving would then be a complex operation, because it would have to analyze each instruction sequence to determine how the sequence uses its registers. The function preserving would be inefficient as well as complex, because it would have to analyze each of its instruction sequence arguments, even though these sequences might themselves have been constructed by calls to preserving, in which case their parts would have already been analyzed. To avoid such repetitious analysis we will associate with each instruction sequence some information about its register use. When we construct a basic instruction sequence we will provide this information explicitly, and the functions that combine instruction sequences will derive register-use information for the combined sequence from the information associated with the sequences being combined.
An instruction sequence will contain three pieces of information:
function make_instruction_sequence(needs, modifies, instructions) {
return list(needs, modifies, instructions);
}For example, the two-instruction sequence that looks up the value of the symbol "x" in the current environment, assigns the result to val, and then proceeds to the continuation, requires registers env and continue to have been initialized, and modifies register val. This sequence would therefore be constructed as
make_instruction_sequence(list("env", "continue"), list("val"),
list(assign("val",
list(op("lookup_symbol_value"), constant("x"),
reg("env"))),
go_to(reg("continue"))));
f("x", "y")
f()("x", "y")
f(g("x"), y)
f(g("x"), "y")