4.2.2 An Interpreter with Lazy Evaluation

Permalink copied!

In this section we will implement a normal-order language that is the same as JavaScript except that compound functions are non-strict in each argument. Primitive functions will still be strict. It is not difficult to modify the evaluator of section 4.1.1 so that the language it interprets behaves this way. Almost all the required changes center around function application.

The basic idea is that, when applying a function, the interpreter must determine which arguments are to be evaluated and which are to be delayed. The delayed arguments are not evaluated; instead, they are transformed into objects called thunks.[1] The thunk must contain the information required to produce the value of the argument when it is needed, as if it had been evaluated at the time of the application. Thus, the thunk must contain the argument expression and the environment in which the function application is being evaluated.

The process of evaluating the expression in a thunk is called forcing.[2] In general, a thunk will be forced only when its value is needed: when it is passed to a primitive function that will use the value of the thunk; when it is the value of a predicate of a conditional; and when it is the value of a function expression that is about to be applied as a function. One design choice we have available is whether or not to memoize thunks, similar to the optimization for streams in section 3.5.1. With memoization, the first time a thunk is forced, it stores the value that is computed. Subsequent forcings simply return the stored value without repeating the computation. We'll make our interpreter memoize, because this is more efficient for many applications. There are tricky considerations here, however.[3]

Modifying the evaluator

The main difference between the lazy evaluator and the one in section 4.1 is in the handling of function applications in evaluate and apply.

The is_application clause of evaluate becomes

: is_application(component)      
? apply(actual_value(function_expression(component), env),
        arg_expressions(component), env)

This is almost the same as the is_application clause of evaluate in section 4.1.1. For lazy evaluation, however, we call apply with the argument expressions, rather than the arguments produced by evaluating them. Since we will need the environment to construct thunks if the arguments are to be delayed, we must pass this as well. We still evaluate the function expression, because apply needs the actual function to be applied in order to dispatch on its type (primitive versus compound) and apply it.

Whenever we need the actual value of an expression, we use

function actual_value(exp, env) {
  return force_it(evaluate(exp, env));
}

instead of just evaluate, so that if the expression's value is a thunk, it will be forced.

Our new version of apply is also almost the same as the version in section 4.1.1. The difference is that evaluate has passed in unevaluated argument expressions: For primitive functions (which are strict), we evaluate all the arguments before applying the primitive; for compound functions (which are non-strict) we delay all the arguments before applying the function.

function apply(fun, args, env) {
  if (is_primitive_function(fun)) {
      return apply_primitive_function(
                 fun, 
                 list_of_arg_values(args, env));              // changed
  } else if (is_compound_function(fun)) {
      const result = evaluate(
                         function_body(fun),
                         extend_environment(
                             function_parameters(fun),
                             list_of_delayed_args(args, env), // changed
                         function_environment(fun)));
      return is_return_value(result)
             ? return_value_content(result)
             : undefined;
  } else {
      error(fun, "unknown function type -- apply");
  }
}

The functions that process the arguments are just like list_of_values from section 4.1.1, except that list_of_delayed_args delays the arguments instead of evaluating them, and list_of_arg_values uses actual_value instead of evaluate:

function list_of_arg_values(exps, env) {	
  return map(exp => actual_value(exp, env), exps);
}
function list_of_delayed_args(exps, env) {	
  return map(exp => delay_it(exp, env), exps);
}

The other place we must change the evaluator is in the handling of conditionals, where we must use actual_value instead of evaluate to get the value of the predicate expression before testing whether it is true or false:

function eval_conditional(component, env) {	
  return is_truthy(actual_value(conditional_predicate(component), env))
         ? evaluate(conditional_consequent(component), env)
         : evaluate(conditional_alternative(component), env);
}

Finally, we must change the driver_loop function (from section 4.1.4) to use actual_value instead of evaluate, so that if a delayed value is propagated back to the read-evaluate-print loop, it will be forced before being printed. We also change the prompts to indicate that this is the lazy evaluator:

const input_prompt = "L-evaluate input: ";
const output_prompt = "L-evaluate value: ";

function driver_loop(env) {
  const input = user_read(input_prompt);
  if (is_null(input)) {
      display("evaluator terminated");
  } else {
      const program = parse(input);
      const locals = scan_out_declarations(program);
      const unassigneds = list_of_unassigned(locals);
      const program_env = extend_environment(locals, unassigneds, env);
      const output = actual_value(program, program_env);
      user_print(output_prompt, output);
      return driver_loop(program_env);
  }
}

With these changes made, we can start the evaluator and test it. The successful evaluation of the try_me expression discussed in section 4.2.1 indicates that the interpreter is performing lazy evaluation:

const the_global_environment = setup_environment();
driver_loop(the_global_environment);

L-evaluate input:

function try_me(a, b) {
  return a === 0 ? 1 : b;
}

L-evaluate value:
undefined

L-evaluate input:

try_me(0, head(null));

L-evaluate value:
1

Representing thunks

Our evaluator must arrange to create thunks when functions are applied to arguments and to force these thunks later. A thunk must package an expression together with the environment, so that the argument can be produced later. To force the thunk, we simply extract the expression and environment from the thunk and evaluate the expression in the environment. We use actual_value rather than evaluate so that in case the value of the expression is itself a thunk, we will force that, and so on, until we reach something that is not a thunk:

function force_it(obj) {	
    return is_thunk(obj)
           ? actual_value(thunk_exp(obj), thunk_env(obj))
           : obj;
}

One easy way to package an expression with an environment is to make a list containing the expression and the environment. Thus, we create a thunk as follows:

function delay_it(exp, env) {	
    return list("thunk", exp, env);
}
function is_thunk(obj) {    
    return is_tagged_list(obj, "thunk");
}
function thunk_exp(thunk) { return head(tail(thunk)); }

function thunk_env(thunk) { return head(tail(tail(thunk))); }

Actually, what we want for our interpreter is not quite this, but rather thunks that have been memoized. When a thunk is forced, we will turn it into an evaluated thunk by replacing the stored expression with its value and changing the thunk tag so that it can be recognized as already evaluated.[4]

function is_evaluated_thunk(obj) {
    return is_tagged_list(obj, "evaluated_thunk");
}
function thunk_value(evaluated_thunk) { 
    return head(tail(evaluated_thunk));
}

function force_it(obj) {
    if (is_thunk(obj)) {
        const result = actual_value(thunk_exp(obj), thunk_env(obj));
        set_head(obj, "evaluated_thunk");
        set_head(tail(obj), result);  // replace exp with its value
        set_tail(tail(obj), null);    // forget unneeded env
        return result;	
    } else if (is_evaluated_thunk(obj)) {
        return thunk_value(obj);
    } else {
        return obj;
    }
}

Notice that the same delay_it function works both with and without memoization.

Exercise 4.25 Suppose we type in the following declarations to the lazy evaluator:

let count = 0;
function id(x) {
    count = count + 1;
    return x;
}

Give the missing values in the following sequence of interactions, and explain your answers.[5]

const w = id(id(10));

L-evaluate input:count;L-evaluate value: $\langle{}$response$\rangle$ L-evaluate input:w;L-evaluate value: $\langle{}$response$\rangle$ L-evaluate input:count;L-evaluate value: $\langle{}$response$\rangle$

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 4.26 The function evaluate uses actual_value rather than evaluate to evaluate the function expression before passing it to apply, in order to force the value of the function expression. Give an example that demonstrates the need for this forcing.

Exercise 4.27 Exhibit a program that you would expect to run much more slowly without memoization than with memoization. Also, consider the following interaction, where the id function is defined as in exercise 4.25 and count starts at 0:

function square(x) {
    return x * x;
}

L-evaluate input:square(id(10));L-evaluate value: $\langle{}$response$\rangle$ L-evaluate input:count;L-evaluate value: $\langle{}$response$\rangle$ Give the responses both when the evaluator memoizes and when it does not.

Exercise 4.28 Cy D. Fect, a reformed C programmer, is worried that some side effects may never take place, because the lazy evaluator doesn't force the statements in a sequence. Since the value of a statement in a sequence may not be used (the statement may be there only for its effect, such as assigning to a variable or printing), there may be no subsequent use of this value (e.g., as an argument to a primitive function) that will cause it to be forced. Cy thus thinks that when evaluating sequences, we must force all statements in the sequence. He proposes to modify evaluate_sequence from section 4.1.1 to use actual_value rather than evaluate:

function eval_sequence(stmts, env) {
    if (is_empty_sequence(stmts)) {
        return undefined;
    } else if (is_last_statement(stmts)) {
        return actual_value(first_statement(stmts), env);
    } else {
        const first_stmt_value = 
            actual_value(first_statement(stmts), env);
        if (is_return_value(first_stmt_value)) {
            return first_stmt_value;
        } else {
            return eval_sequence(rest_statements(stmts), env);
        }
    }
}

Ben Bitdiddle thinks Cy is wrong. He shows Cy the for_each function described in exercise 2.23, which gives an important example of a sequence with side effects:
```
function for_each(fun, items) {
    if (is_null(items)){
        return "done";
    } else {
        fun(head(items));
        for_each(fun, tail(items));
    }
}
```
He claims that the evaluator in the text (with the original eval_sequence) handles this correctly:
```
L-evaluate input:
```
```
for_each(display, list(57, 321, 88));
```
```
57
321
88
L-evaluate value:
"done"
```
Explain why Ben is right about the behavior of for_each.
Cy agrees that Ben is right about the for_each example, but says that that's not the kind of program he was thinking about when he proposed his change to eval_sequence. He declares the following two functions in the lazy evaluator:
```
function f1(x) {
    x = pair(x, list(2));
    return x;
}

function f2(x) {
    function f(e) {
        e;
        return x;
    }
    return f(x = pair(x, list(2)));
}
```
What are the values of f1(1) and f2(1) with the original eval_sequence? What would the values be with Cy's proposed change to eval_sequence?
Cy also points out that changing eval_sequence as he proposes does not affect the behavior of the example in part a. Explain why this is true.
How do you think sequences ought to be treated in the lazy evaluator? Do you like Cy's approach, the approach in the text, or some other approach?

Exercise 4.29 The approach taken in this section is somewhat unpleasant, because it makes an incompatible change to JavaScript. It might be nicer to implement lazy evaluation as an upward-compatible extension, that is, so that ordinary JavaScript programs will work as before. We can do this by introducing optional parameter declaration as a new syntactic form inside function declarations to let the user control whether or not arguments are to be delayed. While we're at it, we may as well also give the user the choice between delaying with and without memoization. For example, the declaration function f(a, b, c, d) { parameters("strict", "lazy", "strict", "lazy_memo"); $\ldots$ } would define f to be a function of four arguments, where the first and third arguments are evaluated when the function is called, the second argument is delayed, and the fourth argument is both delayed and memoized. You can assume that the parameter declaration is always the first statement in the body of a function declaration, and if it is omitted, all parameters are strict. Thus, ordinary function declaration will produce the same behavior as ordinary JavaScript, while adding the "lazy_memo" declaration to each parameter of every compound function will produce the behavior of the lazy evaluator defined in this section. Design and implement the changes required to produce such an extension to JavaScript. The parse function will treat parameter declarations as function applications, so you need to modify apply to dispatch to your implementation of the new syntactic form. You must also arrange for evaluate or apply to determine when arguments are to be delayed, and to force or delay arguments accordingly, and you must arrange for forcing to memoize or not, as appropriate.

[1] The word thunk was invented by an informal working group that was discussing the implementation of call-by-name in Algol 60. They observed that most of the analysis of (thinking about) the expression could be done at compile time; thus, at run time, the expression would already have been thunk about (Ingerman et al. 1960).

[2] This is analogous to the forcing of the delayed objects that were introduced in chapter 3 to represent streams. The critical difference between what we are doing here and what we did in chapter 3 is that we are building delaying and forcing into the evaluator, and thus making this uniform and automatic throughout the language.

[3] Lazy evaluation combined with memoization is sometimes referred to as call-by-need argument passing, in contrast to call-by-name argument passing. (Call-by-name, introduced in Algol 60, is similar to non-memoized lazy evaluation.) As language designers, we can build our evaluator to memoize, not to memoize, or leave this an option for programmers (exercise 4.29). As you might expect from chapter 3, these choices raise issues that become both subtle and confusing in the presence of assignments. (See exercises 4.25 and 4.27.) An excellent article by Clinger (1982) attempts to clarify the multiple dimensions of confusion that arise here.

[4] Notice that we also erase the env from the thunk once the expression's value has been computed. This makes no difference in the values returned by the interpreter. It does help save space, however, because removing the reference from the thunk to the env once it is no longer needed allows this structure to be garbage-collected and its space recycled, as we will discuss in section 5.3.

Similarly, we could have allowed unneeded environments in the memoized delayed objects of section 3.5.1 to be garbage-collected, by having memo do something like fun = null; to discard the function fun (which includes the environment in which the lambda expression that makes up the tail of the stream was evaluated) after storing its value.

[5] This exercise demonstrates that the interaction between lazy evaluation and side effects can be very confusing. This is just what you might expect from the discussion in chapter 3.

< Previous

Next >

4.2.2 An Interpreter with Lazy Evaluation