4.1.2 Representing Components - SICP Comparison Edition

Programmers write programs as text, i.e. sequences of characters, entered in a programming environment or a text editor. To run our evaluator, we need to start with a representation of this program text as a JavaScript value. In section 2.3.1 we introduced strings to represent text. We would like to evaluate programs such as "const size = 2; 5 * size;" from section 1.1.2. Unfortunately, such program text does not provide enough structure to the evaluator. In this example, the program parts "size = 2" and "5 * size" look similar, but carry very different meanings. Abstract syntax functions such as declaration_value_expression would be difficult and error-prone to implement by examining the program text. In this section, we therefore introduce a function parse that translates program text to a tagged-list representation, reminiscent of the tagged data of section 2.4.2. For example, the application of parse to the program string above produces a data structure that reflects the structure of the program: a sequence consisting of a constant declaration associating the name size with the value 2 and a multiplication.

parse("const size = 2; 5 * size;");

list("sequence",
     list(list("constant_declaration",
               list("name", "size"), list("literal", 2)),
          list("binary_operator_combination", "*",
               list("literal", 5), list("name", "size"))))

The syntax functions used by the evaluator access the tagged-list representation produced by parse.

The evaluator is reminiscent of the symbolic differentiation program discussed in section 2.3.2. Both programs operate on symbolic data. In both programs, the result of operating on an object is determined by operating recursively on the pieces of the object and combining the results in a way that depends on the type of the object. In both programs we used data abstraction to decouple the general rules of operation from the details of how the objects are represented. In the differentiation program this meant that the same differentiation function could deal with algebraic expressions in prefix form, in infix form, or in some other form. For the evaluator, this means that the syntax of the language being evaluated is determined solely by parse and the functions that classify and extract pieces of the tagged lists produced by parse.

Figure 4.3 Syntax abstraction in the evaluator.

Figure 4.3 depicts the abstraction barrier formed by the syntax predicates and selectors, which interface the evaluator to the tagged-list representation of programs, which in turn is separated from the string representation by parse. Below we describe the parsing of program components and list the corresponding syntax predicates and selectors, as well as constructors if they are needed.

Literal expression

Literal expressions are parsed into tagged lists with tag "literal" and the actual value. \[ \begin{align*} \ll\ \mathit{literal}\textit{-}\mathit{expression}\ \gg & = \texttt{list("literal", }\mathit{value}\texttt{)} \end{align*} \] where $value$ is the JavaScript value represented by the $literal$-$expression$ string. Here $\ll\ $$literal$-$expression$$\ \gg$ denotes the result of parsing the string $literal$-$expression$.

parse("1;");

list("literal", 1)

parse("'hello world';");

list("literal", "hello world")

parse("null;");

list("literal", null)

The syntax predicate for literal expressions is is_literal.

function is_literal(component) {
    return is_tagged_list(component, "literal");
}

It is defined in terms of the function is_tagged_list, which identifies lists that begin with a designated string:

function is_tagged_list(component, the_tag) {
    return is_pair(component) && head(component) === the_tag;
}

The second element of the list that results from parsing a literal expression is its actual JavaScript value. The selector for retrieving the value is literal_value.

function literal_value(component) {    
    return head(tail(component));
}

literal_value(parse("null;"));

null

In the rest of this section, we just list the syntax predicates and selectors, and omit their declarations if they just access the obvious list elements.

We provide a constructor for literals, which will come in handy:

function make_literal(value) {
    return list("literal", value);
}

Names

The tagged-list representation for names includes the tag "name" as first element and the string representing the name as second element. \[ \begin{align*} \ll\ \mathit{name}\ \gg & = \texttt{list("name", }\mathit{symbol}\texttt{)} \end{align*} \] where $symbol$ is a string that contains the characters that make up the $name$ as written in the program. The syntax predicate for names is is_name. The symbol is accessed using the selector symbol_of_name. We provide a constructor for names, to be used by operator_combination_to_application:

function make_name(symbol) {
    return list("name", symbol);
}

Expression statements

We do not need to distinguish between expressions and expression statements. Consequently, parse can ignore the difference between the two kinds of components: \[ \ll\ \mathit{expression}\texttt{;}\ \gg {=} \ll\ \mathit{expression}\ \gg \]

Function applications

Function applications are parsed as follows: \[ \ll\ \mathit{fun}\textit{-}\mathit{expr}\texttt{(}\mathit{arg}\textit{-}\mathit{expr}_1\texttt{, }\ldots\texttt{, } \mathit{arg}\textit{-}\mathit{expr}_n \texttt{)}\ \gg \\ = \\ \texttt{list("application",} \ll\ \mathit{fun}\textit{-}\mathit{expr}\gg\texttt{, list(} \ll\ \mathit{arg}\textit{-}\mathit{expr}_1\;\gg \texttt{, }\ldots\texttt{, } \ll\ \mathit{arg}\textit{-}\mathit{expr}_n\;\gg \texttt{))} \] We declare is_application as the syntax predicate and function_expression and arg_expressions as the selectors. We add a constructor for function applications, to be used by operator_combination_to_application:

function make_application(function_expression, argument_expressions) {
    return list("application",
                function_expression, argument_expressions);
}

Conditionals

Conditional expressions are parsed as follows:

$\ll\ \mathit{predicate}\ \texttt{?}\ \mathit{consequent}\textit{-}\mathit{expression}\ \texttt{:}\ \mathit{alternative}\textit{-}\mathit{expression}\ \gg\ \ = $ list("conditional_expression", $\ll\ $$predicate$$\ \gg$, $\ll\ $$consequent$-$expression$$\ \gg$, $\ll\ $$alternative$-$expression$$\ \gg$)

Similarly, conditional statements are parsed as follows:

$\ll\ \textbf{if}\ \texttt{(} \mathit{predicate} \texttt{)}\ \mathit{consequent}\textit{-}\mathit{block}\ \textbf{else}\ \mathit{alternative}\textit{-}\mathit{block}\ \gg\ \ =$ list("conditional_statement", $\ll\ $$predicate$$\ \gg$, $\ll\ $$consequent$-$block$$\ \gg$, $\ll\ $$alternative$-$block$$\ \gg$)

The syntax predicate is_conditional returns true for both kinds of conditionals, and the selectors conditional_predicate, conditional_consequent, and conditional_alternative can be applied to both kinds.

Lambda expressions

A lambda expression whose body is an expression is parsed as if the body consisted of a block containing a single return statement whose return expression is the body of the lambda expression. \[ \ll\ \texttt{(}\mathit{name}_1\texttt{, }\ldots\texttt{, } \mathit{name}_n \texttt{) =>}\ \mathit{expression}\ \gg \\ = \\ \ll\ \texttt{(}\mathit{name}_1\texttt{, }\ldots\texttt{, } \mathit{name}_n \texttt{) => \{}\ \textbf{return} \ \mathit{expression}\texttt{;}\ \texttt{\}}\ \gg \] A lambda expression whose body is a block is parsed as follows:

$\ll\ \texttt{(}\mathit{name}_1\texttt{, }\ldots\texttt{, } \mathit{name}_n \texttt{) =>}\ \mathit{block}\ \gg\ \ =$ list("lambda_expression", list($\ll\ name_1\;\gg$, $\ldots$, $\ll\ name_n\;\gg$), $\ll\ block\ \gg$)

The syntax predicate is is_lambda_expression and the selector for the body of the lambda expression is lambda_body. The selector for the parameters, called lambda_parameter_symbols, additionally extracts the symbols from the names.

function lambda_parameter_symbols(component) {
    return map(symbol_of_name, head(tail(component)));
}

The function function_decl_to_constant_decl needs a constructor for lambda expressions:

function make_lambda_expression(parameters, body) {
    return list("lambda_expression", parameters, body);
}

Sequences

A sequence statement packages a sequence of statements into a single statement. A sequence of statements is parsed as follows: \[ \ll\ \mathit{statement}_1 \cdots \mathit{statement}_n\;\gg \\ = \\ \texttt{list("sequence", list(} \ll\ \mathit{statement}_1\;\gg \texttt{, } \ldots \texttt{, } \ll\ \mathit{statement}_n\;\gg \texttt{))} \] The syntax predicate is is_sequence and the selector is sequence_statements. We retrieve the first of a list of statements using first_statement and the remaining statements using rest_statements. We test whether the list is empty using the predicate is_empty_sequence and whether it contains only one element using the predicate is_last_statement.[1]

function first_statement(stmts) { return head(stmts); }

function rest_statements(stmts) { return tail(stmts); }

function is_empty_sequence(stmts) { return is_null(stmts); }

function is_last_statement(stmts) { return is_null(tail(stmts)); }

Blocks

Blocks are parsed as follows:[2] \[ \begin{align*} \ll\ \texttt{\{}\ \mathit{statements}\ \texttt{\}}\ \gg & = \texttt{list("block",}\ \ll\ \mathit{statements}\ \gg \texttt{)} \end{align*} \] Here $statements$ refers to a sequence of statements, as shown above. The syntax predicate is is_block and the selector is block_body.

Return statements

Return statements are parsed as follows: \[ \begin{align*} \ll\ \textbf{return}\ \mathit{expression} \texttt{;}\ \gg & = \texttt{list("return\_statement",}\ \ll\ \mathit{expression}\ \gg \texttt{)} \end{align*} \] The syntax predicate and selector are, respectively, is_return_statement and return_expression.

Assignments

Assignments are parsed as follows: \[ \begin{align*} \ll\ \mathit{name}\ \ \texttt{=}\ \ \mathit{expression}\ \gg & = \texttt{list("assignment", }\ll\ \mathit{name}\gg \texttt{, }\ll\ \mathit{expression}\ \gg \texttt{)} \end{align*} \] The syntax predicate is is_assignment and the selectors are assignment_symbol and assignment_value_expression. The symbol is wrapped in a tagged list representing the name, and thus assignment_symbol needs to unwrap it.

function assignment_symbol(component) {
    return symbol_of_name(head(tail(component))));
}

Constant, variable, and function declarations

Constant and variable declarations are parsed as follows: \[ \ll\ \textbf{const}\ \mathit{name}\ \ \texttt{=}\ \ \mathit{expression}\texttt{;}\ \gg \\ = \\ \texttt{list("constant\_declaration", } \ll\ \mathit{name}\ \gg \texttt{, }\ll\ \mathit{expression}\ \gg \texttt{)}\\[3mm] \ll\ \textbf{let}\ \mathit{name} \ \ \texttt{=}\ \ \mathit{expression}\texttt{;}\ \gg \\ = \\ \texttt{list("variable\_declaration", } \ll\ \mathit{name}\ \gg \texttt{, }\ll\ \mathit{expression}\ \gg \texttt{)} \] The selectors declaration_symbol and declaration_value_expression apply to both kinds.

function declaration_symbol(component) {
    return symbol_of_name(head(tail(component)));
}
function declaration_value_expression(component) {
    return head(tail(tail(component)));
}

The function function_decl_to_constant_decl needs a constructor for constant declarations:

function make_constant_declaration(name, value_expression) {
    return list("constant_declaration", name, value_expression);
}

Function declarations are parsed as follows:

$\ll\ \textbf{function}\ \mathit{name} \texttt{(}\mathit{name}_1\texttt{, }\ldots\texttt{, } \mathit{name}_n \texttt{)} \ \mathit{block}\ \gg\ \ =$ list("function_declaration", $\ll\ $$name$$\ \gg$, list($\ll\ $$name$$_1\;\gg$, $\ldots$, $\ll\ $$name$$_n\;\gg$), $\ll\ $$block$$\ \gg$)

The syntax predicate is_function_declaration recognizes these. The selectors are function_declaration_name, function_declaration_parameters, and function_declaration_body.

The syntax predicate is_declaration returns true for all three kinds of declarations.

function is_declaration(component) {
    return is_tagged_list(component, "constant_declaration") ||
           is_tagged_list(component, "variable_declaration") ||
           is_tagged_list(component, "function_declaration");
}

Derived components

Some syntactic forms in our language can be defined in terms of components involving other syntactic forms, rather than being implemented directly. One example is function declaration, which evaluate transforms into a constant declaration whose value expression is a lambda expression.[3]

function function_decl_to_constant_decl(component) {
    return make_constant_declaration(
               function_declaration_name(component),
               make_lambda_expression(
                   function_declaration_parameters(component),
                   function_declaration_body(component)));
}

Implementing the evaluation of function declarations in this way simplifies the evaluator because it reduces the number of syntactic forms for which the evaluation process must be explicitly specified.

Similarly, we define operator combinations in terms of function applications. Operator combinations are unary or binary and carry their operator symbol as second element in the tagged-list representation: \[ \ll\ \mathit{unary}\textit{-}\mathit{operator}\ \ \mathit{expression}\ \gg \\ = \\ \texttt{list("unary\_operator\_combination", "}\mathit{unary}\textit{-}\mathit{operator}\texttt{"},\ \texttt{list(}\ll\ \mathit{expression}\ \gg \texttt{))} \] where $unary$-$operator$ is ! (for logical negation) or -unary (for numeric negation), and \[ \ll\ \mathit{expression}_1\ \mathit{binary}\textit{-}\mathit{operator}\ \ \mathit{expression}_2\;\gg \\ = \\ \texttt{list("binary\_operator\_combination", "}\mathit{binary}\textit{-}\mathit{operator}\texttt{"},\\ \texttt{list(}\ll\ \mathit{expression}_1\;\gg\texttt{,}\ \ll\ \mathit{expression}_2\;\gg \texttt{))} \] where $binary$-$operator$ is +, -, *, /, %, ===, !==, >, <, >= or <=. The syntax predicates are is_operator_combination, is_unary_operator_combination, and is_binary_operator_combination, and the selectors are operator_symbol, first_operand, and second_operand.

The evaluator uses operator_combination_to_application to transform an operator combination into a function application whose function expression is the name of the operator:

function operator_combination_to_application(component) {
    const operator = operator_symbol(component);
    return is_unary_operator_combination(component)
           ? make_application(make_name(operator),
                              list(first_operand(component)))
           : make_application(make_name(operator),
                              list(first_operand(component),
                                   second_operand(component)));
}

Components (such as function declarations and operator combinations) that we choose to implement as syntactic transformations are called derived components. Logical composition operations are also derived components (see exercise 4.4).

Exercise 4.2 The inverse of parse is called unparse. It takes as argument a tagged list as produced by parse and returns a string that adheres to JavaScript notation.

Write a function unparse by following the structure of evaluate (without the environment parameter), but producing a string that represents the given component, rather than evaluating it. Recall from section 3.3.4 that the operator + can be applied to two strings to concatenate them and that the primitive function stringify turns values such as 1.5, true, null and undefined into strings. Take care to respect operator precedences by surrounding the strings that result from unparsing operator combinations with parentheses (always or whenever necessary).
Your unparse function will come in handy when solving later exercises in this section. Improve unparse by adding " " (space) and "\n" (newline) characters to the result string, to follow the indentation style used in the JavaScript programs of this book. Adding such whitespace characters to (or removing them from) a program text in order to make the text easier to read is called pretty-printing.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 4.3 Rewrite evaluate so that the dispatch is done in data-directed style. Compare this with the data-directed differentiation function of exercise 2.73. (You may use the tag of the tagged-list representation as the type of the components.)

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 4.4 Recall from section 1.1.6 that the logical composition operations && and || are syntactic sugar for conditional expressions: The logical conjunction $\mathit{expression}_1$ && $\mathit{expression}_2$ is syntactic sugar for $ \mathit{expression}_1$ ? $\mathit{expression}_2$ : false, and the logical disjunction $\mathit{expression}_1$ || $\mathit{expression}_2$ is syntactic sugar for $ \mathit{expression}_1$ ? true : $\mathit{expression}_2$. They are parsed as follows:

$\ll\ \mathit{expression}_1\ \ \mathit{logical}\textit{-}\mathit{operation}\ \ \mathit{expression}_2\;\gg \ \ = $ list("logical_composition", "$logical$-$operation$", list($\ll\ $$expression$$_1\;\gg$, $\ll\ $$expression$$_2\;\gg$))

where $logical$-$operation$ is && or ||. Install && and || as new syntactic forms for the evaluator by declaring appropriate syntax functions and evaluation functions eval_and and eval_or. Alternatively, show how to implement && and || as derived components.

GitHub user EmilyOng provides a solution for the alternative: how to implement && and || as derived components.

function make_conditional_expr_decl(predicate, consequent_expression, alternative_expression) {
    return list("conditional_expression", predicate, consequent_expression, alternative_expression);
}
function make_literal(value) {
    return list("literal", value);
}

// Syntax selectors
function logical_operation(component) {
    return head(tail(component));
}
function first_expression(component) {
    return head(tail(tail(component)));
}
function second_expression(component) {
    return head(tail(tail(tail(component))));
}

function logical_comp_decl_to_conditional_expr_decl(component) {
    const operation = logical_operation(component);
    
    return operation === "&&"
        ? make_conditional_expr_decl(
            first_expression(component),
            second_expression(component),
            false
        )
        : operation === "||"
        ? make_conditional_expr_decl(
            first_expression(component),
            true,
            second_expression(component)
        )
        : error(component, "unknown operation -- logical_comp_decl_to_conditional_expr_decl");
}

display(logical_comp_decl_to_conditional_expr_decl(parse("a && b;")));
display(logical_comp_decl_to_conditional_expr_decl(parse("a || b;")));
display(logical_comp_decl_to_conditional_expr_decl(parse("(a && !b) || (!a && b);")));

Exercise 4.5

In JavaScript, lambda expressions must not have duplicate parameters. The evaluator in section 4.1.1 does not check for this.
- Modify the evaluator so that any attempt to apply a function with duplicate parameters signals an error.
- Implement a verify function that checks whether any lambda expression in a given program contains duplicate parameters. With such a function, we could check the entire program before we pass it to evaluate.
In order to implement this check in an evaluator for JavaScript, which of these two approaches would you prefer? Why?
In JavaScript, the parameters of a lambda expression must be distinct from the names declared directly in the body block of the lambda expression (as opposed to in an inner block). Use your preferred approach above to check for this as well.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 4.6 The language Scheme includes a variant of let called let*. We could approximate the behavior of let* in JavaScript by stipulating that a let* declaration implicitly introduces a new block whose body includes the declaration and all subsequent statements of the statement sequence in which the declaration occurs. For example, the program

let* x = 3;
let* y = x + 2;
let* z = x + y + 5;
display(x * z);

displays 39 and could be seen as a shorthand for

{
  let x = 3;
  {
    let y = x + 2;
    {
      let z = x + y + 5;
      display(x * z);
    }
  }
}

Write a program in such an extended JavaScript language that behaves differently when some occurrences of the keyword let are replaced with let*.
Introduce let* as a new syntactic form by designing a suitable tagged-list representation and writing a parse rule. Declare a syntax predicate and selectors for the tagged-list representation.
Assuming that parse implements your new rule, write a let_star_to_nested_let function that transforms any occurrence of let* in a given program as described above. We could then evaluate a program p in the extended language by running evaluate(let_star_to_nested_let(p)).
As an alternative, consider implementing let* by adding to evaluate a clause that recognizes the new syntactic form and calls a function eval_let_star_declaration. Why does this approach not work?

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 4.7 JavaScript supports while loops that execute a given statement repeatedly. Specifically, while ($predicate$) { $body$ } evaluates the $predicate$, and if the result is true, evaluates the $body$ and then evaluates the whole while loop again. Once the $predicate$ evaluates to false, the while loop terminates.

For example, recall the imperative-style version of the iterative factorial function from section 3.1.3:

function factorial(n) {
    let product = 1;
    let counter = 1;
    function iter() {
        if (counter > n) {
            return product;
        } else {
            product = counter * product;
            counter = counter + 1;
            return iter();
        }
    }
    return iter();
}

We can formulate the same algorithm using a while loop as follows:

function factorial(n) {
    let product = 1;
    let counter = 1;
    while (counter <= n) {
        product = counter * product;
        counter = counter + 1;
    }
    return product;
}

While loops are parsed as follows:

$\ll\ \textbf{while} \ \texttt{(}\ \mathit{predicate}\ \texttt{)}\ \mathit{block}\ \gg \ \ =$ $\ll\ $while ($predicate$) $block$$\ \gg$ = list("while_loop", $\ll\ $$predicate$$\ \gg$, $\ll\ $$block$$\ \gg$)

Declare a syntax predicate and selectors to handle while loops.
Declare a function while_loop that takes as arguments a predicate and a body—each represented by a function of no arguments—and simulates the behavior of the while loop. The factorial function would then look as follows:
```
function factorial(n) {
    let product = 1;
    let counter = 1;
    while_loop(() => counter <= n,
               () => {
                   product = counter * product;
                   counter = counter + 1;
               });
    return product;
}
```
Your function while_loop should generate an iterative process (see section 1.2.1).
Install while loops as a derived component by defining a transformation function while_to_application that makes use of your function while_loop.
What problem arises with this approach for implementing while loops, when the programmer decides within the body of the loop to return from the function that contains the loop?
Change your approach to address the problem. How about directly installing while loops for the evaluator, using a function eval_while?
Following this direct approach, implement a break; statement that immediately terminates the loop in which it is evaluated.
Implement a continue; statement that terminates only the loop iteration in which it is evaluated, and continues with evaluating the while loop predicate.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.

Exercise 4.8 The result of evaluating the body of a function is determined by its return statements. Following up on footnote 6 and the evaluation of declarations in section 4.1.1, this exercise addresses the question of what should be the result of evaluating a JavaScript program that consists of a sequence of statements (declarations, blocks, expression statements, and conditional statements) outside of any function body.

For such a program, JavaScript statically distinguishes between value-producing and non-value-producing statements. (Here statically means that we can make the distinction by inspecting the program rather than by running it.) All declarations are non-value-producing, and all expression statements and conditional statements are value-producing. The value of an expression statement is the value of the expression. The value of a conditional statement is the value of the branch that gets executed, or the value undefined if that branch is not value-producing. A block is value-producing if its body (sequence of statements) is value-producing, and then its value is the value of its body. A sequence is value-producing if any of its component statements is value-producing, and then its value is the value of its last value-producing component statement. Finally, if the whole program is not value-producing, its value is the value undefined.

According to this specification, what are the values of the following four programs?

1; 2; 3;

1; { if (true) {} else { 2; } }

1; const x = 2;

1; { let x = 2; { x = x + 3; } }

Modify the evaluator to adhere to this specification.

There is currently no solution available for this exercise. This textbook adaptation is a community effort. Do consider contributing by providing a solution for this exercise, using a Pull Request in Github.