A conventional computer memory can be thought of as an array of cubbyholes, each of which can contain a piece of information. Each cubbyhole has a unique name, called its address or location. Typical memory systems provide two primitive operations: one that fetches the data stored in a specified location and one that assigns new data to a specified location. Memory addresses can be incremented to support sequential access to some set of the cubbyholes. More generally, many important data operations require that memory addresses be treated as data, which can be stored in memory locations and manipulated in machine registers. The representation of list structure is one application of such address arithmetic.
To model computer memory, we use a new kind of data structure called a vector. Abstractly, a vector is a compound data object whose individual elements can be accessed by means of an integer index in an amount of time that is independent of the index.[1] In order to describe memory operations, we use two functions for manipulating vectors:[2]
We can use vectors to implement the basic pair structures required for a
list-structured memory. Let us imagine that computer memory is divided into
two vectors:
the_heads
and
the_tails.
We will represent list structure as follows: A pointer to a pair is an index
into the two vectors. The
head
of the pair is the entry in
the_heads
with the designated index, and the
tail
of the pair is the entry in
the_tails
with the designated index. We also need a representation for objects other
than pairs (such as numbers and
strings)
and a way to distinguish one kind of data from another. There are many
methods of accomplishing this, but they all reduce to using
typed pointers, that is, to extending the notion of
pointer
to include information on data type.[4] The data type enables the system to
distinguish a pointer to a pair (which consists of the pair
data type and an index into the memory vectors) from pointers to other
kinds of data (which consist of some other data type and whatever is
being used to represent data of that type). Two data objects are
considered to be the same
(===)
if their pointers are identical.
Figure 5.18
illustrates the use of this method to represent
list(list(1, 2), 3, 4),
whose box-and-pointer diagram is also shown. We use letter prefixes to
denote the data-type information. Thus, a pointer to the pair with
index 5 is denoted p5, the empty list
is denoted by the pointer e0, and a pointer to
the number 4 is denoted n4. In the
box-and-pointer diagram, we have indicated at the lower left of each pair
the vector index that specifies where the
head
and
tail
of the pair are stored. The blank locations in
the_heads
and
the_tails
may contain parts of other list structures (not of interest here).
A pointer to a number, such as n4, might consist of a type indicating numeric data together with the actual representation of the number 4.[5] To deal with numbers that are too large to be represented in the fixed amount of space allocated for a single pointer, we could use a distinct bignum data type, for which the pointer designates a list in which the parts of the number are stored.[6]
A string
might be represented as a typed pointer that designates a
sequence of the characters that form the string's printed
representation. The parser constructs such a sequence
when it encounters a string literal, and the
string-concatenation operator + and
string-producing
primitive functions such as
stringify
construct such a sequence.
Since we want two instances of a string to
be recognized as the same
string by
=== and we want
===
to
be a simple test for equality of pointers, we must ensure that if the
system sees the same string twice, it will use the same pointer (to
the same sequence of characters) to represent both occurrences. To
accomplish this, the system maintains a table, called the
string pool,
of all the strings it has ever encountered. When the system
is about to construct a string, it checks the string pool to see if it has ever
before seen the same string. If it has not, it
constructs a new string (a typed pointer to a new
character sequence) and enters this pointer in the string pool. If the
system has seen the string before, it returns the string pointer
stored in the string pool. This process of replacing strings by unique
pointers is called
string interning.
Given the above representation scheme, we can replace each
primitive
list operation of a register machine with one or
more primitive vector operations. We will use two registers,
the_heads
and
the_tails,
to identify the memory vectors, and will
assume that
vector_ref
and
vector_set
are available as primitive operations. We also assume that numeric
operations on pointers (such as incrementing a pointer, using a pair pointer
to index a vector, or adding two numbers) use only the index portion of
the typed pointer.
For example, we can make a register machine support the instructions assign($reg$$_1$, list(op("head"), reg($reg$$_2$))) assign($reg$$_1$, list(op("tail"), reg($reg$$_2$))) if we implement these, respectively, as assign($reg$$_1$, list(op("vector_ref"), reg("the_heads"), reg($reg$$_2$))) assign($reg$$_1$, list(op("vector_ref"), reg("the_tails"), reg($reg$$_2$))) The instructions perform(list(op("set_head"), reg($reg$$_1$), reg($reg$$_2$))) perform(list(op("set_tail"), reg($reg$$_1$), reg($reg$$_2$))) are implemented as perform(list(op("vector_set"), reg("the_heads"), reg($reg$$_1$), reg($reg$$_2$))) perform(list(op("vector_set"), reg("the_tails"), reg($reg$$_1$), reg($reg$$_2$)))
The operation pair is performed by allocating an unused index and storing the arguments to pair in the_heads and the_tails at that indexed vector position. We presume that there is a special register, free, that always holds a pair pointer containing the next available index, and that we can increment the index part of that pointer to find the next free location.[7] For example, the instruction assign($reg$$_1$, list(op("pair"), reg($reg$$_2$), reg($reg$$_3$))) is implemented as the following sequence of vector operations:[8] perform(list(op("vector_set"), reg("the_heads"), reg("free"), reg($reg$$_2$))), perform(list(op("vector_set"), reg("the_tails"), reg("free"), reg($reg$$_3$))), assign($reg$$_1$, reg("free")), assign("free", list(op("+"), reg("free"), constant(1)))
The === operation list(op("==="), reg($reg$$_1$), reg($reg$$_2$)) simply tests the equality of all fields in the registers, and predicates such as is_pair, is_null, is_string, and is_number need only check the type field.
Although our register machines use stacks, we need do nothing special here, since stacks can be modeled in terms of lists. The stack can be a list of the saved values, pointed to by a special register the_stack. Thus, save($reg$) can be implemented as assign("the_stack", list(op("pair"), reg($reg$), reg("the_stack"))) Similarly, restore($reg$) can be implemented as assign($reg$, list(op("head"), reg("the_stack"))) assign("the_stack", list(op("tail"), reg("the_stack"))) and perform(list(op("initialize_stack"))) can be implemented as assign("the_stack", constant(null)) These operations can be further expanded in terms of the vector operations given above. In conventional computer architectures, however, it is usually advantageous to allocate the stack as a separate vector. Then pushing and popping the stack can be accomplished by incrementing or decrementing an index into that vector.
const x = pair(1, 2); const y = list(x, x);
function count_leaves(tree) {
return is_null(tree)
? 0
: ! is_pair(tree)
? 1
: count_leaves(head(tree)) +
count_leaves(tail(tree));
}
function count_leaves(tree) {
function count_iter(tree, n) {
return is_null(tree)
? n
: ! is_pair(tree)
? n + 1
: count_iter(tail(tree),
count_iter(head(tree), n));
}
return count_iter(tree, 0);
}
arrays.We use the term vector in this book, as it is the more common terminology. The vector functions above are easily implemented using JavaScript's primitive array support.
tagged dataidea we introduced in chapter 2 for dealing with generic operations. Here, however, the data types are included at the primitive machine level rather than constructed through the use of lists.
Type information may be encoded in a variety of ways, depending on the details of the machine on which the JavaScript system is to be implemented. The execution efficiency of JavaScript programs will be strongly dependent on how cleverly this choice is made, but it is difficult to formulate general design rules for good choices. The most straightforward way to implement typed pointers is to allocate a fixed set of bits in each pointer to be a type field that encodes the data type. Important questions to be addressed in designing such a representation include the following: How many type bits are required? How large must the vector indices be? How efficiently can the primitive machine instructions be used to manipulate the type fields of pointers? Machines that include special hardware for the efficient handling of type fields are said to have tagged architectures.
digitis a number between 0 and the largest number that can be stored in a single pointer.