As we saw in
section 2.2.3,
sequences can serve as standard interfaces for combining program
modules. We formulated powerful abstractions for manipulating
sequences, such as map,
filter, and
accumulate, that capture a wide variety of
operations in a manner that is both succinct and elegant.
Unfortunately, if we represent sequences as lists, this elegance is
bought at the price of severe inefficiency with respect to both the
time and space required by our computations.
When we represent manipulations on sequences as transformations
of lists, our programs must construct and copy data structures (which
may be huge) at every step of a process.
To see why this is true, let us compare two programs for computing the
sum of all the prime numbers in an interval. The first program, written
in standard iterative style, maintains a running sum as it steps through
the interval. The second program performs the same computation using the
sequence operations of section 2.2.3:
function sum_primes(a, b) {
return accumulate((x, y) => x + y,
0,
filter(is_prime,
enumerate_interval(a, b)));
}
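For comparison, the standard iterative version can be sketched as follows. This is a sketch, not the book's exact code; the trial-division is_prime used to make it runnable is an assumption.

```javascript
// Trial-division primality test, standing in for the book's is_prime
function is_prime(n) {
    if (n < 2) return false;
    for (let d = 2; d * d <= n; d = d + 1) {
        if (n % d === 0) return false;
    }
    return true;
}

// Standard iterative style: at any point, only the current count
// and the running sum need to be stored
function sum_primes(a, b) {
    function iter(count, accum) {
        return count > b
               ? accum
               : is_prime(count)
               ? iter(count + 1, count + accum)
               : iter(count + 1, accum);
    }
    return iter(a, 0);
}
```

For example, sum_primes(2, 10) adds the primes 2, 3, 5, and 7, giving 17.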
In carrying out the computation, the first program needs to store only
the sum being accumulated. In contrast, the filter in the second
program cannot do any testing until
enumerate_interval
has constructed a complete list of the numbers in the interval.
The filter generates another list, which in turn is passed to
accumulate before being collapsed to form
a sum. Such large intermediate storage is not needed by the first program,
which we can think of as enumerating the interval incrementally, adding
each prime to the sum as it is generated.
The inefficiency in using lists becomes painfully apparent if we use
the sequence paradigm to compute the second prime in the interval from
10,000 to 1,000,000 by evaluating the expression
head(tail(filter(is_prime,
                 enumerate_interval(10000, 1000000))));
This expression does find the second prime, but the computational overhead
is outrageous. We construct a list of almost a million integers, filter
this list by testing each element for primality, and then ignore almost
all of the result. In a more traditional programming style, we would
interleave the enumeration and the filtering, and stop when we reached
the second prime.
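That interleaved, traditional style can be sketched directly. The trial-division is_prime below is an assumption to make the sketch self-contained.

```javascript
// Trial-division primality test, standing in for the book's is_prime
function is_prime(n) {
    if (n < 2) return false;
    for (let d = 2; d * d <= n; d = d + 1) {
        if (n % d === 0) return false;
    }
    return true;
}

// Interleave enumeration with filtering, and stop as soon as the
// second prime is found -- no intermediate list is ever built
function second_prime(a, b) {
    let found = 0;
    for (let n = a; n <= b; n = n + 1) {
        if (is_prime(n)) {
            found = found + 1;
            if (found === 2) return n;
        }
    }
    return undefined;
}
```

Here second_prime(10000, 1000000) stops after examining only the numbers up to 10,009.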
Streams are a clever idea that allows one to use sequence
manipulations without incurring the costs of manipulating sequences as
lists. With streams we can achieve the best of both worlds: We can
formulate programs elegantly as sequence manipulations, while attaining
the efficiency of incremental computation. The basic idea is to arrange
to construct a stream only partially, and to pass the partial
construction to the program that consumes the stream. If the consumer
attempts to access a part of the stream that has not yet been
constructed, the stream will automatically construct just enough more
of itself to produce the required part, thus preserving the illusion
that the entire stream exists. In other words, although we will write
programs as if we were processing complete sequences, we design our
stream implementation to automatically and transparently interleave
the construction of the stream with its use.
To accomplish this, we will construct streams using pairs,
with the first item of the stream in the head of the pair.
However, rather than placing the value of the rest of the stream
into the tail of the pair, we will put there a promise
to compute the rest if it is ever requested.
If we have a data item
h and a stream
t, we construct a stream
whose head is
h and whose tail is
t by evaluating
pair(h, () => t)—the
tail
t of a stream is
wrapped in a function of no arguments,
so that its evaluation will be delayed.
The empty stream is
null, the same as the empty list.
To access the first data item of a nonempty stream,
we simply select the
head of the pair, as with a list.
But to access the tail of a stream, we need to evaluate the
delayed expression.
For convenience, we define
function stream_tail(stream) {
return tail(stream)();
}
This selects the tail of the pair and applies the function
found there to obtain the next pair of the stream
(or
null if the tail of the stream
is empty)—in effect,
forcing the function in the
tail of the pair to fulfill its promise.
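As a concrete sketch of this promise mechanism, with pair, head, and tail modeled by a two-element array (an assumption; in the book these are predeclared):

```javascript
// Stand-ins for the book's pair/head/tail primitives
const pair = (h, t) => [h, t];
const head = p => p[0];
const tail = p => p[1];

// Force the promise (nullary function) stored in the tail
function stream_tail(stream) {
    return tail(stream)();
}

let forced = false;
// A stream whose tail records when its promise is forced
const s = pair(1, () => { forced = true; return pair(2, () => null); });

const forced_before = forced;              // false: construction did not evaluate the tail
const second = head(stream_tail(s));       // forcing the tail yields the next pair
const forced_after = forced;               // true: the promise has been fulfilled
```

Constructing s evaluates nothing beyond the head; only the call to stream_tail fulfills the promise.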
We can make and use streams, in just the same way as we can make
and use lists, to represent aggregate data arranged in a sequence. In
particular, we can build stream analogs of the list operations from
chapter 2, such as list_ref,
map, and
for_each:
function stream_ref(s, n) {
return n === 0
? head(s)
: stream_ref(stream_tail(s), n - 1);
}
function stream_map(f, s) {
return is_null(s)
? null
: pair(f(head(s)),
() => stream_map(f, stream_tail(s)));
}
function stream_for_each(fun, s) {
if (is_null(s)) {
return true;
} else {
fun(head(s));
return stream_for_each(fun, stream_tail(s));
}
}
The function
stream_for_each is useful for
viewing streams:
function display_stream(s) {
return stream_for_each(display, s);
}
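Here is a runnable sketch of stream_ref and stream_map at work on a small stream. The array stand-ins for pair, head, tail, and is_null are an assumption; in the book they are predeclared.

```javascript
// Stand-ins for the book's primitives
const pair = (h, t) => [h, t];
const head = p => p[0];
const tail = p => p[1];
const is_null = x => x === null;

function stream_tail(stream) {
    return tail(stream)();
}
function stream_ref(s, n) {
    return n === 0
           ? head(s)
           : stream_ref(stream_tail(s), n - 1);
}
function stream_map(f, s) {
    return is_null(s)
           ? null
           : pair(f(head(s)),
                  () => stream_map(f, stream_tail(s)));
}

// A finite stream of 1, 2, 3
const s = pair(1, () => pair(2, () => pair(3, () => null)));
// Mapping is itself lazy: only the first pair of doubled exists so far
const doubled = stream_map(x => 2 * x, s);
```

Selecting stream_ref(doubled, 2) forces just enough of the stream to yield 6.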
To make the stream implementation automatically and transparently
interleave the construction of a stream with its use, we have arranged
for the tail
of a stream to be evaluated when it is accessed by the
stream_tail
function rather than when the stream is constructed by
pair.
This implementation choice is reminiscent of our discussion of rational numbers
in section 2.1.2, where we saw
that we can choose to implement rational numbers so that the reduction
of numerator and denominator to lowest terms is performed either at
construction time or at selection time. The two rational-number
implementations produce the same data abstraction, but the choice has
an effect on efficiency. There is a similar relationship between
streams and ordinary lists. As a data abstraction, streams are the
same as lists. The difference is the time at which the elements are
evaluated. With ordinary lists, both the
head and the
tail
are evaluated at construction time. With streams, the
tail is evaluated at selection time.
Let us see how this works for the prime computation above, reformulated
in terms of streams:
head(stream_tail(stream_filter(is_prime,
                               stream_enumerate_interval(10000, 1000000))));
We begin by calling stream_enumerate_interval with
the arguments 10,000 and 1,000,000. The function
stream_enumerate_interval
is the stream analog of
enumerate_interval
(section 2.2.3):
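A sketch consistent with this description (using the same array stand-ins for the pair primitives, which is an assumption):

```javascript
// Stand-ins for the book's primitives
const pair = (h, t) => [h, t];
const head = p => p[0];
const tail = p => p[1];

function stream_tail(stream) {
    return tail(stream)();
}

// Enumerate low..high lazily: the head is low, the tail a promise
// to enumerate the rest of the interval
function stream_enumerate_interval(low, high) {
    return low > high
           ? null
           : pair(low,
                  () => stream_enumerate_interval(low + 1, high));
}

// Only a single pair is constructed by this call
const s = stream_enumerate_interval(10000, 1000000);
```

Forcing the tail once, with stream_tail(s), constructs just one more pair, whose head is 10,001.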
That is, stream_enumerate_interval
returns a stream represented as a pair whose
head
is 10,000 and whose tail
is a promise to enumerate more of the
interval if so requested. This stream is now filtered for primes,
using the stream analog of the filter
function
(section 2.2.3):
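A sketch of this stream analog of filter, under the same assumption that pair, head, tail, and is_null are modeled as arrays:

```javascript
// Stand-ins for the book's primitives
const pair = (h, t) => [h, t];
const head = p => p[0];
const tail = p => p[1];
const is_null = x => x === null;

function stream_tail(stream) {
    return tail(stream)();
}

// Keep only elements satisfying pred, forcing the input stream
// one element at a time
function stream_filter(pred, stream) {
    return is_null(stream)
           ? null
           : pred(head(stream))
           ? pair(head(stream),
                  () => stream_filter(pred, stream_tail(stream)))
           : stream_filter(pred, stream_tail(stream));
}

function stream_enumerate_interval(low, high) {
    return low > high
           ? null
           : pair(low,
                  () => stream_enumerate_interval(low + 1, high));
}

// The evens of 1..10, computed on demand
const evens = stream_filter(x => x % 2 === 0,
                            stream_enumerate_interval(1, 10));
```

The head of evens is 2: the filter forced the enumeration only as far as the first even number.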
The function stream_filter tests the
head of the stream (which is 10,000). Since
this is not prime, stream_filter
examines the tail of its input stream. The call to
stream_tail forces evaluation of the
delayed stream_enumerate_interval,
which now returns a pair whose head is 10,001 and whose tail is a new
promise to enumerate more of the interval.
The function stream_filter now
looks at the head of this stream,
10,001, sees that this is not prime either, forces another
stream_tail, and so on, until
stream_enumerate_interval yields
the prime 10,007, whereupon
stream_filter, according to its
definition, returns a pair whose head is 10,007 and whose tail is a
promise to continue filtering the rest of the enumeration.
This result is now passed to
stream_tail in our original
expression. This forces the delayed
stream_filter,
which in turn keeps forcing the delayed
stream_enumerate_interval until it
finds the next prime, which is 10,009. Finally, the result passed to
head in our original expression is a pair whose
head is 10,009.
The function head returns 10,009, and the
computation is complete. Only as many integers were tested for
primality as were necessary to find the second prime, and the interval
was enumerated only as far as was necessary to feed the prime filter.
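The whole walkthrough can be reproduced as a runnable sketch; the array stand-ins for the primitives and the trial-division is_prime are assumptions made to keep it self-contained.

```javascript
// Stand-ins for the book's primitives
const pair = (h, t) => [h, t];
const head = p => p[0];
const tail = p => p[1];
const is_null = x => x === null;
const stream_tail = s => tail(s)();

// Trial-division primality test, standing in for the book's is_prime
function is_prime(n) {
    if (n < 2) return false;
    for (let d = 2; d * d <= n; d = d + 1) {
        if (n % d === 0) return false;
    }
    return true;
}
function stream_enumerate_interval(low, high) {
    return low > high
           ? null
           : pair(low,
                  () => stream_enumerate_interval(low + 1, high));
}
function stream_filter(pred, stream) {
    return is_null(stream)
           ? null
           : pred(head(stream))
           ? pair(head(stream),
                  () => stream_filter(pred, stream_tail(stream)))
           : stream_filter(pred, stream_tail(stream));
}

// The formerly "outrageous" computation, now cheap: the interval
// is enumerated only as far as 10,009
const second = head(stream_tail(
    stream_filter(is_prime,
                  stream_enumerate_interval(10000, 1000000))));
```

Evaluating this expression constructs and tests only the ten integers from 10,000 through 10,009, and second is 10,009.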
In general, we can think of delayed evaluation as
demand-driven programming, whereby each stage in the
stream process is activated only enough to satisfy the next stage. What
we have done is to
decouple the actual order of events in the computation from the apparent
structure of our functions. We write functions as if the streams existed
all at once when, in reality, the computation is
performed incrementally, as in traditional programming styles.
When we construct stream pairs, we delay the evaluation of their tail
expressions by wrapping these expressions in a function. We force their
evaluation when needed, by applying the function.
This implementation suffices for streams to work as advertised, but
there is an important optimization that we shall consider where needed.
In many applications, we end up forcing the same delayed object many
times. This can lead to serious inefficiency in recursive programs
involving streams. (See
exercise 3.57.)
The solution is to build delayed objects so that the first time they are
forced, they store the value that is computed. Subsequent forcings will
simply return the stored value without repeating the computation. In
other words, we implement the construction of stream pairs as a
memoized function similar to the one described in
exercise 3.27. One way to accomplish this
is to use the following function, which takes as argument a function
(of no arguments) and returns a memoized version of the function.
The first time the memoized function is run, it saves the computed
result. On subsequent evaluations, it simply returns
the result.
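A wrapper matching this description, called memo later in this section, might be sketched as follows:

```javascript
// memo takes a function of no arguments and returns a memoized
// version: the wrapped function is run at most once, and its
// result is stored and reused on every subsequent call
function memo(fun) {
    let already_run = false;
    let result = undefined;
    return () => {
        if (!already_run) {
            result = fun();
            already_run = true;
        }
        return result;
    };
}

// Example: the underlying computation runs only once
let runs = 0;
const delayed = memo(() => { runs = runs + 1; return 42; });
const first = delayed();   // computes the value; runs becomes 1
const again = delayed();   // returns the stored value; runs stays 1
```

To obtain memoized streams, the promise placed in the tail of each stream pair is wrapped with memo before being stored.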
Declare a function stream_map_2
that takes a binary function and two streams as arguments and returns
a stream whose elements are the results of applying the function
pairwise to the corresponding elements of the argument streams.
Similar to stream_map_optimized,
declare a function
stream_map_2_optimized by
modifying your
stream_map_2
such that the result stream employs memoization.
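One possible sketch of stream_map_2, offered as a reader's attempt rather than the book's official solution (the array stand-ins for the primitives are an assumption), stopping as soon as either stream runs out:

```javascript
// Stand-ins for the book's primitives
const pair = (h, t) => [h, t];
const head = p => p[0];
const tail = p => p[1];
const is_null = x => x === null;
const stream_tail = s => tail(s)();

// Apply the binary function f pairwise to corresponding elements
// of streams s1 and s2, lazily
function stream_map_2(f, s1, s2) {
    return is_null(s1) || is_null(s2)
           ? null
           : pair(f(head(s1), head(s2)),
                  () => stream_map_2(f, stream_tail(s1), stream_tail(s2)));
}

// Pairwise sums of two short streams
const a = pair(1, () => pair(2, () => null));
const b = pair(10, () => pair(20, () => null));
const sums = stream_map_2((x, y) => x + y, a, b);
```

For stream_map_2_optimized, the only change would be wrapping the tail promise with the memo function described in the optimization above.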
Note that our primitive function
display returns its argument
after displaying it.
What does the interpreter print in response to evaluating each
statement in the following sequence?
let x = stream_map_optimized(display, stream_enumerate_interval(0, 10));
stream_ref(x, 5);
stream_ref(x, 7);
let sum = 0;
function accum(x) {
sum = x + sum;
return sum;
}
const seq = stream_map(accum, stream_enumerate_interval(1, 20));
const y = stream_filter(is_even, seq);
const z = stream_filter(x => x % 5 === 0, seq);
stream_ref(y, 7);
display_stream(z);
What is the value of sum after each of the
above statements is evaluated?
What is the printed response to evaluating the
stream_ref and
display_stream expressions?
Would these responses differ if we had applied the function
memo
on every tail of every constructed stream pair, as suggested in the
optimization above? Explain.
This should
bother you. The fact that we are defining such similar functions
for streams and lists indicates that we are missing some underlying
abstraction. Unfortunately, in order to exploit this abstraction, we
will need to exert finer control over the process of evaluation than we
can at present. We will discuss this point further at the end of
section 3.5.4.
In section 4.2, we'll
develop a framework that unifies lists and streams.
The numbers shown here do not really appear in the delayed
expression. What actually appears is the original expression, in an
environment in which the variables are bound to the appropriate numbers.
For example, low + 1 with
low bound to 10,000 actually appears
where 10001 is shown.
There are many possible implementations of streams
other than the one described in this section. Delayed evaluation, which
is the key to making streams practical, was inherent in
Algol 60's
call-by-name
parameter-passing method. The use of this mechanism to implement
streams was first described by
Landin (1965). Delayed evaluation for
streams was introduced into Lisp by
Friedman and Wise (1976). In their
implementation,
cons (the Lisp
equivalent of our
pair function)
always delays evaluating its arguments, so
that lists automatically behave as streams. The memoizing
optimization is also known as
call-by-need. The Algol community would refer to our original
delayed objects as
call-by-name thunks and to the optimized
versions as call-by-need thunks.
Exercises such
as 3.51 and 3.52
are valuable for testing our understanding of how delayed evaluation
works. On the other hand, intermixing delayed evaluation with
printing—and, even worse, with assignment—is extremely
confusing, and instructors of courses on computer languages have
traditionally tormented their students with examination questions such
as the ones in this section. Needless to say, writing programs that
depend on such subtleties is
odious programming style. Part of the power of stream processing is
that it lets us ignore the order in which events actually happen in
our programs. Unfortunately, this is precisely what we cannot afford
to do in the presence of assignment, which forces us to be concerned
with time and change.