Continuations

Introduction

tcK, a tiny concatenative language which supports first-class continuations, is implemented. Continuations are used to implement stackless recursion, exceptions and signals, iterative control structures (do, while), and coroutines (explicit passing of control between routines). Each tcK evaluation step is O(1), so we can use it to "timeshare" across programs.

Continuations

A calculation consists of steps executed in a certain order. Each step produces a result by applying a function to some number of inputs. The continuation of a step s (in a calculation) is the future history of s - the totality of steps after s which use the value produced by s.

For example, consider the following calculation in K:

(2*3)+4-5

When K is about to compute the step 4-5, the continuation of the computation is the function (2*3)+. When K is about to compute the step 2*3, the future of the computation is the function {x+-1} (since 4-5 has already been computed). And when K is about to compute the step 6+-1, the future of the computation is {x}, the identity function, since nothing is left to compute.
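Written out as K functions (the names c1, c2, and c3 are introduced here just for illustration), these three continuations are:

  c1:{(2*3)+x}    / continuation of the step 4-5
  c2:{x+-1}       / continuation of the step 2*3, since 4-5 has already produced -1
  c3:{x}          / continuation of the step 6+-1: the identity
  c1[4-5]         / 5
  c3[c2[2*3]]     / 5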

A tiny concatenative language

tcK is a tiny pseudo-continuation-passing-style version of the cK language. To keep it as simple as possible, support for Joy-style syntax and display has not been implemented (this can be added later). "Primitives" are simple K functions. Defined programs are projections of 'd' or 'l'. First-class continuations are triples of the form (x;y;z), where x is the data stack, y is the instruction stack, and z is either () or a continuation.

K (2 x 20 primitives: ~!@#$%^&*_-+=|:<,>.?)

monad		  X -> monad X
dyad		X Y -> X dyad Y

Stack

pop		X    -> 
popd		X Y  ->  Y
dup		X    ->  X X
dup2		X Y  ->  X Y X Y
dupd		X Y  ->  X X Y
swap		X Y  ->  Y X
pair		X Y  ->  (X;Y)

Combinators

dip		X Y ,P     ->  X | P Y
i		X ,P       ->  X | P
ci		B Q        ->    | Q[B?1]

Continuations = (x;y;(x;y;z))

cc		A | B | C      ->  A | B | A B C		cache continuation
c		A | B | X Y Z  ->  X A | Y | X Y Z		continue
z		A | B | X Y Z  ->  A ,X Y Z | B | ()		pop cached continuation
xyz		A | B | X Y Z  ->  X | Y | Z			go to

Continuations = (x;y;(y;z))

cc		A | B | C    ->  A | B | B C			cache continuation
c		A | B | Y Z  ->  A | Y | Y Z			continue
z		A | B | Y Z  ->  A ,Y Z | B | ()		pop cached continuation
xyz		A | B | Y Z  ->  A | Y | Z			go to

Miscellaneous

stop		   -> 						: to continue
p		X  ->  X					print
pp		X  ->						print and pop
p.		   ->						print .

Metalinguistic

d		P:d[(..)]					define program
l		P:d[("--x--";.."x"..)]				define pseudo-lambda
e		r:e(..)						evaluate program to result
s		r:s(..;(..);..)					timeshare list of programs
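For example, a program which squares the top of the data stack can be defined with 'd' and run with 'e' (Sq is a name introduced here just for illustration):

  Sq:d[(dup;*)]
  e(3;Sq)         / leaves 9 on the data stack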

Recursion

Stackless recursion

Since K uses the C stack, there is a limit to the depth of descent of any recursion. The limit depends on the sizes of the stack frames generated in the course of the recursion:

  f:{f x+1}
  f 0
stack error
{f x+1}
 ^
>  x
3583
>  \
  g:{g[x;y+1]}
  g[!100000;0]
stack error
{g[x;y+1]}
 ^
>  y
2388
> \

A tcK version, which prints the value of its "argument" at each invocation:

  Rec:d[((p;1;+;`Rec);i)]
  e(0;Rec)
0
1
2
3
4
5
6
:
/ recurses forever

Notice that the call to Rec is made through a symbolic reference, `Rec. This is because K's lists are eager, and since Rec isn't defined until the expression d[((p;1;+;`Rec);i)] is evaluated (to the projection of 'd'), the right-most occurrence of 'Rec' would trigger a value error. The tcK evaluator treats word-symbols as equivalent to the functions they symbolize.
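That is, had Rec been written with a direct reference to itself (in a fresh session, before any definition of Rec exists), K would signal the error while the list was being built:

  Rec:d[((p;1;+;Rec);i)]    / value error: Rec is undefined when the list is evaluated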

In a language which optimizes tail recursion, the recursive reference to 'f' in the function f:{f x+1} would be translated away in favor of a looping construct. Moreover, although tcK is written in what is called "continuation passing style" (every primitive, and indeed every program, is called on the current continuation), it does not completely conform to the expectations implied by that term. In true CPS, a primitive like + takes its normal operands -- data items to be added together -- plus the current continuation. Instead of returning the result x+y, it calls the continuation with that value. Since the continuation is itself a function, we get a series of nested function-calls: f[g[h[...]]], with the continuation-call in tail position. And since K does not eliminate tail calls, we can't implement tcK in this style without falling prey to the C stack.

The solution to this problem is to implement the implied tail-recursion explicitly in the evaluator. Instead of having the primitive call the continuation on the value it computes, the primitive modifies the incoming continuation and returns it to the evaluator. In this implementation, we think of a continuation as a mutable object rather than a function.
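Under this scheme a primitive does not call its continuation; it maps one machine state (a triple) to the next, and the evaluator loops. As a sketch of the idea only (the name 'add' is just for illustration; the actual evaluator below handles all K primitives uniformly in 'f'), the dyad + might look like this:

  add:{[x;y;z]((-2_ x),+/-2#x;y;z)}    / pop two items, push their sum; instruction stack and cache pass through unchanged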

Consider the program Rec. We begin evaluation with the following continuation: the data stack on the left, the instruction stack in the middle, and a (possibly nested) cached continuation on the right (in this example, ()).

,0			((p;1;+;`Rec);i)		()

The evaluator pushes (p;1;+;`Rec) onto the data stack, so we have

(0;(p;1;+;`Rec))	,i				()

Next, the combinator 'i' is executed. We call 'i' on the current continuation, and it returns a new continuation:

,0			(p;1;+;`Rec)			()

We then call 'p', which prints 0, and leaves us with:

,0			(1;+;`Rec)			()

We then push 1 onto the data stack, yielding

0 1			(+;`Rec)			()

We then call '+', and we have

,1			,`Rec				()

Finally, `Rec is evaluated and applied to (,1;();()). To see how the instruction stack is restored to its initial state, take a closer look at how Rec is defined. Rec is a projection of the 'd' function. The first argument to 'd' is the tcK code, expressed as a list, that we wish to associate with Rec. The 'd' function takes three more arguments, x, y, and z, which hold the data and instruction stacks, and the cached continuation. 'd' is just

  d:{[p;x;y;z](x;p,y;z)}

That is, d[p] simply pushes the defined program p on the instruction stack. So Rec[,1;();()] yields

,1			((p;1;+;`Rec);i)		()

which is what we need to continue the recursion.

Recurse 100 times

  Rec100:d[(dup;100;<;,(p.;1;+;`Rec100);ci)]
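Reading the definition: dup;100;< leaves the counter and a comparison flag on the stack; while the counter is below 100, 'ci' runs the quoted program, which prints a dot, increments the counter, and calls Rec100 again; at 100 the flag is 0 and 'ci' falls through. For example (a sketch of the expected behaviour, not a transcript):

  e(97;Rec100)    / prints "." three times and leaves 100 on the data stack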

Ackermann's function

In K, this is

  ack:{:[~x;y+1;~y;_f[x-1;1];_f[x-1;_f[x;y-1]]]}

In tcK:

  Ack:d[(dup2;pair;~:
         ((popd;1;+)
          (pop;1;-;1;`Ack)
          (dupd;1;-;`Ack;(1;-);dip;`Ack))
         ci)]
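Here dup2;pair;~: leaves the flag pair (~x;~y) on the stack, and 'ci' indexes the three quoted programs with (~x;~y)?1: the first when x is 0 (yielding y+1), the second when x is non-zero and y is 0 (recursing on x-1 and 1), and the third otherwise, where (1;-);dip decrements x underneath the result of the inner call before the outer recursion. For example (a sketch of the expected behaviour, not a transcript):

  e(2;3;Ack)      / should leave ack[2;3] = 9 on the data stack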

Implicit continuations

'cc' takes the data stack x, the instruction stack y, and the cached continuation z and returns (x;y;(x;y;z)). 'c' takes (x;y;(x1;y1;z1)) and returns (x1,-1#x;y1;z), which causes execution to jump to the instruction *y1. Programs which use 'cc' and 'c' (but not 'z' or 'xyz') make implied use of continuations. We can simulate exceptions, control structures, and time-sharing using only 'cc' and 'c'.

Exceptions

In K we can throw exceptions with ' and catch them with protected execute, .[;;:], but what we can't do is jump from a program deep in the execution stack to one of the programs above it. For example, suppose f0 calls f1, and f1 calls f2. If f2 encounters some condition, it wants to go immediately to the point at which f0 called f1, skipping the upward ascent through f1.

In tcK we do this with the continuation operators 'cc' and 'c'.

  F0:d[(cc;,`F1;ci;"done";pp)]
  F1:d[("F1";p;`F2;"never executes";p)]
  F2:d[("F2";p;0;c;"never executes";p)]
  e(0;F0)
done
()
  e(1;F0)
F1
F2
done
()

The first thing F0 does is cache the current continuation. It then pushes ,`F1 on the stack and calls 'ci'. If there's a 1 on top of the data stack, it calls F1. F1 calls F2, which pushes 0 and calls 'c', which applies the continuation stored by F0. What was that continuation? Suppose we had executed e(1;F0). Then the continuation stored by F0 is:

  ,1	(,`F1;ci;"done";pp)	()

So the result of applying 'c' to 0 is the effect of setting the current continuation to

  ,0	(,`F1;ci;"done";pp)	()

Some writers have referred to this as "time-travel" to an earlier state of the program, but it looks more like teleportation to me: we move sideways out of the control flow and jump to a point above.

This time through the code we skip F1, since the element at the top of the stack is 0. We print "done" and pop the stack, and exit.

Control structures

K has two iterative control structures:

  do[i:5;\i-:1]
4
3
2
1
0
  a:10 20 30;while[#a;\#a:1_ a]
2
1
0

We can simulate these with 'cc' and 'c'

  Do:d[(cc;1;-;dup;p;,c;ci;"done";pp)]
  e(5;Do)
4
3
2
1
0
  While:d[(cc;1;swap;drop;dup;#:;p;0;>;,c;ci;"done";pp)]
  e(10 20 30;While)
2
1
0
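Reading the Do loop, the first pass looks like this (data stack on the left, instruction stack on the right; the cached-continuation column, which holds the first row's state throughout, is omitted):

,5		(1;-;dup;p;,c;ci;"done";pp)		/ the state cached by cc
(5;1)		(-;dup;p;,c;ci;"done";pp)
,4		(dup;p;,c;ci;"done";pp)
(4;4)		(p;,c;ci;"done";pp)
(4;4)		(,c;ci;"done";pp)			/ p prints 4
(4;4;,c)	(ci;"done";pp)

The counter is non-zero, so 'ci' runs 'c', which resumes the cached state with the new counter; the loop repeats until the counter reaches 0, at which point 'ci' falls through to "done" and pp. While works the same way, dropping an element from the list and testing its length instead of decrementing a counter.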

Timesharing

Each step of the tcK interpreter is O(1), which means that we can evaluate a tcK program one instruction at a time. The state of the tcK machine at any time is a triple consisting of the data stack, the instruction stack, and the cached continuation. Combining these two properties, we can timeshare any number of tcK programs, looping over the set and executing "the next instruction" in each program. A tcK program terminates when the instruction stack is empty (nothing left to do). So, if we have two programs P and Q, and P terminates before Q, we can keep "executing" P until Q terminates: executing "the next instruction" in a terminated program is a no-op.
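Concretely, a finished triple is a fixed point of the single-step function 'a' given under Implementation below, so stepping it again changes nothing:

  a . (,42;();())    / the instruction stack is empty, so 'a' returns (,42;();()) unchanged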

Here's an example. The program Pr loops through a list printing each element:

  Pr:d[(cc;dup;#:;0;>;,(dup;*:;pp;1;swap;_;c);ci)]
  e(10 20 30;Pr)
10
20
30
!0

We want to execute Pr on two (or more) lists, interleaving the prints. To do this, we use 's' instead of 'e', and we feed 's' a list of both (or all) programs:

  s((10 20 30;Pr);(-!5;Pr))
10
0
20
-1
30
-2
-3
-4
(,!0
 ,!0)

Explicit continuations

In tcK, continuations are first-class. A first-class continuation can be manufactured (or accessed from the cached continuation), placed on the data stack, manipulated, changed, passed along to subprograms, or returned to the cache. Two functions are supplied: 'z' pops the cached continuation and pushes it onto the data stack, and 'xyz' pops the data stack, where it expects a continuation, and goes to it.

Coroutines

  Left:d[(dup;1;+;-:;p;swap;`Right)]
  Right:d[(dup;1;+;p;`Left)]
  e(0;`Left)
-1
0
-2
1
-3
2
:

Implementation

At any point in the course of execution, three lists hold the current state:

x	the data stack
y	the instruction stack
z	cached continuation

Intuitively, the data stack is a list of the work done up to that point, and the instruction stack is a list of work yet to do.

As an example, consider the execution of the following program, which we will call K:

(4;5;-;2;3;*;+)

Initially, x is () and y is K. z is (), and in this example will always be (), since we don't use 'cc'. We pop 4 off the instruction stack and, since it is not a function (or a symbol), we append it to the data stack and drop it from the instruction stack. So x is ,4 and y is (5;-;2;3;*;+). Next we pop 5, and since that's not a function, we push it onto the data stack and drop it from the instruction stack. So now we have x = (4;5) and y = (-;2;3;*;+). Next we pop -, and since that is a function, we execute it: - pops the last two elements from the data stack, subtracts the second from the first, appends that value to the data stack, and returns. So x is now ,-1 and y is (2;3;*;+). Eventually we will run out of input elements to process, and at that point execution halts, with the final result of the calculation in x.
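Laid out step by step, with the data stack on the left and the instruction stack on the right:

()		(4;5;-;2;3;*;+)
,4		(5;-;2;3;*;+)
(4;5)		(-;2;3;*;+)
,-1		(2;3;*;+)
(-1;2)		(3;*;+)
(-1;2;3)	(*;+)
(-1;6)		,+
,5		()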

The code for processing K is:

e:{*(a .)/(();x;())}					/ evaluate: iterate 'a' to a fixed point, return the data stack
a:{:[~#y;(x;y;z);g@*y;f[x;y;z];(x,u@*y;1_ y;z)]}	/ one step: finished, executable (apply via f), or data (push its quotation u)
g:{:[7=t:4:x;1;~4=t;0;~(*$x)_in"{(",j]}			/ executable? 1 for functions and word-symbols, 0 otherwise
f:{:[(#k)>i:k?*y;((v[i]_ x),,y[0]. v[i]#x;1_ y;z);y[0][x;1_ y;z]]}	/ apply a primitive (from k, valences v) or a defined word

'e' takes a program x and applies (a .) to (();x;()). Initially, both the data stack and the cached continuation are empty. (a .) returns a triple (x';y';z'), and we continue to apply (a .) until the result is the same twice in a row. Effectively, for well-formed programs, this is equivalent to the condition: continue processing while the instruction stack -- the second element of the triple -- is non-empty.

'a' takes three arguments: x, the data stack, y, the instruction stack, and z, the cached continuation. If the instruction stack is empty, we return (x;y;z) unchanged (thus triggering termination of the tight loop in the calling function 'e'). If the first element of the instruction stack *y is a function or the name of a function, we call 'f'. If *y is not a function or a symbol, we append its quotation to the data stack.

'g' takes a single argument x. If x is a function return 1, else if x is not symbolic return 0, else return 1 if x isn't the symbol of a primitive (j), a lambda ({}), or a composition (()).

'f' must be able to handle primitives - either monads or dyads - and defined programs or symbols of defined programs. If *y is a primitive, we apply it to the top one or two elements of the data stack, depending on its valence, and append the result. Otherwise we apply *y itself, and its arguments are the data stack x, the instruction stack y minus the instruction just consumed (1_ y), and the cached continuation z.

The code for timesharing evaluation also uses 'a':

s:*:'(a .')/{(();x;())}'

{(();x;())}' takes each program and "triple-izes" it, then / converges (a .') on all of them. a .' executes a single instruction in each triple, and it does this repeatedly until the whole set converges. If a program terminates early, we continue to execute it without effect. Finally, we return a list of the data stacks.

Conclusions

Flow-of-control is the ultimate verb of programming. First-class continuations are nouns, data-structures the program can access, manipulate, and use. When a continuation is used, the flow of control is updated, and this alters the behavior of the program. In tcK, continuations are the sea in which programs swim*.

References

This project was undertaken in the course of my K implementation of David Madore's Unlambda language. His admirably lucid (but as of this date incomplete) paper on call/cc was my constant reading material for several days.

Colleagues have informed me that my implementation resembles (perhaps up to identity) the approach known as the "cactus" or "spaghetti" stack.

Thanks to Bakul Shah and Billy Tanksley for suggestions.

* Somewhere in my reading on this topic, I came across a throwaway remark by someone to the effect that, to grasp the concept of continuations one merely has to watch the film Memento. Clever. Or I suppose one could read Richard Garfinkle's novel of life in solid time, All of an Instant.