Thanks to Stephen Williams, intrepid arKaeologist, for excavating this paper in the internet ruins. -- I received quite a few comments on this paper, and a number of corrections. This is the last version I'll publish here, although if anyone is interested they can email me for the wisecracks I've removed. Thanks for everyone's patience. -- There is a paper by Bunda and Gerth in one of the old APL proceedings which describes the idea of a pairwise binding grammar (PBG). In that paper, the authors present grammars for the main dialects of APL and demonstrate how small variations in binding strength between a fixed set of types produce large differences in syntax. In work carried out a few years later, Dave Landetta used a PBG to develop a consistent extension of APL2 to arrays of functions. None of the APL implementors picked up on Dave's ideas, which has always puzzled me. A brief discussion with Phil Benkard (the inventor of these grammars) at last week's Minnowbrook Array Languages conference stimulated me to think about implementing a PBG in K for pure K. Now, the Bunda-Gerth grammars had to deal with such anomalies as bracket indexing and the axis operator, which are blessedly absent from pure K. Also, the operators of K are uniformly monadic; adverbs yes, conjunctions no. Consequently, the analytical problem of defining a PBG for pure K is relatively simple. But the programming problem is made no easier by this fact. An APL2 effort, which I undertook in 1986 in order to study Landetta's work, required more than a few lines of code, notwithstanding the availability of n-wise reduction, precisely the iteration pattern required by this problem. So I was curious to see how K might simplify the programming of a general PBG. To assist readability, I've adopted Roger Hui's convention of using a single space in front of variable names. Thus, "apply foo to arg". Pairwise binding grammars ------------------------- A PBG consists of a set of syntax types and a pair of tables B and C. The syntax types jointly and exhaustively classify the possible tokens. For example, in K the type of the token + is v: + is a verb. B[x;y] is the strength of the binding between entities of types x and y. C[x;y] is the type of the result of binding entities of types x and y. The PBG algorithm operates on a type vector. Given a list of tokens, the type vector for that list is produced by replacing each token in the list with its type. Given a type vector as input, algorithm produces a type as its output if the type vector is well-formed, and signals a syntax error if it is not. The PBG algorithm itself is very simple. At each step, we reduce by one the length of the type vector by replacing a pair of types in the vector by a single type. Intuitively, we choose this pair by scanning the type vector from the right, computing the binding strength of each pair, and checking whether that number is less than the one previously computed. When the binding strength drops, we replace the pair of types immediately to the right with the type of the singleton which results from binding that pair. Here is a schematic example: First, compute the binding strength vector from the type vector: a b c d e f g h i / schematic type vector j k l m n o p q / schematic binding strength vector Then, for each pair in the binding strength vector, see whether the left element is less than the right element: j .(`a;();:;b) indexed value a[b;...;b] -> .(`a;(b;...;b)) indexed assignment a[b;...;b]:c -> .(`a;(b;...;b);:;c) function application a[b;...;b] -> .(`a;(b;...b)) list-notation (a;...;a) -> (,a),...,,a Thus purified, K contains three syntactic types: n nouns names, literals, &c. v 20 verbs ~!@#$%^&*-_+=|:,<.>? a 3 adverbs /\' The binding strengths are: n v a - - - n 1 3 4 v 2 1 4 a 0 0 0 Noun-adverb and verb-adverb > noun-verb > verb-noun > noun-noun and verb-verb > adverb-anything. The result types are: n v a - - - n n v v v n v v a Noun-adverb and verb-adverb are verbs; noun-verb and verb-noun are verbs; noun-noun is a noun; verb-verb is a verb; adverb-anything and anything- adverb are nothing. Binding table first: B[`n`v`a;`n`v`a]:(1 3 4;2 1 4;0 0 0) The binding table B is a two-dimensional dictionary whose axes are the syntactic types of pure K. To find the binding strength for verb-adverb, index B by the symbols of those types, using either access: B .`v`a 4 or high-calorie bracket indexing: B[`v;`a] 4 Observe that the leading axis of B is the left type, and the second axis the right. Next, the type table: C[`n`v`a;`n`v`a]:(`n`v`v;`n`v`v;```) The type table C is a two-dimensional dictionary with the same structure as B. To find the type of the result of binding an adverb to a verb on its left: C .`v`a `v Now let's type the tokens of an expression in pure K: +/'a*-b+c / sum each of a times negate b plus c vaanvvnvn We'll represent the types as a vector of symbols, reversed relative to the token list: q:|`v`a`a`n`v`v`n`v`n q `n`v`n`v`v`n`a`a`v The type vector is reversed because, while we want to attack the expres- sion from the right, it is easier to program the scan from the left. Next, we need a function which, given a type vector, returns the index of the pair to bind. pair:{(<':{B[x;y]}':x)?1} pair q 1 That is, the first pair to bind is q 1 2, which correspond to b + in the token list. [ How does pair work? Let x be the reversed type vector: x:`n`v`n`v`v`n`a`a`v {B[x;y]}':x / index B pairwise by x 2 3 2 1 3 0 0 4 <':{B[x;y]}':x / pairwise less-than 0 1 1 0 1 0 0 (<':{B[x;y]}':x)?1 / first 1 = drop in strength 1 ] The function bind uses the result of pair to replace the bound pair with the type of the binding: bind:{,/.[(0,i+0 2)_ x;1;:;C .|x 0 1+i:pair x]} bind q `n `v `v `v `n `a `a `v Observe that `n`v`n... has been correctly reduced to `n`v... . [ How does bind work? We've located the point in the type vector where binding strength drops. We want to replace a pair with a singleton, the type of the result of binding that pair. We aim to partition the type vector into three subvectors: the first subvector is everything up to the pair to bind; the second subvector is the pair to bind; and the third subvector is the rest. Once we have this partition, we can replace the middle subvector with the result-type, which is C indexed by the reverse of the types to bind. The main subexpression of bind is an indexed amend of the partitioned vector: .[indices _ data;1;:;new] indices _ data cuts the data into partitions; 1 is the index into those partitions (remember: origin 0!); : is assignment, or replace; and new is the data to assign. The partition is then razed using ,/ ] The reduction should continue until a single symbol remains, which then represents the type of the result of evaluating the expression: type:{*{1<#x}bind/x} type q `n [ How does type work? K's extended reduce is called "over". Let x be any data, f be any monadic function, and g be any monadic func- tion returning 0 or 1. Then g f/x applies f to x until g x is not 1. The result of f x is fed back to f as its argument until g x is false. Hence, bind x until the length of the result is 1. The initial * says: return the first element of the result. ] To see the intermediate steps, scan bind over the type vector until the result has length 1: {1<#x}bind\q (`n `v `n `v `v `n `a `a `v `n `v `v `v `n `a `a `v `n `v `v `n `a `a `v `n `v `n `a `a `v `n `v `a `a `v `n `a `a `v `n `a `v `n `v ,`n) Pure K + parentheses -------------------- In pure K, parentheses turn verbs into nouns. For example: 3#(+) (+;+;+) The result, a list, contains nouns as parts. The expression which pro- duces this list is correctly parsed as follows: 3#(+) ----- nv(v) / type each token nvn / (v) -> n vn / nv -> v n / vn -> n To type an expression containing parentheses, recursively replace each parenthesized token list with a one element list of the type vector of that list. For example: (+ / ' a * - b + c) % + / a * b + c (,`v`a`a`n`v`v`n`v`n),`v`v`a`n`v`n`v`n We won't bother to reverse the type vectors, leaving that job to the parsing function. The algorithm to parse a general type list is: parse each sublist; if it completes, type it as a noun; parse the resulting type vector. The K functions which implement this algorithm are: list:{type@|@[x;&~@:'x;noun@|:]} noun:{:[4:x;type x;(_f@|:)'x];`n} Two examples: Parse (+/'a*-b+c)%+/a*b+c: r:(,`v`a`a`n`v`v`n`v`n),`v`v`a`n`v`n`v`n list r `n Parse 3#(+): s:`n`v,,,`v s (,`v `v `n) list v `n [ How does list work? The function applies type to the reverse of an amend of its argument: type@| ... The amend expression, @[x;&~@:'x;noun@|:] says: amend each non-atomic part of x with the noun-reverse of that part. For example, if x is r above: (`n `v `n `v `n `a `v `v `n `v `n `v `v `n `a `a `v) then x 8, corresponding to the parenthesized subexpression, is the only non-atomic item. To that item we apply noun-reverse. noun applies self-reverse recursively until it reaches a non-list; that argument is parsed. But no matter what the result, noun returns `n. That is because parentheses convert all types to noun. ] The complete set of K functions required to implement a PBG is: B[`n`v`a;`n`v`a]:(1 3 4;2 1 4;0 0 0) C[`n`v`a;`n`v`a]:(`n`v`v;`n`v`v;```) type:{*{1<#x}bind/x} bind:{,/.[(0,i+0 2)_ x;1;:;C .|x 0 1+i:pair x]} pair:{(<':{B[x;y]}':x)?1} list:{type@|@[x;&~@:'x;noun@|:]} noun:{:[4:x;type x;(_f@|:)'x];`n}