bagsql ====== 0. overview 1. tables and keytables 2. ksql operators 3. ksql functions 4. ksql joins 5. parsing 6. bagsql 7. transactions 8. debugging 9. notes 0. overview bagsql is a set of bag server functions which provide for the parsing and evaluation of k3 expressions on bags. the k3 expressions can include embedded queries in ksql, the k4 sql dialect, named there 'q-sql'. for example, where test.t is a bag whose value is a table or keytable (see below), then the following expression is evaluated against the bag test.t and returns a keytable to the calling function: {select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from test.t where g _lin 2 3 4,h>.5} 1. tables and keytables as we know, a k3 table is a dictionary of equal-length lists: t:.+(`f`g;(1 1 2 2 2;`a`a`b`b`c)) show t f g - - 1 a 1 a 2 b 2 b 2 c in k4, a table is the flip of a dictionary of equal-length lists, and a keytable is a dictionary whose domain and range are tables of equal length. the rows of the domain table are assumed to be unique. think of each row of the domain table as the key of the corresponding row of the range table. in k3 we can simulate k4 keytables with a pair of equal-length tables: d:.+(`first`last;(`ed`fred`ned`fred`ed;`smith`jones`smith`green`white)) r:.+(`age`height`weight;(50 30 41 42 60;6.2 6.1 5.8 5.9 6.2;110 120 150 190 200)) k:(d;r) show k first last | age height weight ----- ----- + --- ------ ------ ed smith | 50 6.2 110 fred jones | 30 6.1 120 ned smith | 41 5.8 150 fred green | 42 5.9 190 ed white | 60 6.2 200 we have two functions, key and dekey, which make keytables from tables and vice versa: show t:dekey k first last age height weight ----- ----- --- ------ ------ ed smith 50 6.2 110 fred jones 30 6.1 120 ned smith 41 5.8 150 fred green 42 5.9 190 ed white 60 6.2 200 k~key[t;`first`last] 1 we also have functions keys (0#` for anything other than a keytable, else !*: for a keytable) and iskey which is 1 for keytables else 0. unlike k4 keytables, there is no primitive support (i.e. x@y, x . y) for indexing the range of a keytable with the elements of the domain. 2. ksql operators in this section i give brief descriptions and examples of the six basic ksql operators - select, update, delete, upsert, insert, and exec. for additional information you may consult jeff borror's chapter on q-sql in the online version of Q for Mortals: https://code.kx.com/q4m3/9_Queries_q-sql/ naturally, there are some syntactic differences to negotiate, particularly in how dictionaries and tables are defined, but the k4 q dialect should be obvious to k3 programmers. furthermore, k4 contains many functions which are not included in k3 ksql. as needs arise, we can add them to our system or find equivalent methods native to k3, but for now we will want to see how useful the basic operations are in the context of the 1010 bag server[1]. 2.0 preliminaries ksql expressions (those beginning with select, update, delete, exec) must be enclosed in curly braces, thus: {select x from y} in what follows, i will elide the braces except in cases where actual execution is shown. each of the ksql primitives is, ultimately, a function of four arguments: select by from where e.g. select f+1,g by x,y,z from t where a<10,b>20 ^^^^^ ^^^^^ ^ ^^^^^^^^^ note that elements within an argument (other than t) are separated by commas. also observe that elements obey k syntax: select (f*2)+g-1 .. ultimately, each of the primitives is implemented by a k function: select:{[t;c;b;a] .. update:{[t;c;b;a] .. delete:{[t;c;b;a] .. exec:{[t;c;b;a] .. insert:{[t;d] .. upsert:{[t;u] .. the parser extracts the arguments from a ksql expression, parses them, and passes them as values to the relevant function. 2.1 select all the forms of select: a) select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from t where f=`c,g _lin 2 3 4,h>.5 b) select by f,g from t where f=`c,g _lin 2 3 4,h>.5 c) select p:sum i,q:sum i+1,r:avg i,s:sum i+j from t where f=`c,g _lin 2 3 4,h>.5 d) select from t where f=`c,g _lin 2 3 4,h>.5 e) select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from t f) select by f,g from t g) select p:sum i,q:sum i+1,r:avg i,s:sum i+j from t h) select from t a, b, e, and f return keytables whose domain is a table of unique pairs of f and g. a and e are aggregations (p, q, r, s), b and f group the columns of the range table. c, d, g, and h return tables. c and g return dictionaries of atoms. for example, the values parsed from the arguments of a), and how they are passed to the select function: t:@[_n;`f`g`h`i`j;{:[@z;x _draw z;z x _draw#z]}[100];(`a`b`c`d`e;7;0;0;0)] / table c:((=;`f;,`c);(_lin;`g;2 3 4);(>;`h;.5)) / where clause b:`f`g / by clause a:.+(`p`q`r`s;((+/;`i);(+/;(+;`i;1));({(+/x)%#x};`i);(+/;(+;`i;`j)))) / from clause r:select[t;c;b;a] 2.2 update update takes the same arguments as select but interprets them differently. a) update p:g*11,q:g+1111 by f,g from t where f=`c,g _lin 2 3 4,h>.5 b) update by f,g from t where f=`c,g _lin 2 3 4,h>.5 c) update p:g*11,q:g+1111 from t where f=`c,g _lin 2 3 4,h>.5 d) update from t where f=`c,g _lin 2 3 4,h>.5 e) update p:g*11,q:g+1111 by f,g from t f) update by f,g from t g) update p:g*11,q:g+1111 from t h) update from t statements b, d, f, and h return the same result as the corresponding select statements in 2.1. 2.3 delete a) delete from t where f=`c,g _lin 2 3 4,h>.5 b) delete f,g from t a) deletes all rows where the condition holds; b) deletes columns f and g. 2.4 upsert upsert and insert are dyadic functions and can be used as verbs, i.e. infix to their arguments. example: show t k | f g h - + -- --- - 1 | 10 60 a 2 | 20 70 b 3 | 30 80 c 4 | 40 90 d 5 | 50 100 e show u k h f - ----- --- 3 x 100 9 yyyyy 222 2 z 300 show eval"t upsert u" k | f g h - + --- --- ----- 1 | 10 60 a 2 | 300 70 z 3 | 100 80 x 4 | 40 90 d 5 | 50 100 e 9 | 222 yyyyy for u.k = 3 and 2, upsert replaces h and f; for u.k = 9, upsert appends a new record with nulls for missing columns. 2.5 insert show t:.+(`k`f;(!5;!5)) k f - - 0 0 1 1 2 2 3 3 4 4 show eval"t insert 100 200" k f --- --- 0 0 1 1 2 2 3 3 4 4 100 200 show eval"t insert(100 200 300;400 500 600)" k f --- --- 0 0 1 1 2 2 3 3 4 4 100 400 200 500 300 600 2.6 exec exec in k3-ksql works a little differently than it does in k4. in k3 there is no distinction between a table and a dictionary of equal length lists. therefore, with one exception to be noted below, given the same arguments, exec and select return exactly the same results. moreover, k3-sql select can return a dictionary of unequal length columns, which in q-sql is reserved exclusively for exec. the difference then between select and exec in k3-ksql is that in the case where the result of the query has a single column, exec returns the data alone, whereas select returns a one-column table. this highlights the most useful application of exec, which is to return a list in a where clause. show a.t f g - - 1 5 2 6 2 7 4 8 eval"{exec f from a.t}" 1 2 2 4 show eval"{exec distinct f,g from a.t}" f1 : 1 2 4 g : 5 6 7 8 2.6 nested queries show c.t:.+(`f`g;(!10;10#`a`b`c`d)) f g - - 0 a 1 b 2 c 3 d 4 a 5 b 6 c 7 d 8 a 9 b show d.t:.+(`x`y;(`a`b`c`c`d`e`d`f`f`f`a`b;!12)) x y - -- a 0 b 1 c 2 c 3 d 4 e 5 d 6 f 7 f 8 f 9 a 10 b 11 ksql expressions can be nested in the where clause: show eval"{select from c.t where g _lin {exec x from d.t where y<8,y>2}}" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ f g - - 2 c 3 d 6 c 7 d or in the table clause: show eval"{select y+1 from {select from d.t where y<5}}" ^^^^^^^^^^^^^^^^^^^^^^^^^^^ f1 -- 1 2 3 4 5 or in the from clause: show eval"{select sum {exec y from d.t} from d.t}" ^^^^^^^^^^^^^^^^^ f2 : 66 k can be used to construct values (but not column names): show eval"{select from(.+(`f`g;(`a`b`c`c;!4)))where f _lin`c`g}" f g - - c 2 c 3 show eval"{select from(.+(`f`g;(4#{exec x from d.t};!4)))}" f g - - a 0 b 1 c 2 c 3 3. ksql functions ksql contains a set of special purpose functions to be used in query expressions. dyads can be used infix, as verbs. 3.1 distinct show f.t:.+(`x`y;(1 2 2 3;`a`b`b`c)) x y - - 1 a 2 b 2 b 3 c show eval"distinct f.t" x y - - 1 a 2 b 3 c .ksql.distinct 1 2 2 3 1 2 3 3.2 xasc and xdesc show e.t:.+(`x`y;(`a`b`c`d;4 9 3 0)) x y - - a 4 b 9 c 3 d 0 show eval"`y xdesc{select from e.t}" x y - - b 9 a 4 c 3 d 0 3.3 xcol and xcols show x.t:.+(`c1`c2`c3;(`a`b`c;10 20 30;1.1 2.2 3.3)) c1 c2 c3 -- -- --- a 10 1.1 b 20 2.2 c 30 3.3 show eval"`new1`new2 xcol x.t" new1 new2 c3 ---- ---- --- a 10 1.1 b 20 2.2 c 30 3.3 show eval"`c3 xcols x.t" c3 c1 c2 --- -- -- 1.1 a 10 2.2 b 20 3.3 c 30 show eval"`c3`c2 xcols x.t" c3 c2 c1 --- -- -- 1.1 10 a 2.2 20 b 3.3 30 c 3.4 fby show data.p:@[_n;`name`color`city`weight;{:[@z;x _draw z;z x _draw#z]}[30];(`screw`bolt`nut;`red`white`blue;`london`nyc`rome`paris;100 _draw 1000)] name color city weight ----- ----- ------ ------ screw red nyc 266 screw red nyc 681 nut blue paris 0 nut white nyc 170 screw red nyc 834 nut white nyc 511 bolt red rome 879 screw white nyc 107 screw white nyc 839 screw red nyc 38 screw white rome 556 nut blue paris 400 nut red nyc 757 bolt red rome 335 screw blue paris 320 bolt white london 736 bolt white london 244 screw red london 552 bolt blue rome 839 nut white london 299 screw white london 736 screw red rome 232 bolt red london 266 nut red nyc 718 screw red nyc 280 screw red paris 895 bolt blue nyc 879 bolt blue paris 124 screw red rome 395 nut blue paris 266 show run"{select from data.p where weight=(max;weight)fby city}" name color city weight ----- ----- ------ ------ bolt red rome 879 bolt white london 736 screw white london 736 screw red paris 895 bolt blue nyc 879 show run"{select from data.p where weight=(max;weight)fby city,color=`blue}" name color city weight ---- ----- ---- ------ bolt blue nyc 879 show run"{select from data.p where weight=(max;weight)fby{select name,city from data.p},color=`blue}" name color city weight ---- ----- ----- ------ nut blue paris 400 bolt blue nyc 879 bolt blue paris 124 4. ksql joins 4.0 preliminaries k3-ksql implements all the q-sql joins except for window-join (wj) and its variants. should a need for this form of join arise we can implement it. the ksql joins are: lj - left join ij - inner join pj - plus join ej - equijoin uj - union join aj - asof join (also aj0 and asof) the next several sections document the behavior of these operations through examples. for individual explanations see borror chapter 9.9. 4.1 lj show t:.+(`k`f;(1 1 1 4 4 3 3 3 3 3;!10)) k f - - 1 0 1 1 1 2 4 3 4 4 3 5 3 6 3 7 3 8 3 9 show u:key[.+(`k`g;(1 2 3;`A`B`C))]`k k | g - + - 1 | A 2 | B 3 | C show lj[t]u k f g - - - 1 0 A 1 1 A 1 2 A 4 3 4 4 3 5 C 3 6 C 3 7 C 3 8 C 3 9 C 4.2 ij using t and u from 3.1: show ij[t]u k f g - - - 1 0 A 1 0 A 1 0 A 1 2 C 1 2 C 1 2 C 1 2 C 1 2 C 4.3 pj show t:.+(`k`a`b`c;(`c`b`a;100 200 300;10. 20. 30.;1 2 3)) k a b c - --- -- - c 1000 1 1 b 2000 2 2 a 3000 3 3 show u:key[.+(`k`a`b;(`a`b;10 20;1.1 2.2));`k] k | a b - + -- --- a | 10 1.1 b | 20 2.2 show pj[t]u k a b c - --- ---- - c 100 10 1 b 220 22.2 2 a 310 31.1 3 4.4 ej show t:.+(`k`c`q;(1 2 2 3 4;10 11 20 30 40;`x`xx`y`z`w)) k c q - -- -- 1 10 x 2 11 xx 2 20 y 3 30 z 4 40 w show u:.+(`k`c`v;(2 2 2 4 5;200 300 301 400 500;2 3 31 4 5)) k c v - --- -- 2 200 2 2 300 3 2 301 31 4 400 4 5 500 5 show ej[`k;t]u k c v q - --- -- -- 2 200 2 xx 2 200 2 xx 2 300 3 xx 2 300 3 y 2 301 31 y 2 301 31 y 4 400 4 w 4.5 uj show t:.+(`x`y`z;(`a`b;1 2;`x`y)) x y z - - - a 1 x b 2 y show u:.+(`x`y`w;(`c`d;3 4;`X`Y)) x y w - - - c 3 X d 4 Y show vvv:uj[t]u x y z w - - - - a 1 x b 2 y c 3 X d 4 Y show t:key[.+(`k`v`w;(1 2 3;`a`b`c;10 20 30))]`k k | v w - + - -- 1 | a 10 2 | b 20 3 | c 30 show u:.+(`k`w`x;(3 4;300 400;3.3 4.4)) k w x - --- --- 3 300 3.3 4 400 4.4 show uj[t]u k | v w x - + - --- --- 1 | a 10 2 | b 20 3 | c 300 3.3 4 | 400 4.4 4.6 aj show t:.+(`t`s`q;(1 3 4;`m`i`g;100 200 150)) t s q - - --- 1 m 100 3 i 200 4 g 150 show u:.+(`t`s`p;(0 1 1 2;`i`m`m`i;100 99 101 98)) t s p - - --- 0 i 100 1 m 99 1 m 101 2 i 98 show aj[t;u]`s`t s t q p - - --- --- i 3 200 98 m 1 100 101 g 4 150 show aj0[t;u]`s`t s t q p - - --- --- i 2 200 98 m 1 100 101 g 4 150 show t:.+(`t`s`q`p;(1 3 4;`m`i`g;100 200 150;45 160 55)) t s q p - - --- --- 1 m 100 45 3 i 200 160 4 g 150 55 show asof[t;.+(`s`t;(`m`i;1 3))] q p --- --- 100 45 200 160 5. parsing the k3 parser has been extended to support ksql syntax. ksql statements are enclosed in curly braces: {select f,g by h from t where f>10} the dyadic joins lj, ij, ej, pj can be written infix, as verbs: {select f,g by h from t where f>10} uj {select g,h from u where f<100} 6. bagsql the bag server function 'eval' takes zero or more keys K and a string representing a k or ksql expression e, and executes e over the bags K. example: / bag server functions put:{bs.ba[(`put;(x;y))];} eval:{bs.ba[(`eval;(x;y))]2} / keys A:("test";,`a) B:("test";,`b) F:("test";,`f) / put atoms and lambda put[A]10 put[B]20 put[F]{x+y} / evaluate k eval[(A;B;F);"test.f[test.a;test.b]"] 30 eval[(A;B;F);"test.a+test.b"] 30 / put a table T:("test";,`t) put[T]@[_n;`f`g`h`i`j;{:[@z;x _draw z;z x _draw#z]}[20];(`a`b`c`d`e;7;0;0;0)] / evaluate ksql show eval[,T;"{select a:sum i from test.t}"] a : 10.6990852192976 show eval[,T;"{select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from test.t where g _lin 2 3 4,h>.5}"] f g | p q r s - - + ----------------- ---------------- ----------------- ----------------- d 3 | 1.66140114329755 3.66140114329755 0.830700571648777 2.47279774164781 b 4 | 0.287223003804684 1.28722300380468 0.287223003804684 0.622003682423383 c 2 | 0.326501911040395 1.3265019110404 0.326501911040395 1.23171932203695 e 2 | 0.291887435596436 1.29188743559644 0.291887435596436 0.551521756220609 d 2 | 0.879141070879996 1.87914107088 0.879141070879996 1.66957073658705 for pedagogical purposes, and for convenience, the bag server also supports the functional form of select, which take parse-trees as arguments. for example: select:{[t;c;b;a]bs.ba[(`select;(t;(c;b;a)))]2} / where-clause, by-clause, calculated columns c:((=;`f;,`c);(_lin;`g;2 3 4);(>;`h;.5)) b:`f`g a:.+(`p`q`r`s`i_;((+/;`i);(+/;(+;`i;1));({(+/x)%#x};`i);(+/;(+;`i;`j));`i_)) show select[T;c;b;a] / select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from t where f=`c,g _lin 2 3 4,h>.5 f g | p q r s i_ - - + ----------------- --------------- ----------------- ---------------- -- c 2 | 0.326501911040395 1.3265019110404 0.326501911040395 1.23171932203695 7 7. transactions in the bag server assignments are always made to bags. e.g. let test.a = 10 and let A = ("test";,`a), then eval[A]"test.a:test.a+1;test.a+100" 11 111 the bag test.a is updated, then test.a+100 is evaluated. in ksql assignment returns the value assigned. we call the execution of one or more statements s, possibly containing assignments, a transaction, since the first failure encountered in the evaluation of s causes the sequence to abort and any updates made in the course of evaluation are discarded. e.g. again let test.a = 10, then eval[A]"test.a:test.a+1;!-1;test.a+100" a domain error on !-1 interrupts evaluation, and the bag is not updated: get A 10 new bags can be created using assignment. e.g. again, let test.a = 10, then: eval[A]"test.k:test.a+1" 11 get("test";,`k) 11 8. debugging submitting a bad query to the bag server results in the following error message in the client: "{I ksql eval error: }" for example T:("test";,`t) e:{bs.ba[(`eval;(x;y))]2} e[,T;"!-1"] "{I ksql eval error: domain" there are two approaches to resolving ksql errors. first, cd to the subdirectory bs of the working copy of 1010: cd /bs then: e eval the eval.k script establishes a connection to the running bag server (if there is one) and also loads the scripts necessary for standalone ksql execution: if[1>#_i;`0:"provide 1010 version on command line\n";_exit[1]] .l.version:_i[0] \l kzsload .l.load( `ut_path ; `k1010 ) .l.load( `dblib_path ; `dbm ; `dbi ; 1 ) .l.load( `dblib_path ; `opsxml ; `opx ; 1 ) .l.load( `lib_path ; `xml ; `xml ; 1 ) .l.load( `bs_path ; `bsload ; `bsl ; 1 ) if[*.[bsl.connect;($.l`default_bag_server;`bs);:] `0:,"!!!!! Bag server not running !!!!!";_exit 0] \d ksql \l ksql \d .k \d kparse \l kparse \d .k KSQL:`.k.ksql \l show run:{[e]ksql.eval kparse.ksql e} get:{bs.ba[(`get;,x)]2} put:{bs.ba[(`put;(x;y))];} eval:{bs.ba[(`eval;(x;y))]2} select:{[t;c;b;a]bs.ba[(`select;(t;(c;b;a)))]2} 'run' executes a ksql expression in the workspace. 'get' and 'put' read and write keys to the bag server. 'eval' executes a ksql expression in the bag server. 'select' takes parse-trees as arguments and executes a selection in the bag server. for example, define the keys: A:("test";,`a) B:("test";,`b) F:("test";,`f) put values at keys in the bag server: put[A]10; put[B]20; put[F]{x+y}; get the values at those keys: get A 10 get B 20 get F {x+y} evaluate in the bag server: eval[(A;B;F)]"test.f[test.a;test.b]" 30 eval[(A;B)]"test.a+test.b" 30 define local variables: test.a:10 test.b:20 test.f:{x+y} run in the workspace: run"test.f[test.a;test.b]" 30 run"test.a+test.b" 30 should a ksql error occur in the bag server, get the values of the keys and run the query locally. a more substantial example: define the bag: test.t:@[_n;`f`g`h`i`j;{:[@z;x _draw z;z x _draw#z]}[20];(`a`b`c`d`e;7;0;0;0)] T:("test";,`t) put[T]test.t evaluate in the bag server: show eval[,T;"{select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from test.t where g _lin 2 3 4,h>.5}"] show eval[,T;"`f xasc test.t"] show eval[,T;"`j`i xcols test.t"] run locally: show run"{select p:sum i,q:sum i+1,r:avg i,s:sum i+j by f,g from test.t where g _lin 2 3 4,h>.5}" show run"`f xasc test.t" show run"`j`i xcols test.t" 9. notes [1] and elsewhere, e.g. translating from ksql to the 1010 query language.