// intermediate form (IF) from hierarchical (HN) or functional (FN) notation.

/ variables
P:"\\";Q:"\"";I:".";D:"."                        / escape, quote, indent, dot
L:"{";R:"}";A:"[";B:"]"                          / functional open close pairs
S:":";T:";"                                      / set, separate

/ constants
CD:("<![CDATA[";"]]>");O:"<";C:">";F:"/";E:"="   / XML open, close, end, sep
X:" ";Y:"\n";Z:"\t\r"                            / blanks
K:""                                             / empty

/ utilities
sqz:{y@&~&/'y _lin\:K,x}                         / no blank rows
cut:{1_'(&(~quo y)&y=x)_ y:x,y}                  / cut y on x
lc:{(y _lin K,x)?0}                              / leading characters
dlc:{lc[x;y]_ y}                                 / delete leading characters
dtc:{|dlc[x;|y]}                                 / delete trailing characters
dlb:{dlc[X,Y,Z]x}                                / delete leading blanks
dtb:|dlb@|:                                      / delete trailing blanks
db:dlb dtb@                                      / delete leading and trailing blanks
lr:{1_-/+\'(~quo y)&/:x=\:y:X,y}                 / left-right hierarchy
qu:{~b=b|(~=)\b:(~-1!es x)&x=Q}                  / quoted contents
es:{1_0{(~x)&y=P}\x}                             / mark valid escapes
quo:{qu[x]|x=Q}                                  / with quotes
qe:{:[x~Q,Q;K;x]}                                / "\"\"" -> ""
dq:{:[Q~*x;1_-1_ x;x]}                           / ".." -> ..
uq:{@[x;&quo x;:;X]}                             / .."xxx".. -> ..xxx..
nl:{@[x;&(~quo x)&Y=x;:;T]}                      / \n -> ;
bl:{@[x;&(~quo x)&x _lin Y,Z;:;X]}               / blanks -> " "
at:{:[1=#x;*x;x]}                                / ,x -> x
qf:{((~quo x)&x=y)?1}                            / x?unquoted y
xq:{:[Q~*x;. x;x]}                               / tolerate unquoted strings

/ converters
IH:{ho'hb sqz[X,Z]cut[Y]x}                       / intermediate form <- hierarchical form
ho:{
 if[Q~*a:dlc[I]x 0;:. a]                         / x=".."
 n:`$(k:a?S)#a                                   / name
 if[1=#x;:(n;.();,dq db(k+3)_ a)]                / singleton object (s::v)
 b:oa'd:(&(*i)=i:lc[I]'c)_ c:1_ x                / b[i]=1 iff d[i]=attr=name value else object
 a:. nv',/d@&b;c:_f'd@&~b                        / attrs and objects
 (n;a;c)}                                        / name attrs objects
hb:{(&~x[;0]=I)_ x}                              / hierarchy blocks
nv:{(`$dlc[I]i#x;dlb qe(1+i:qf[x]S)_ x:db x)}    / "name value" -> (`name;"value")
oa:{:[S=*n:(1+n?S)_ n:dlb x 0;0;~&/X=n]}         / attribute?
HI:{1_,/Y,'{1_ k3[Y]. x}'x}                      / hierarchical form <- intermediate form
k3:{[x;y;z;w]x,($y),S,(,/ka[x]'[$!z;z[]]),,/k0[x,I]'w}  / recursion step
ka:{x,I,y,S,:[#z;z;Q,Q]}                         / attribute-name
k0:{:[4:y;x,5:y;k3[x]. y]}                       / attribute-value
IF:{ro@fo'fs nl x}                               / intermediate form <- functional form
fo:{
 if[ name + body
 if[Q~*x;:(n;.();,. x)]                          / n:"..."
 i::[A=*x;uq[x]?/B;0];a:dtb dlc[X,T,A]i#x;o:dtc[T,R]db(i+1)_ x  / body -> attributes + objects
 (n;al a;ol o)}                                  / -> intermediate form
fs:{dlc[X,T]'(0,1+-1_&0{(x=0)&y=1}':lr[L,R]x)_ x:T,x}  / functions
al:{:[~#x;.();@[.{(`$x;y)}.'cut[S]'cut[T]x;_n;dq]]}   / attribute list
ol:{*(0<#*-1#)(or .)/(();x)}                     / object list
or:{y:db@:[T=*y;1_ y;y];i:no y;(x,,fo i#y;i _ y)}  / object recursion
no:{:[~(i:x?L) xml
xo:{:[#z;:[~-3=4:*z;x;Y _in*z;x;K],O,F,($y),C;F,C]}  / objects close
xa:{:[#x;C;K]}                                   / attribute close
xb:{:[~-3=4:*y;,/xml[x,X].'y;st[x,X;*y]]}        / string or objects
st:{:[(*y)_in X,Y,Z;x,CD[0],y,CD 1;Q,y,Q]}       / CDATA or x
di:{,/{X,($x),E,Q,y,Q}'[!x;x[]]}                 / dict -> .. x="y" ..
IX:{*ir[();()]bl x}                              / intermediate form <- xml
ir:{(ij .)/(x;y;z)}                              / (result;tags;xml)
ij:{:[~#z:dlb z;(x;y;z);ik[x;y].(0,1+qf[z]C)_ z]}  / terminate or (first;rest)
ik:{[x;y;z;w]:[f z;fx;g z;gx;h z;hx;ix][x;y;z]w}  / forms: ../> or
g:F=*|2# / "Example Intermediate Form"

2. Hierarchical Notation (HN).

A query in IF has three components: the name, the attribute dictionary, and a list of zero or more queries. We can think of the attribute dictionary and the query list as children of the name, and use indentation to show parent-child relations. For clarity, let's use "." instead of " " to mark and count the levels of indentation:

dynamic:
.var:1
.layout:
..note:
..."Example Intermediate Form"
..widget:
...name_:select
...class_:slider
...min_:1
...max_:3
...step_:1
...fill_:up
...value_:@var
..widget:
...class_:grid
...label_:test
...base_:x.y.z
...update_:manual
...invmsg_:'Update!'
...sel:
....value:store={@var}
..widget:
...class_:button
...type_:submit
...text_:Update!

We use ":" to mean "has the value". Thus, the attribute var has the value 1.
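The indentation rule can be made concrete with a minimal Python sketch that reads HN into IF-like triples (name, attribute dict, children). It is not a port of IH, and the function names are hypothetical. It assumes well-formed input, using the following conventions:

- "." is the indentation mark.
- "name:value" is an attribute.
- A bare "name:" opens an object whose members sit one level deeper.
- A line starting with a quote is a string child.

Quoting is reduced to stripping one pair of surrounding quotes, with no escapes. The singleton form (s::v) is not handled.

```python
from __future__ import annotations


def depth(line: str) -> int:
    """Count the leading '.' indentation marks (the I literal in kxml)."""
    return len(line) - len(line.lstrip("."))


def unquote(s: str) -> str:
    """Strip one pair of surrounding double quotes; escapes are not handled."""
    return s[1:-1] if len(s) >= 2 and s[0] == s[-1] == '"' else s


def parse_block(lines: list[str], pos: int, level: int):
    """Read consecutive lines at exactly `level` dots.

    Returns (attributes, children, next_position)."""
    attrs: dict[str, str] = {}
    kids: list = []
    while pos < len(lines) and depth(lines[pos]) == level:
        body = lines[pos][level:]
        pos += 1
        if body.startswith('"'):
            kids.append(unquote(body))          # string child, e.g. a note's text
            continue
        name, _, value = body.partition(":")   # split at the first ':'
        if value:
            attrs[name] = unquote(value)        # name:value (name:"" is empty)
        else:                                   # name: opens a nested object
            sub_attrs, sub_kids, pos = parse_block(lines, pos, level + 1)
            kids.append((name, sub_attrs, sub_kids))
    return attrs, kids, pos


def parse_hn(text: str) -> list:
    """Parse HN text into a list of (name, attributes, children) triples."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    _, kids, _ = parse_block(lines, 0, 0)
    return kids


EXAMPLE = """\
dynamic:
.var:1
.layout:
..note:
..."Example Intermediate Form"
..widget:
...class_:button
...type_:submit
...text_:Update!
"""

print(parse_hn(EXAMPLE))
```

Splitting at the first ":" mirrors what nv does with qf[x]S, so a value such as store={@var}, or one that itself contains ":", stays intact.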
Empty attributes have the form:

name:""

No more than one name-value pair is permitted on a single physical line; HN does not support literal separators. Notice that quotation marks are optional: they are required only when we need to quote a value which contains blanks or unprintable characters, for example " " or newline.

The functions HI and IH convert IF to and from HN:

  i~IH HI i
1

3. Functional Notation (FN).

We forgo the iconic representation of hierarchy and instead use mated punctuation symbols to represent the parent-child relation, exactly as we do in K with list and lambda notations. For clarity, I will use two sets of symbols: "{" and "}" to enclose the two children of the name (the attribute list and the query list), and "[" and "]" to enclose the set of attributes. As in HN, I will use ":" to mean "has the value". In FN, ";" and newline are equivalent marks which separate name-value pairs. Thus, the functional notation (FN) for i is:

dynamic:{[var:1]
 layout:{
  note:{"Example Intermediate Form"}
  widget:{[name_:select;class_:slider;min_:1;max_:3;step_:1;fill_:up;value_:@var]}
  widget:{[class_:grid;label_:test;base_:x.y.z;update_:manual;invmsg_:'Update!']
   sel:{[value:store={@var}]}}
  widget:{[class_:button;type_:submit;text_:Update!]}}}

Moreover, since the value of an attribute is always simple (i.e. never a list), we can use a single set of mated symbols without fear of ambiguity:

dynamic:((var:1)
 layout:(
  note:("Example Intermediate Form")
  widget:((name_:select;class_:slider;min_:1;max_:3;step_:1;fill_:up;value_:@var))
  widget:((class_:grid;label_:test;base_:x.y.z;update_:manual;invmsg_:'Update!')
   sel:((value:store={@var})))
  widget:((class_:button;type_:submit;text_:Update!))))

Notice once again that quotation marks are optional: they are required only when we need to quote a value which contains either unprintable characters or one of the literal symbols used to indicate grouping and separation.
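The two-symbol form shown above can be read by plain recursive descent. The Python sketch below only illustrates that idea. It is not a port of IF, which instead works through the lr, fs, al and ol utilities, and its class and function names are hypothetical. The sketch makes the following assumptions:

- Quoted strings contain no escapes or embedded quotes.
- Blanks, newlines and ";" between objects are separators.
- An unquoted attribute value ends at the next ";" or "]".

The single-symbol "(...)" form is not handled.

```python
from __future__ import annotations

SEPARATORS = " \t\r\n;"   # between objects, ';' and newline are interchangeable


class FNReader:
    """Hypothetical recursive-descent reader for the two-symbol FN form.

    Produces (name, attributes, children) triples; a child is either another
    triple or a string. Quoted strings are assumed to contain no escapes."""

    def __init__(self, text: str):
        self.s = text
        self.i = 0

    def skip(self) -> None:
        while self.i < len(self.s) and self.s[self.i] in SEPARATORS:
            self.i += 1

    def quoted(self) -> str:
        end = self.s.index('"', self.i + 1)    # position of the closing quote
        value = self.s[self.i + 1:end]
        self.i = end + 1
        return value

    def objects(self) -> list:
        """Read objects and strings until a '}' or the end of the text."""
        items: list = []
        self.skip()
        while self.i < len(self.s) and self.s[self.i] != "}":
            items.append(self.quoted() if self.s[self.i] == '"' else self.obj())
            self.skip()
        return items

    def obj(self) -> tuple:
        colon = self.s.index(":", self.i)
        name = self.s[self.i:colon].strip()
        self.i = colon + 1
        self.skip()                            # the body may start on the next line
        if self.i >= len(self.s) or self.s[self.i] != "{":
            raise ValueError(f"expected '{{' after {name!r}")
        self.i += 1
        self.skip()
        attrs: dict = {}
        if self.s[self.i] == "[":
            self.i += 1
            attrs = self.attributes()
        children = self.objects()
        if self.i >= len(self.s):
            raise ValueError(f"missing '}}' for {name!r}")
        self.i += 1                            # consume the closing '}'
        return (name, attrs, children)

    def attributes(self) -> dict:
        """Read name:value pairs up to the closing ']'."""
        attrs: dict = {}
        while True:
            self.skip()
            if self.s[self.i] == "]":
                self.i += 1
                return attrs
            colon = self.s.index(":", self.i)
            name = self.s[self.i:colon].strip()
            self.i = colon + 1
            if self.s.startswith('"', self.i):
                attrs[name] = self.quoted()
            else:                              # unquoted: up to the next ';' or ']'
                j = self.i
                while self.s[j] not in ";]":
                    j += 1
                attrs[name] = self.s[self.i:j].strip()   # "" for an empty attribute
                self.i = j


def read_fn(text: str) -> list:
    return FNReader(text).objects()


FN_EXAMPLE = """dynamic:{[var:1]
 layout:{
  note:{"Example Intermediate Form"}
  widget:{[name_:select;class_:slider;min_:1;max_:3;step_:1;fill_:up;value_:@var]}
  widget:{[class_:grid;label_:test;base_:x.y.z;update_:manual;invmsg_:'Update!']
   sel:{[value:store={@var}]}}
  widget:{[class_:button;type_:submit;text_:Update!]}}}"""

print(read_fn(FN_EXAMPLE))
```

Unquoted braces inside an attribute value, as in store={@var}, are harmless in this reader. Inside "[ ]" it looks only for ";" or "]".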
Empty attributes have the form:

name:

Unlike K lambdas, the attribute list need not be on the same physical line as the name:

name:
 [s:v;..

The functions FI and IF convert IF to and from FN:

  i~IF FI i
1

Dotted names can be used to organize the named components of a query. When such a name appears in a query, the value associated with that name is substituted in its place. In the example below, dynamic refers to y.w, which in turn refers to x.t, x.u and x.v:

  j:"
base:{[table:pub.demo.weather.stations]}
sel:{[value:state='NY']}
x.t.widget:{[name_:select;class_:slider;min_:1;max_:3;step_:1;fill_:up;value_:@var]}
x.u.widget:{[class_:grid;label_:test;base_:pib.demo.retail.item;update_:manual;invmsg_:'Please click Update!']
 sel:{[value:store={@var}]}}
x.v.widget:{[class_:button;type_:submit;text_:Update!]}
y.w.layout:{[]x.t;x.u;x.v}
dynamic:{[var:1]y.w}"
  IF j
((`base
  .,(`table;"pub.demo.weather.stations";)
  ())
 (`sel
  .,(`value;"state='NY'";)
  ())
 (`dynamic
  .,(`var;,"1";)
  ,(`layout
    .()
    ((`widget
      .((`name_;"select";)
        (`class_;"slider";)
        (`min_;,"1";)
        (`max_;,"3";)
        (`step_;,"1";)
        (`fill_;"up";)
        (`value_;"@var";))
      ())
     (`widget
      .((`class_;"grid";)
        (`label_;"test";)
        (`base_;"pib.demo.retail.item";)
        (`update_;"manual";)
        (`invmsg_;"'Please click Update!'";))
      ,(`sel
        .,(`value;"store={@var}";)
        ()))
     (`widget
      .((`class_;"button";)
        (`type_;"submit";)
        (`text_;"Update!";))
      ()))))

4. XML

  `0:,XI i
"Example Intermediate Form"
  i~IX XI i
1

See section 7 for an "analytical commentary" on IX.

5. API

IH    intermediate form <- hierarchical notation
IF    intermediate form <- functional notation
HI    hierarchical notation <- intermediate form
FI    functional notation <- intermediate form
XI    xml <- intermediate form
IX    intermediate form <- xml
XH    xml <- intermediate form <- hierarchical notation
XF    xml <- intermediate form <- functional notation
IFH   intermediate form <- functional or hierarchical notation or xml
XFH   xml <- intermediate form <- functional or hierarchical notation or xml

6. Literals.

Special symbols used in HN and FN are settable.
The defaults are:

P   "\\"    escape
Q   "\""    quotation
I   "."     indentation
D   "."     name context
L   "{"     open query list
R   "}"     close query list
A   "["     open attribute list
B   "]"     close attribute list
S   ":"     has the value
T   ";"     separator

The literals which are not settable are:

X   " "     blank
Y   "\n"    newline
Z   "\t\r"  tab, return
K   ""      empty

and the XML marks:

CD  ("<![CDATA[";"]]>")   CDATA open, close
O   "<"                   open
C   ">"                   close
F   "/"                   end
E   "="                   separator

7. IX: Analytical Commentary.

The official 1010 XML converters XML2IF and IF2XML have been implemented efficiently in C. IX and XI have been included in the kxml suite to simplify experimentation with the alternative notations FN and HN. This section contains a line-by-line analysis of the XML parser IX.

IX:{*ir[();()]bl x}

IX produces an IF from a character-vector representation of XML. The bl function replaces unquoted newlines, tabs and carriage returns with blanks. ir does the parsing, and * takes the first element of its final (result;tags;xml) triple, which is the IF itself.

ir:{(ij .)/(x;y;z)}

The ir function is a case of monadic over, or "converge": the function-operand is applied repeatedly until its result stops changing. The function-operand is (ij .), which maps the triple (x;y;z) to the arguments (called "x", "y", and "z") of ij:

z is the XML string still to be processed.
y is a list of the tags processed so far.
x is the IF processed so far.

The idea behind ir is that we process the XML up to the point where we reach a closing tag of the form </xyz>. At that point we search y from the end to find the matching <xyz ..>:

   y          index
   -          -----
   :          :
   <xyz ..>   i
   :          :

The stretch of x from i to the end should then be grouped, and y truncated at i:

   x          index
   -          -----
   :          :
   A          i
   :          :
   B          j

thus:

   x          y    index
   -          -    -----
   :          :    :
   (A;..;B)        i

The code in the analysis below makes use of certain utilities and global variables, which are defined in the kxml.k script.

ij:{:[~#z:dlb z;(x;y;z);io[x;y].(0,1+qf[z]C)_ z]}

We want IX to terminate once all of z has been processed, hence the exit condition ~#z. In that case ij returns the triple without further change, which is the fixed point at which converge stops. Otherwise, we cut z just after the first unquoted occurrence of ">" (C). The two parts are the next tag and the rest of the string, and both are passed to io for processing.
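The converge-and-cut loop can also be mimicked outside K. In this Python sketch (not the kxml code; all names are hypothetical), step plays the part of ij and io on a state triple (result, tags, rest), and converge applies it until the state stops changing. Completed items push "" onto the tag stack, as the commentary on fx below describes. This appears to be what keeps tags the same length as result, so that an index i found in tags also marks where grouping starts in result.

The sketch assumes that attribute values are double-quoted without escapes. It ignores XML declarations, comments and CDATA, and, unlike bl, it blanks every newline, tab and return, quoted or not. Its handling of text between tags (the "none of the above" case) is likewise an assumption.

```python
from __future__ import annotations

import re

ATTR = re.compile(r'([^\s=]+)="([^"]*)"')   # name="value" pairs inside a tag


def first_unquoted(s: str, ch: str) -> int:
    """Index of the first ch outside double quotes (compare qf), or -1."""
    in_quote = False
    for k, c in enumerate(s):
        if c == '"':
            in_quote = not in_quote
        elif c == ch and not in_quote:
            return k
    return -1


def tag_parts(tag: str) -> tuple[str, dict[str, str]]:
    """Split '<name a="1">', '</name>' or '<name a="1"/>' into name and attributes."""
    body = tag.strip("<>/ ")
    name, _, rest = body.partition(" ")
    return name, dict(ATTR.findall(rest))


def step(state):
    """Consume one tag or one run of text from the front of `rest`."""
    result, tags, rest = state
    rest = rest.lstrip()
    if not rest:
        return state                                   # nothing left: fixed point
    if not rest.startswith("<"):                       # text up to the next tag
        lt = rest.find("<")
        lt = len(rest) if lt < 0 else lt
        return (result + [rest[:lt].rstrip()], tags + [""], rest[lt:])
    cut = first_unquoted(rest, ">") + 1                # like (0,1+qf[z]C)_ z
    if cut == 0:
        raise ValueError("unterminated tag")
    head, rest = rest[:cut], rest[cut:]
    name, attrs = tag_parts(head)
    if head.endswith("/>"):                            # self-closing tag
        return (result + [(name, attrs, [])], tags + [""], rest)
    if head.startswith("</"):                          # closing tag: group i..end
        matches = [k for k, t in enumerate(tags) if t == name]
        if not matches:
            raise ValueError(f"unmatched closing tag {head}")
        i = matches[-1]                                # last matching opening tag
        opened_name, opened_attrs = result[i]
        grouped = (opened_name, opened_attrs, result[i + 1:])
        return (result[:i] + [grouped], tags[:i] + [""], rest)
    return (result + [(name, attrs)], tags + [name], rest)   # opening tag


def converge(f, x):
    """Apply f repeatedly until the value stops changing (cf. K's monadic over)."""
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


def parse_xml(xml: str) -> list:
    """Blank out newlines, tabs and returns, converge, then keep only the result."""
    result, _tags, _rest = converge(step, ([], [], re.sub(r"[\n\t\r]", " ", xml)))
    return result


print(parse_xml('<dynamic var="1"><note>Example Intermediate Form</note></dynamic>'))
```

Checking the cut with first_unquoted rather than a plain search matters for input such as <a b=">"/>. There, the first ">" sits inside a quoted attribute value.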
io:{[x;y;z;w]:[f z;fx;g z;gx;h z;hx;ix][x;y;z]w}
f:F=*-2#
g:F=*|2#
h:O=*:

The first part, z, has four possible patterns:

pattern              is like        predicate   process with
-------              -------        ---------   ------------
self-closing         .. />          f z         fx
closing tag          </..>          g z         gx
opening tag          < ...          h z         hx
none of the above    next object                ix

f tests whether the second-to-last character of z is "/" (as in "/>"), g tests whether its second character is "/" (as in "</"), and h tests whether its first character is "<".

fx:{[x;y;z;w](x,,ia -2_ z;y,,K;w)}

A self-closing object has the form <xyz .. />, so fx will return a triple (a;b;c):

a is the result so far, with the IF triple (`xyz;attribute dictionary;()) appended.
b is the tag stack, with "" appended.
c is the XML string still to be processed.

gx:{[x;y;z;w]i:il[y]z;ir[(i#x),,,/(*:;,1_)@\:i _ x;(i#y),,K]w}

A closing tag is simply </xyz>. We find the index i of the last occurrence in the stack of tags of the form