Memoization =========== 0. Memoization. Memoization recursively and exhausively matches an input query Q against a list of stored queries S1 .. Sn, each of which is associated with a base table U1 .. Un. In the case of a successful match (see section 1 for an explanation of matching) of Q with S, the ops of Q which match S are removed from Q, and the resulting query Q* uses S's base table U. In the very simplest case, query Q uses base table T and has ops O, and there is a stored query S which also uses base table T and has ops P, a subset of O, and has been materialized as U. Then the resulting memoized query has base table U and ops O-P (i.e. the ops which are in Q and not in S). There are two basic operations in memoize.k: a) set - a query can be stored in the bag server iff it has the form: : and furthermore does not contain a transpose op, a crosstab operation (tabu with cbreaks), or a willbe, link, amend, or merge with materialize.) b) get - a query can be matched against one or more queries in the bag server iff it has the form: : 1. Simple Case. Suppose the stored query S has been evaluated on source table T and the result table stored as U: S: A B C D E and suppose the input query, also to be evaluated on T, is Q: 1 2 D 3 C B 4 5 6 A E a b c then Q matches S if the result of evaluating: D C B A E 1 2 3 4 5 6 a b c is identical to the result of evaluating Q. In that case, the query S: 1 2 3 4 5 6 a b c is evaluated on U. 2. Exceptions and Conditions. There are three independent conditions a)-c) for determining whether an input query Q matches a stored query S. a) &/S _lin Q. In other words, all the ops of S must exist in Q. b) None of the non-matching ops in the initial matching segment of Q can be sticky. For example, Q *will not* match S if (in the schematic example above) any of 1-6 is sticky. For example, suppose S is: CA, Los Angeles, 10 CA, Los Angeles, 10 CA, San Francisco, 10 CA , San Francisco, 10 NJ , Newark, 10 CT, Hartford, 10 CT, Hartford, 10 CT, Hartford, 10 TX, Houston, 10 TX, Austin, 10
Then the following query A will match: memoizeget state city sales a ----- ------------- ----- -- CA San Francisco 10 20 but this query B will not: memoizeget state city sales a ----- ------------- ----- -- CA San Francisco 10 10 B will not match because the selection op, which is sticky, precedes the precomputed materialized table associated with the query S. N.B.: This can be optimized. a sticky op which occurs in the initial matching segment but which is not followed by an op which is sensitive to stickiness does not disqualify the match. NYI. c) Suppose the stored query S is A B C D E F. And suppose that C is sticky. Then, where {X .. Y} denotes any of the sequences containing all of X .. Y in any order, any of the queries {A B} C {D E F} will match Q. In other words, sticky ops like C must occur in the same relative positions in both queries, and bound the same non-sticky queries (although they are not required to be in the same order). 3. Column Order. There is no guarantee that the column order of the result query will match the column order of the input query. For example, suppose the input query Q is: a B where a and B are willbes with names x and y, and S is: B then the resulting query Q* will consist of: a The column order of Q will be x y, and that of Q* will be y x. We have three options in dealing with this: 1. Trace the resulting columns for each op in R, then append a colord to the resulting query S. 2. Disqualify input queries which do not terminate with a colord. 3. Do nothing, but document the behavior. 4. Recursive 'set'. Here the idea is that what we store in Memo can differ from the query which we execute and materialize. Suppose that we have the stored query S = A B C materialized as T and we now seek to store Q = A B C D and materialize it as U. We see that Q matches S at A B C. In this case we store Q but execute the new query consisting of base table T, opts = D, and materialize the result as U. 5. Recursive 'get'. Here the idea is that if a whole-query match fails, we still have opportunities to memoize by matching the subqueries of link, merge, or amend ops. Suppose that the input query Q is A B C and fails to match any query. And suppose that B is a link (or merge or amend) with ops. The memoization algorithm is invoked recursively on B`ops. 6. Exhaustive 'get'. Here the idea is that a memoized query can be capable of further memoization. For example, suppose that the input query Q is A B C D E can be memoized against the stored query R = A B materialized as T. The result is Q* = C D E with base table T. Now suppose there is a further stored query S = C D materialized as U. The result of memoizing Q* is E with base table U. We implement this idea with the function: memoize:{[t;o](memoize_ .)/(t;o)} 7. The Query Cache. Query results are materialized and the query and associated information is stored in a cache. We propose to store the cache as a table Memo in the bag server. The cache is a table with four columns: !Memo `t `o `p `d Each record of Memo represents a precomputed query. Memo.t[i] is the base table path of the i-th query. Memo.o[i] is the list of ops comprising the i-th query. Memo.p[i] is the materialized table path of the table computed by the i-th query. Memo.d[i] is a list of pairs of dependency information for all input tables to the i-th query. N.B. Memo.d is used in the matching algorithm to determine whether the cached query is "stale". In principle, it could also be used to automatically recompute all stale dependencies. For example, in the example above, the record for the query which computes the table xyz is: Memo[`t;2] `.db.mdb_regression_tester.mdb_regression_test_folder.auto3 Memo[`o;2] ((`willbe .((`name;`xx;) (`value;,"1";) (`spflag;0;) (`materialize;0;) (`label;"xx";) (`format;();) (`fixed;;) (`desc;,"1";))) (`willbe .((`name;`xy;) (`value;,"2";) (`spflag;0;) (`materialize;0;) (`label;"xy";) (`format;();) (`fixed;;) (`desc;,"2";))) (`willbe .((`name;`xz;) (`value;,"3";) (`spflag;0;) (`materialize;0;) (`label;"xz";) (`format;();) (`fixed;;) (`desc;,"3";)))) Memo[`p;2] `mdb_regression_test_folder.xyz Memo[`d;2] ,(`.db.mdb_regression_tester.mdb_regression_test_folder.auto3;-484054871) 8. Operation. The script memoize.k is loaded into the k tree directory .memo. The meta op indicating memoizeset or memoizget must be first op in the query (following the base op). The following code is added to Cover00: : ops[w]:{(`note;.();x)}'ops@w:&-3=4::'ops /deal with stray text by making it a note if[~1~`.lim@`memoize .memo.set[table]ops q:.memo.get[table]ops if[~ops~q 1 u:.+(`sid`uid`pswd;(sid;uid;dbd.dbi.pwe.EncryptPW[.,(`sid;sid);pw])) .memo.res[u]. q .opstack,:.((`type;`memoize);(`pre;(table;ops));(`post;q)) table:*q;ops:q 1]] : The following line is added to mdb_limits.k: /Memoize if[_n ~ @[`.lim;`memoize]; .lim[`memoize]:0];.limhelp[`memoize]:"1 if memoization is operative" and the following line to mdb_hints.k: if["memoize"~x;.lim[`memoize]:0] 9. Notes. Currently there is no provision for removing a table from the bag server memoization table. 'sets' append and modify records in the bag server memoization table and are stored immediately in the bag server. There is also no instrumentation for determining whether a query has been memoized, and if so, what the memoized query looks like.