Memoization =========== 0. Memoization. Memoization recursively and exhausively matches an input query Q against a list of stored queries S1 .. Sn, each of which is associated with a base table U1 .. Un. In the case of a successful match (see section 1 for an explanation of matching) of Q with S, the ops of Q which match S are removed from Q, and the resulting query Q* uses S's base table U. In the very simplest case, query Q uses base table T and has ops O, and there is a stored query S which also uses base table T and has ops P, a subset of O, and has been materialized as U. Then the resulting memoized query has base table U and ops O-P (i.e. the ops which are in Q and not in S). In the form of a diagram, where the first two rows are the inputs and the last row is the memoized output: query base ops -> table ----- ---- --- ----- S T P U Q T O ----------------------------- R U O-P See the Appendix: Use Cases below for a description of how to use Memoization. 1. Simple Case. Suppose the stored query S has been evaluated on source table T and the result table stored as U: S: A B C D E and suppose the input query, also to be evaluated on T, is Q: 1 2 D 3 C B 4 5 6 A E a b c then Q matches S if the result of evaluating: D C B A E 1 2 3 4 5 6 a b c is identical to the result of evaluating Q. In that case, the query R: 1 2 3 4 5 6 a b c is evaluated on U. 2. Exceptions and Conditions. There are four independent conditions a)-d) for determining whether an input query Q matches a stored query S. a) &/S _lin Q. In other words, all the ops of S must exist in Q. b) &/c _lin C, where c = !T, and T is the base table of the input query Q, and C = !U, where U is the materialized result of executing the stored table S. c) None of the non-matching ops in the initial matching segment of Q can be sticky. For example, Q *will not* match S if (in the schematic example above) any of 1-6 is sticky. For example, suppose S is: CA, Los Angeles, 10 CA, Los Angeles, 10 CA, San Francisco, 10 CA , San Francisco, 10 NJ , Newark, 10 CT, Hartford, 10 CT, Hartford, 10 CT, Hartford, 10 TX, Houston, 10 TX, Austin, 10
Then the following query A will match: state city sales a ----- ------------- ----- -- CA San Francisco 10 20 but this query B will not: state city sales a ----- ------------- ----- -- CA San Francisco 10 10 B will not match because the selection op, which is sticky, precedes the precomputed materialized table associated with the query S. N.B.: This can be optimized. a sticky op which occurs in the initial matching segment but which is not followed by an op which is sensitive to stickiness does not disqualify the match. NYI. d) Memoization is sensitive to the interaction of selection statements with willbe statements containing sticky functions and with tabulation statements. Suppose the stored query S (stored in U) is: and suppose the input query Q on T is: We say that Q matches S because Q matches the evaluation of: so the query R can be evaluated on U. This is so because g_sum on break-column city is indifferent to whether the selection occurs before or after the g_sum. Now consider the following example: Table T has columns state and city, and no city belongs to more than one state. The stored query S is as above, and the input query Q is: The canselect attribute lists those columns whose extensions are supersets of the extensions of the break columns. We can therefore know that the evaluation of Q will match the evaluation of: and therefore that we may replace Q on T with R on U: The same constraint applies to tabulations: selections which reference columns which are either break columns in a tabu, or which appear as the values of the canselect attribute in the tabu, may appear either before or after the tabu. The difference in the case of tabu statements is that the result of the tabu must contain the columns referenced in the candselect attribute. [examples needed here] 3. Column Order. There is no guarantee that the column order of the result query will match the column order of the input query. For example, suppose the input query Q is: a B where a and B are willbes with names x and y, and S is: B then the resulting query Q* will consist of: a The column order of Q will be x y, and that of Q* will be y x. We have three options in dealing with this: 1. Trace the resulting columns for each op in R, then append a colord to the resulting query S. 2. Disqualify input queries which do not terminate with a colord. 3. Do nothing, but document the behavior. 4. Recursive 'set' (nyi) Here the idea is that what we store in Memo can differ from the query which we execute and materialize. Suppose that we have the stored query S = A B C materialized as T and we now seek to store Q = A B C D and materialize it as U. We see that Q matches S at A B C. In this case we store Q but execute the new query consisting of base table T, opts = D, and materialize the result as U. 5. Recursive 'get'. Here the idea is that if a whole-query match fails, we still have opportunities to memoize by matching the subqueries of link, merge, or amend ops. Suppose that the input query Q is A B C and fails to match any query. And suppose that B is a link (or merge or amend) with ops. The memoization algorithm is invoked recursively on B`ops. 6. Exhaustive 'get'. Here the idea is that a memoized query can be capable of further memoization. For example, suppose that the input query Q is A B C D E can be memoized against the stored query R = A B materialized as T. The result is Q* = C D E with base table T. Now suppose there is a further stored query S = C D materialized as U. The result of memoizing Q* is E with base table U. We implement this idea with the function: memoize:{[t;o](memoize_ .)/(t;o)} 7. The Query Cache. Query results are materialized and the query and associated information is stored in a cache. We propose to store the cache as a table Memo in the bag server. The cache is a table with five columns: !Memo `t `o `p `d `c Each record of Memo represents a precomputed query. Memo.t[i] is the base table path of the i-th query. Memo.o[i] is the list of ops comprising the i-th query. Memo.p[i] is the materialized table path of the table computed by the i-th query. Memo.d[i] is a list of pairs of dependency information for all input tables to the i-th query. Memo.c[i] is !Memo.p i. N.B. Memo.d is used in the matching algorithm to determine whether the cached query is "stale". In principle, it could also be used to automatically recompute all stale dependencies. For example, in the example above, the record for the query which computes the table xyz is: Memo[`t;2] `.db.mdb_regression_tester.mdb_regression_test_folder.auto3 Memo[`o;2] ((`willbe .((`name;`xx;) (`value;,"1";) (`spflag;0;) (`materialize;0;) (`label;"xx";) (`format;();) (`fixed;;) (`desc;,"1";))) (`willbe .((`name;`xy;) (`value;,"2";) (`spflag;0;) (`materialize;0;) (`label;"xy";) (`format;();) (`fixed;;) (`desc;,"2";))) (`willbe .((`name;`xz;) (`value;,"3";) (`spflag;0;) (`materialize;0;) (`label;"xz";) (`format;();) (`fixed;;) (`desc;,"3";)))) Memo[`p;2] `mdb_regression_test_folder.xyz Memo[`d;2] ,(`.db.mdb_regression_tester.mdb_regression_test_folder.auto3;-484054871) Memo[`c;2] `year `gm `ford `cry 8. Code. The script memoize.k is loaded into the k tree directory .memo. The following code has been added to Cover: : SetQueryHashes[table;ops] q:Memoize[table;ops;sid;uid;pw];table:*q;ops:q 1 <---------- added r:Cover00[table;ops;InitialBlocks[];InitialVars[];sid;uid;pw;pvlen;ctf;cols;rows] : and the following function to mdb_cover.k: Memoize:{[table;ops;sid;uid;pw] q:.memo.chk[table]ops if[~ops~q 1 u:.+(`sid`uid`pswd;(sid;uid;dbd.dbi.pwe.EncryptPW[.,(`sid;sid);pw])) .memo.res[u]. q .opstack,:.((`type;`memoize);(`pre;(table;ops));(`post;q)) table:*q;ops:q 1] (table;ops)} The following line has been added accum_materialize: : .memo.col tp : The following line has been added to mdb_limits.k: /Memoization if[_n ~ @[`.lim;`memoize]; .lim[`memoize]:0];.limhelp[`memoize]:"0|1" The following line has beeb added to mdb_hints.k: if["memoize"~x;.lim[`memoize]:0] 9. Notes. Currently there is no provision for removing a table from the bag server memoization table. 'sets' append and modify records in the bag server memoization table and are stored immediately in the bag server. There is also no official instrumentation for determining whether a query has been memoized, and if so, what the memoized query looks like. After running a query which contains , screen into the accum and execute: r 0 -- Appendix: Use Cases. A. Exact Match. Suppose query Q has the form memoize_ : and executed. B. Set. Suppose query Q has the form memoize_ ... : ... : memoize empty : C. Get. If the query Q does not begin with a meta containing memoize or memoize_ and 1~.lim`memoize then the pattern-matching algorithm described in the body of this document will be applied, attempting to match Q against candidates stored in the bag server.