memoization
===========

The following revision of memoization has been proposed:

- a decision layer in memoize: set if its the first run, get if the ops match a past run (based on your current algorithm of matching-- no need to expand at the moment)

- log all memoize queries for a cache hit/miss

- add a silent/non-materialize mode, where we can run #1 + 2 above without writing any data: it’ll allow us to dry run/test 10,000 daily CIP queries without writing any tables, and propose algorithm updates to perhaps increase the hit %

I'll ignore the first two requirements for now and show why I believe the first is probably not feasible.

1. it's a good idea to first understand how the matching algorithm works:

In the very simplest case, query Q uses base table T and has ops O, and there is a stored query S which also uses base table T and has ops P, a subset of O, and has been materialized as U.  Then the resulting memoized query has base table U and ops O-P (i.e. the ops which are in Q and not in S).

In the form of a diagram, where the first two rows are the inputs and the last row is the memoized output:

    query   base    ops  -> table
    -----   ----    ---     -----
    S       T       P       U
    Q       T       O 
    -----------------------------
    R       U       O-P

Note that the matching algorithm M is NOT symmetrical:  from M(S,Q) it does not follow that M(Q,S).  This is critical in what follows.

2. The current proposal is that memoization would be implemented this way:

A series of queries Q1 .. Qn are executed.  If Qi does not match any stored query then it is added to the store (the bag server) and executed as is.  Otherwise we perform the substitution described above and execute it.

Now consider the following simple example.

We have two queries, Q1 and Q2.  Q1 = A B and Q2 = A.  A and B are sequences of ops, so that the ops of Q2 form a proper subset of Q1.  

Suppose the store is empty, and we execute Q1 and Q2 in that order.  Q1 matches nothing in the store, so it is added.  Q2 does not match Q1 so it is added.  The result is that we have two queries in the store and no memoizations.

But now consider the following scenario:  The store is empty.  First we execute Q2.  It matches nothing in the store so it is added.  Now we execute Q1.  Q2 matches Q1, so we perform the substitution and execute the resulting memoized query.

This example shows that successful memoization is highly order-dependent.  Two different sequences of queries will have different effects.  For example, suppose that our query set is:

    Q Q1 Q2 ... Q10000 

and that Q = A and Q1 = A B1, Q2 = A B2, ..., Q10000 = A B10000.

Executing the set in the order

    Q1 Q2 ... Q10000 Q

will result in there being 100001 queries in the store and no memoization, whereas executing the set in the order

    Q Q1 Q2 ... Q10000

will result in there being one query in the store and 10,000 memoizations.

3. Memoization was designed with the following procedure in mind:

Analysis of the query set would reveal a set of common subqueries, which could be factored out and stored in the bag server ("memoizeset").  Subsequent queries, which contained the stored queries as subqueries would then match successfully ("memoizeget").