It's also possible to implement regular expression support in K. Regular expression implementations tend to involve arrays and finite state machines, so K is almost an ideal language for working with regular expressions.
One approach to regular expressions involves
"deterministic" state transitions (where the state
transition corresponds to a character from some string of
characters), and "non-deterministic" state transitions
(where the state transition happens without any corresponding
In this code, I initially repreent a regular expression as an array where the first dimension represents a "state index", the second dimension is indexed by character number, and the third dimension represents the set of states which are reached from that "state index" and that "character number".
The state machine itself represents a set of states. The initial state is 0 (which is always a "non-deterministic" state), and the final state is the one with the highest numbered state index.
Here's an example regular expression matching some floating point numbers:
num: rep[re[,"0123456789"]] sign: maybe[re[,"+-"]] fp: fix seq[(sign;num;maybe[seq[(re[,"."];num)]];maybe[seq[(re[,"Ee"];sign;num)]])] match[fp; "1"] / 1 match[fp; "3.1415e0"] / 1 match[fp; "dog"] / 0
This would display most of the interesting state transitions:
fp[;256,_ic " 1-e"]
Note that this example (fp), while relatively simple to
implement, is very bulky. It's possible to use more compact
representations. For example, instead of indexing by character
index, you could identify character classes where all characters
within a class are treated identically.
And you might get rid of all states which have a non-deterministic state transition. It's worthwhile thinking about why this could work. The most important thing to note: the only "non-deterministic state" I care about is state 0 -- that's where I find the initial deterministic states.
So Shrink creates an alternate regular expression data representation which is a three element array:
 the list of starting states  index by character number to get character class  first dimension is indexed by character class second dimension is indexed by (new) state index third dimension is list of (new) states
And, Match uses this alternative representation.
FP: Shrink fp Match[FP; "1"] / 1 Match[FP; "3.1415e0"] / 1 Match[FP; "dog"] / 0
It's possible to extract subexpressions, as well. I've not yet
implemented this, but here's a design sketch:
Regular expressions have associated with them a dictionary of subexpressions (initially empty). Each subexpression also has an offset corresponding to the terminal state for that subexpression.
When combining regular expressions, this offset needs to be updated.
When matching regular expressions, use Next\ instead of Next/ so that the terminal states associated with subexpressions can be located.
Run the corresponding subexpression backwards from the character for that terminal state to find the corresponding begining.