26. Loopless Code VII: Sequential Machines

J provides a primitive to handle one class of programs, ill-suited for parallel processing, that can be described systematically: sequential machines. Dyad ;: takes a sequential-machine description in x and a stream of input in y, and produces the result called for by the machine description.

A brief overview is as follows. The output array is initialized to empty. An initial row number (also called a state number) is chosen. Each item of input is converted into a column number. The column numbers are processed, one per iteration. In an iteration, the next column number is supplied to the row/action table: the row number and column number specify an item of the table, which is a 2-element vector giving the new row number and an action code. The action is performed and the row number is updated, completing the iteration. Execution continues until a 'quit' action is encountered or all the column numbers have been supplied. At that point the output array, which contains the accumulated results of the actions performed, becomes the result of the verb.

To illustrate the use of dyad ;: we will build a machine to recognize hex constants in character input, where a hex constant is '0x' followed by any positive number of hexadecimal digits.

The machine description, the x argument to dyad ;:, is a list of boxes f;s;m;ijrd . m and ijrd may be omitted.

m controls the conversion of the items of y to column numbers. In the general case, m is a list of boxes; then the column number produced for an item of y is the index of the first box of m whose contents have an item equal to the item of y (if there is no such box, a column number of #m is used). If y is a string, m may be a numeric list containing one column number for each character code and the column numbers are (a. i. y) { m . If y is numeric, m may be empty or omitted, and y specifies the column numbers directly.

In our example problem, we see that the input characters are in 4 classes: the character 0; the character x, the hexadecimal characters 0123456789abcdefABCDEF, and all other characters. We will assign these column numbers 3, 2, 1, and 0 respectively. The conversion control m can be generated by the statements

m =. a. e. '0x123456789abcdefABCDEF'

m =. m + a. e. '0x'

m =. m + a. e. '0'

and can be verified by

(a. i. '0x2aq') { m

3 2 1 1 0

ijrd gives the initial values of 4 variables used by the sequential machine: the input pointer i, the word pointer j, the row number r. and the final column d. The input pointer is the index in y of the next item to be processed, and is incremented by 1 at the end of each iteration. The word pointer is the index in y of the first item in the current word; when the action code calls for producing output, the output will start with item j. When j is _1, there is no current word. The row number, as noted above, is used to index the row/action table. The final column d is an extra column number that will be processed after the last item of y has been processed; it is the column for 'end-of-input'. If ijrd is omitted, it defaults to 0 _1 0 _1, which means start processing the first item of y, with no current word, and starting in row number 0, with no final column. In our example problem, we would like end-of-input to be treated as a non-hexadecimal character, so we will use a final column of 0 and our ijrd will be 0 _1 0 0 ..

s gives the row/action table. This table has as many rows as needed to encode all the states of the sequential machine, and as many columns as there are possible columns numbers of mapped input. Each 1-cell of s is a 2-element list. The first element will become the row number for the next iteration. The second is the action code, indicating the action to be performed. The action code is a number from 0 to 6, with meanings as follows:

Action code	Addition to output array	Change to j (after any addition to the output)
0	none	none
1	none	j =. i
2	add single word	j =. i
3	add single word	j =. _1
4	add multiple words	j =. i
5	add multiple words	j =. _1
6	stop--no further iterations are performed

Executing an 'add' action when j is _1 produces a domain error.

The action code indicates when a value is appended to the output array. The value that is appended depends on the f parameter (which came from the x argument to dyad ;:) and the values of the iteration variables. The values appended for different values of f are (r=row number, c=column number, j=word pointer, i=input pointer):

f	Value appended	Description
0	the items of y between j and i-1, boxed	Boxed word of y
1	the items of y between j and i-1	Unboxed word of y
2	j , i-j	Index and length of word
3	c + r * number of columns in s	Coded row and column
4	j , (i-j) , c + r * number of columns in s	Index and length of word, and coded row and column
5	i , j, r , c , (<r,c){s	Input pointer, word pointer, row, column, new row, action

The indicated data is appended to the output array whenever an 'add single word' action is executed. The 'add multiple words' action also adds the data to the output except that a sequence of consecutive 'add multiple words' actions executed from the same row causes only a single item to be added to the output array: the 'add multiple words' actions executed after the first one modify the item that was added by the first.

If no 'stop' action has been encountered, then after the last item of y has been processed, the end-of-input action is performed: if d is nonnegative, one last iteration is performed with d as the column number; otherwise, one final 'emit multiple words' action is performed if j is not _1.

We can build the state table s for our sample problem now. There will be 4 rows. First, we wait for 0; then we expect x; then we expect a hexadecimal digit, signaling start-of-word if we see it; then we wait for a non-hexadecimal-digit, and output the word when we get one. The state table will look like this:

Row number	Description	Column 0 other	Column 1 hexdigit	Column 2 x	Column 3 0
0	Waiting for 0	0 0	0 0	0 0	1 1
1	Expecting x	0 0	0 0	2 0	0 0
2	Expecting first digit	0 0	3 0	0 0	3 0
3	Waiting for nondigit	0 3	3 0	0 3	3 0

This state table is generated by:

s =. 1 4 2 $ 0 0 0 0 0 0 1 1

s =. s , 4 2 $ 0 0 0 0 2 0 0 0

s =. s , 4 2 $ 0 0 3 0 0 0 3 0

s =. s , 4 2 $ 0 3 3 0 0 3 3 0

and we use it with

(0;s;m;0 _1 0 0) ;: 'qqq0x30x30x40x0xxxx'

+----+----+

|0x30|0x40|

+----+----+

(0;s;m;0 _1 0 0) ;: 'qqq0x30x30x40x0x34a'

+----+----+-----+

|0x30|0x40|0x34a|

+----+----+-----+

Note in the second example that 0x34a was emitted by the end-of-input action.