Words | ;: 1 _ _ | Sequential Machine |
;:y is the list of boxed words in the list y according to the rhematic rules of Part I and the rules regarding NB. . The function also applies reasonably well to ordinary text. For a suitable left argument x , the result of x;:y is equivalent to ;:y . Thus: mj=: 256$0 NB. X other mj=: 1 (9,a.i.' ')}mj NB. S space and tab mj=: 2 ((a.i.'Aa')+/i.26)}mj NB. A A-Z a-z excluding N B mj=: 3 (a.i.'N')}mj NB. N the letter N mj=: 4 (a.i.'B')}mj NB. B the letter B mj=: 5 (a.i.'0123456789_')}mj NB. 9 digits and _ mj=: 6 (a.i.'.')}mj NB. D . mj=: 7 (a.i.':')}mj NB. C : mj=: 8 (a.i.'''')}mj NB. Q quote sj=: _2]\"1 }.".;._2 (0 : 0) ' X S A N B 9 D C Q ']0 1 1 0 0 2 1 3 1 2 1 6 1 1 1 1 1 7 1 NB. 0 space 1 2 0 3 2 2 3 2 2 2 6 2 1 0 1 0 7 2 NB. 1 other 1 2 0 3 2 0 2 0 2 0 2 0 1 0 1 0 7 2 NB. 2 alp/num 1 2 0 3 2 0 2 0 4 0 2 0 1 0 1 0 7 2 NB. 3 N 1 2 0 3 2 0 2 0 2 0 2 0 5 0 1 0 7 2 NB. 4 NB 9 0 9 0 9 0 9 0 9 0 9 0 1 0 1 0 9 0 NB. 5 NB. 1 4 0 5 6 0 6 0 6 0 6 0 6 0 1 0 7 4 NB. 6 num 7 0 7 0 7 0 7 0 7 0 7 0 7 0 7 0 8 0 NB. 7 ' 1 2 0 3 2 2 3 2 2 2 6 2 1 2 1 2 7 0 NB. 8 '' 9 0 9 0 9 0 9 0 9 0 9 0 9 0 9 0 9 0 NB. 9 comment ) x=: 0;sj;mj y=: 'sum=. (i.3 4)+/ .*0j4+pru 4' x ;: y +---+--+-+--+---+-+-+-+-+-+---+-+---+-+ |sum|=.|(|i.|3 4|)|+|/|.|*|0j4|+|pru|4| +---+--+-+--+---+-+-+-+-+-+---+-+---+-+ (x ;: y) -: ;: y 1 (5;sj;mj) ;: y 0 _1 0 2 2 1 1 0 2 2 2 0 2 0 2 2 2 0 3 0 2 0 1 2 4 3 1 6 1 0 5 3 1 1 0 3 6 _1 0 0 1 1 7 6 1 2 2 2 8 7 2 6 1 0 9 7 1 5 6 2 10 9 6 1 0 5 11 _1 0 5 6 1 12 11 6 0 1 4 13 12 1 0 1 2 14 13 1 0 1 2 15 14 1 1 0 3 16 _1 0 6 1 1 17 16 1 0 1 2 18 17 1 5 6 2 19 18 6 2 6 0 20 18 6 5 6 0 21 18 6 0 1 4 22 21 1 2 2 2 23 22 2 2 2 0 24 22 2 2 2 0 25 22 2 1 0 3 26 _1 0 5 6 1 |
x;:y implements a sequential machine (finite state machine, finite state automaton). x is the specification of a machine, including the state transition table, and y is the input. A sequential machine solves the problem of recognizing the “words” in the input. The machine starts in some initial state and processes the input one item at a time; given the current state and input item, the new state and output are determined by the state transition table. The machine then proceeds to process the next input item. In detail: y is any array and x=.f;s;m;ijrd is a boxed list from which ijrd or both m and ijrd may be elided. f is a function code, an integer between 0 and 5. (Explained below.) m is a list of the input mapping; each box of m contains the items of y that are mapped to the same index. That is, the mapped input is my=. (y i.~;m) { (#m),~(#&>m)#i.#m . If y is a string (a list of literals), m may also be a list of non-negative integers corresponding to each atom of the alphabet a. , and the mapped input is my=.(a.i.y){m . Finally, if m is the empty list or is elided (and y is a numeric list), then the mapped input my is just y itself. s is a 3-dimensional, 2-column array of non-negative integers of the state transition and output table. It has shape p,q,2 where p bounds the states and q bounds the mapped inputs. That is, p>0{"1 s , and q>#m if m is a list of boxes or q>m if m is a list of integers. ijrd is an integer parameter list with up to 4 elements. i is the initial iteration counter and index into the input y , r is the initial state, j is the initial index of the current word, and d is an end-of-input action parameter (see below). It is required that (0<:i)*.i<#y and (_1=j)+.(0<:j)*.j<i . If ijrd is elided, then the defaults are 0 _1 0 _1 . x;:y iterates over the atoms of my,
the mapped input. r is the current state and j
is the beginning index of a word; initially, r is 0
and j is _1 (or as specified by ijr).
At iteration i , the current mapped input
is c=.i{my and the relevant state
table entry is e=.s{~<r,c (a 2-element integer list). The new state is 0{e , and the
output code 1{e signifies one of the following: ew(i,j,r,c) (“emit word”) signals index error if j is _1 , and accumulates to the result information on the current word according to the function code f :
ev(i,j,r,c) (“emit vector”) is similiar, but the current word and any intervening items are catenated to the previous word if the previous emit was ev and the state at that time was r. After the last atom of my has been processed, i
is #y , and the end-of-input action, if any, is performed:
If d=.3{ijrd is non-negative, the action is a single iteration
with c=.d ; if d is negative and j
is not _1 , then Function code f=5 specifies trace; the result of x;:y is a 6-column integer matrix, and for each iteration there is a row i,j,r,c,s{~<r,c . This matrix usually has #y rows, but can have fewer if output code 6 was encountered or if any of output codes 2 to 5 were encountered and j was _1 . Thus (0;s;m);:y is a list of boxed items of y and (2;s;m);:y is a 2-column integer table of indices and lengths, and: ((0;s;m);:y) -: (2;s;m) (,"0@;: <;.0 ]) y |
s=: '*: @ -: @ i. 2 3' do=: ". do s 0 0.25 1 2.25 4 6.25 ;: s +--+-+--+-+--+---+ |*:|@|-:|@|i.|2 3| +--+-+--+-+--+---+ ; ;: s *:@-:@i.2 3 p=: 'When eras die, their legacies/' q=: 'are left to strange police' r=: 'Professors in New England guard' s=: 'the glory that was Greece' ;: p +----+----+---+-+-----+--------+-+ |When|eras|die|,|their|legacies|/| +----+----+---+-+-----+--------+-+ > ;: p,q When eras die , their legacies / are left to strange police |.&.;: p / legacies their , die eras When
The following examples of the dyad ;: separate quotes from non-quotes. The quote character is mapped to 1 and other characters are mapped to 0; column 0 of the state table is for “other” (non-quotes) and column 1 is for the quote.
sq=: 4 2 2$ 1 1 2 1 1 0 2 2 2 0 3 0 1 2 2 0 <"1 sq +---+---+ |1 1|2 1| +---+---+ |1 0|2 2| +---+---+ |2 0|3 0| +---+---+ |1 2|2 0| +---+---+ ] y=: '''The Power of the Powerless'' by Havel and ''1984'' by Orwell' 'The Power of the Powerless' by Havel and '1984' by Orwell (0;sq;''''=a.) ;: y +----------------------------+--------------+------+----------+ |'The Power of the Powerless'| by Havel and |'1984'| by Orwell| +----------------------------+--------------+------+----------+ (2;sq;''''=a.) ;: y 0 28 28 14 42 6 48 10 (3;sq;''''=a.) ;: y 6 3 6 2 (4;sq;''''=a.) ;: y 0 28 6 28 14 3 42 6 6 48 10 2 sqx=: 4 2 2 $ 1 1 2 0 1 0 2 3 2 0 3 0 1 1 2 0 <"1 sqx +---+---+ |1 1|2 0| +---+---+ |1 0|2 3| +---+---+ |2 0|3 0| +---+---+ |1 1|2 0| +---+---+ (1;sqx;''''=a.) ;: y by Havel and by Orwell f=: (1;sqx;''''=a.)&;: g=: (+: ~:/\)@(''''&=) # ] (f -: g) y 1
A similar machine can be used to extract only the quoted strings. To prevent an unmatched quote from being recognized as the start of a quoted string, the ijrd field can be used to treat the end-of-string as a non-quote character:
] y=: '''Preposterous!'' He couldn''t go on.' 'Preposterous!' He couldn't go on. sq=: 4 2 2$ 1 1 2 1 1 0 2 1 2 0 3 0 1 3 2 0 (0;sq;''''=a.) ;: y +---------------+---------+ |'Preposterous!'|'t go on.| +---------------+---------+ (0;sq;(''''=a.);0 _1 0 0) ;: y +---------------+ |'Preposterous!'| +---------------+
The labs “Sequential Machines” and “Huffman Coding” contain further examples on the use of sequential machines.