Contents
;: y (Words)
Partitions a string into tokens according to the syntax of a J sentence.
The monadic sequential machine implements lexical analysis to split a j sentence into j words. It works somewhat with English to break sentences into words. ;: has a defined obverse.
;: '.2' NB. Is .2 a j representation of number?
+-+-+
|.|2|
+-+-+
;: 'Reserved:::...:.J...Words...' NB. How do suffixes work?
+----------------+----+--------+
|Reserved:::...:.|J...|Words...|
+----------------+----+--------+
;: 'f::g' NB. This is NOT an adverse specification!
+---+-+
|f::|g|
+---+-+
;: 'f :: g' NB. adverse
+-+--+-+
|f|::|g|
+-+--+-+
;: '(+/ % #) 1 2 3 4 88.4' NB. the vector is a single word
+-+-+-+-+-+-+------------+
|(|+|/|%|#|)|1 2 3 4 88.4|
+-+-+-+-+-+-+------------+
a=: ' The sequential machine sometimes works with English. '
;: a
+---+----------+-------+---------+-----+----+--------+
|The|sequential|machine|sometimes|works|with|English.|
+---+----------+-------+---------+-----+----+--------+
embed=: >@:{.@:[ , ] , >@:{:@:[
('>>>';'<<<') embed a
>>> The sequential machine sometimes works with English. <<<
deb NB. delete extra blanks
#~ (+. (1: |. (> </\)))@(' '&~:)
NB. Use under to demonstrate inverse.
(deb -: ]&.:;:) a NB. The result is the same as for deb
1
(deb -: [&.:;:) 'These will, of course, differ.'
0
'><' embed [&.:;: a
>The sequential machine sometimes works with English.<
#book=: 1!:1 < 'A_Christmas_Carol_NT.txt'
166529
ts 'deb book'
0.002849 788224
ts '[&.:;:book' NB. 1/10 the speed and double space. Keep the current deb definition.
0.033055 1.3343e7
ts NB. Time and space from the book of j phrases.
6!:2 , 7!:2@]Suppose you meet +/ .* in a fork. Or is it a hook? You must know whether there are an even or an odd number of verbs to distinguish hook from fork. This, in turn, determines the data routing.
;:'+/ .*' +-+-+-+-+ |+|/|.|*| +-+-+-+-+
You must still know that the adverb / influences + before it, and that conjunction of lone . binds +/ to * producing a single verb.
Common uses
1. Partition an open list into boxed strings
provided we can assume there's no punctuation to mishandle.
;: 'alpha bravo charlie' ┌─────┬─────┬───────┐ │alpha│bravo│charlie│ └─────┴─────┴───────┘
See Also
x ;: y (Sequential Machine)
Partitions a string (or more general list) y according to the rules of a given finite-state (Turing) machine x
I learned to use the sequential machine through a series of errors. Here I'll first present a working machine. If you don't understand it, continue reading to see the mistakes and substeps I took to make it work. My experience might help your understanding.
Project: define STATE to validate these assertions:
assert 0 1 0 -: (1 ; STATE) ;: 0 0 0 1 1 0 assert 1 0 1 -: (1 ; STATE) ;: -. 0 0 0 1 1 0 assert 0 1 0 1 -: (1 ; STATE) ;: 0 0 1 1 1 0 0 0 0 1 1
A solution is
STATE=: (1 1 ,: 2 1) , (1 1 ,: 2 2) ,: (1 2 ,: 2 1)
How did I find it? The remaining examples trace my learning curve. First I tried a two state machine. The machine starts in state 0. This incorrect logic seemed reasonable for my first try.
In state 0, if an item is of class 0 remain in state 0 and advance i . Otherwise, transition to state 1 , emit word, and advance i . State 1 would work in opposition.
STATE=: (0 1 ,: 1 2) ,: (0 2 ,: 1 1) ((>;:'Next state and action') ; ;:'Class0 Class1'),(;: 'State0 State1'),.<"1 STATE ┌──────┬──────┬──────┐ │Next │Class0│Class1│ │state │ │ │ │and │ │ │ │action│ │ │ ├──────┼──────┼──────┤ │State0│0 1 │1 2 │ ├──────┼──────┼──────┤ │State1│0 2 │1 1 │ └──────┴──────┴──────┘ assert 0 1 0 -: (1 ; STATE) ;: 0 0 0 1 1 0 assert 1 0 1 -: (1 ; STATE) ;: 1 1 1 0 0 1 |index error | assert 1 0 1-:(1;STATE) ;: 1 1 1 0 0 1
The first item, class 1 in state 0 causes transition to state 1 AND emit word. Variable j started as _1 , emit word signaled index error. The ijrd parameter won't help because i must be greater than j.
I decided the project was too difficult. Let's start with a machine that stops.
(0 ; (,: ,: 0 6) ) ;: 0 NB. by default, an integer is its class
(0 ; (1 2 2 $ 0 6) ; <<0 ) ;: 5 32 4 NB. stop on any integer
stop=: ;:~ 0 ; (1 2 2$0 6) ; [: < [: < {. NB. stop as above, take data type from y
stop 'abc'SUCCESS! Next baby step: define a machine that returns text up to 'b' then stops. State table proposal:
In state 0
- if the first character is retainable, set j=.i and go to state 1. otherwise it is 'b'. Quit without output.
In state 1
- while not 'b' remain in state 1 without output. otherwise transition to state 2 and emit word. Action 3 sets j to _1 to prevent emit word if 'b' is at the end of the character vector.
In state 2 stop.
FUNCTION=: 1 STATE=: (1 1 ,: 0 6) , (1 0 ,: 2 3) ,: (0 6 ,: 0 6) MAP=: a. = 'b' (FUNCTION ; STATE ; MAP)&;:&.>;:'Xa Xab Xabc b ba cba' ┌──┬──┬──┬┬┬─┐ │Xa│Xa│Xa│││c│ └──┴──┴──┴┴┴─┘ (FUNCTION;STATE;MAP);:'my-cat-bart' my-cat- (FUNCTION;STATE;MAP);:'buy-my-cat-bart'
SUCCESS! Changing the map would change the stop character set.
Next baby step: duplicate the transliteration deletion action of the unix command tr -d [characterSet] .
- Class 0: characters to keep.
- Class 1: characters to delete.
- State 0: start of sequence to analyze.
- State 1: retention.
- State 2: elision.
In state 0, the beginning of the items to analyze
- 1 1 Transition to state 1, mark the start of word
- 0 0 Found a deletable item. Remain in state 0
In state 1, retention
- 1 0 Stay in state 1, no output. The machine increments i .
- 2 3 Found unwanted item. Enter state 2, emit word.
In state 2, elision
- 1 1 Desirable. Enter state 1, j=.i to mark start of next word.
- 2 0 Unwanted. Stay put.
FUNCTION=: 1
STATE=: 3 2 2 $ 1 1 0 0 1 0 2 3 1 1 2 0
MAP=: a. = 'b'
(FUNCTION;STATE;MAP)&;:&.>'buy my cat bart';'bb0123bb45bbb';'666'
┌─────────────┬──────┬───┐
│uy my cat art│012345│666│
└─────────────┴──────┴───┘
NB. same STATE table removes 'a' and 'b' .
ELISIONS=: 'ab'
(FUNCTION ; STATE ; <<a.-.ELISIONS) ;: 'buy my cat bart'
uy my ct rt
NB. Challenge:
NB. Rewrite the state table to eliminate unnecessary state 2.SUCCESS! Let's finish the original project. Item classes are the integers 0 and 1 . Plan: report previous item when it differs from the current item. Solution: use 3 states, advance the start of word at every item. Emit word at each state change.
STATE=: (1 1 ,: 2 1) , (1 1 ,: 2 2) ,: (1 2 ,: 2 1) [A=: (,: -.) 0 0 0 1 1 0 0 0 0 1 1 0 1 1 1 0 0 1 (1 ; STATE) ;:"_ 1 A 0 1 0 1 0 1 NB. insert the explicit item map (1 ; STATE ; < < 0 ) ;:"_ 1 A 0 1 0 1 0 1 NB. group 'a' with 'b' (1 ; STATE ; <<'ab' ) ;: 'abbacdeab' aeb
SUCCESS!
Suppose you think, as I thought, to specify both items in the map. Index error arises because there are now 3 item classes, the 0 and 1 specified, and the implicit set of all else. Use smcheck defined in the Sequential Machine lab.
(1 ; STATE ; < 0 ; 1 ) ;:"_ 1 A |index error | (1;STATE;<0;1) ;:"_ 1 A [X=: 1 ; STATE ; < 0 ; 1 +-+---+-----+ |1|1 1|+-+-+| | |2 1||0|1|| | | |+-+-+| | |1 1| | | |2 2| | | | | | | |1 2| | | |2 1| | +-+---+-----+ X smcheck 1 1 1 0 0 0 0 1 |assertion failure | q>#m
Quick reference for common problems
Use smcheck , defined in the Sequential Machine lab.
Index error causes
- Emit with j equal to _1.
- Data type mismatch of y items and the item map boxes.
Incorrectly specified boxed map. Remember to use an extra box < to put a box of boxes as the y argument of link . The item classes are correctly boxed in the next examples.
- q doesn't bound the item classes. I still commit the error of "specifying all the classes" when actually the additional implicit class of all else extends the character classes and it is my state table that falls short.
STATE=: (1 1 ,: 2 1) , (1 1 ,: 2 2) ,: (1 2 ,: 2 1) (1 ; STATE ; <0;1) ;: 0 0 0 1 1 0 NB. WRONG, 3 item classes in map, 2 = q in STATE |index error | (1;STATE;<0;1) ;:0 0 0 1 1 0 (1 ; STATE ; <<0) ;: 0 0 0 1 1 0 NB. RIGHT 0 1 0
- State table contains possible transition to undefined state.
(1 ; ,:^:2 ] 0 6);:0 NB. OK, state 0 exists (1 ; ,:^:2 ] 99 6);:0 NB. NO! state 99 undefined |index error | (1;,:^:2]99 6) ;:0
Rank error causes
- State table must be rank 3.
Length error causes
- Character map must tally 256 .
NB. curtailed map is too short (1 ; STATE ; }: 'b' = a.) ;: 'abcdebf' |length error | (1;STATE;}:'b'=a.) ;:'abcdebf'
Common uses
See: Generate useful tokens from input
See Also
Explore the "Sequential Machine" and "Huffman Code" labs in menu: Studio > Labs...
Entry in the J Dictionary for ;:
