;: y (Words)

Back to: Vocabulary

Partitions a string into tokens according to the syntax of a J sentence.

The monadic sequential machine implements lexical analysis to split a j sentence into j words. It works somewhat with English to break sentences into words. ;: has a defined obverse.

   ;: '.2'   NB. Is .2 a j representation of number?
+-+-+
|.|2|
+-+-+
   ;: 'Reserved:::...:.J...Words...'   NB. How do suffixes work?
+----------------+----+--------+
|Reserved:::...:.|J...|Words...|
+----------------+----+--------+
   ;: 'f::g'    NB. This is NOT an adverse specification!
+---+-+
|f::|g|
+---+-+
   ;: 'f :: g'  NB. adverse
+-+--+-+
|f|::|g|
+-+--+-+
   ;: '(+/ % #) 1 2 3 4 88.4'  NB. the vector is a single word
+-+-+-+-+-+-+------------+
|(|+|/|%|#|)|1 2 3 4 88.4|
+-+-+-+-+-+-+------------+


   a=: '   The   sequential  machine  sometimes     works with   English.  '

   ;: a
+---+----------+-------+---------+-----+----+--------+
|The|sequential|machine|sometimes|works|with|English.|
+---+----------+-------+---------+-----+----+--------+

   embed=: >@:{.@:[ , ] , >@:{:@:[
   ('>>>';'<<<') embed a
>>>   The   sequential  machine  sometimes     works with   English.  <<<


   deb  NB. delete extra blanks
#~ (+. (1: |. (> </\)))@(' '&~:)

   NB. Use under to demonstrate inverse.
   (deb -: ]&.:;:) a    NB. The result is the same as for deb
1
   (deb -: [&.:;:) 'These will, of course, differ.'
0

   '><' embed [&.:;: a
>The sequential machine sometimes works with English.<

   #book=: 1!:1 < 'A_Christmas_Carol_NT.txt'
166529

   ts 'deb book'
0.002849 788224

   ts '[&.:;:book'  NB. 1/10 the speed and double space.  Keep the current deb definition.
0.033055 1.3343e7

   ts  NB. Time and space from the book of j phrases.
6!:2 , 7!:2@]

Suppose you meet +/ .* in a fork. Or is it a hook? You must know whether there are an even or an odd number of verbs to distinguish hook from fork. This, in turn, determines the data routing.

   ;:'+/ .*'
+-+-+-+-+
|+|/|.|*|
+-+-+-+-+

You must still know that the adverb / influences + before it, and that conjunction of lone . binds +/ to * producing a single verb.

Common uses

1. Partition an open list into boxed strings

   ;: 'alpha bravo charlie'
┌─────┬─────┬───────┐
│alpha│bravo│charlie│
└─────┴─────┴───────┘

See Also


x ;: y (Sequential Machine)

Back to: Vocabulary

Partitions a string (or more general list) y according to the rules of a given finite-state (Turing) machine x

I learned to use the sequential machine through a series of errors. Here I'll first present a working machine. If you don't understand it, continue reading to see the mistakes and substeps I took to make it work. My experience might help your understanding.

Project: define STATE to validate these assertions:

assert 0 1 0 -: (1 ; STATE) ;: 0 0 0 1 1 0
assert 1 0 1 -: (1 ; STATE) ;: -. 0 0 0 1 1 0
assert 0 1 0 1 -: (1 ; STATE) ;: 0 0 1 1 1 0 0 0 0 1 1

A solution is

   STATE=: (1 1 ,: 2 1) , (1 1 ,: 2 2) ,: (1 2 ,: 2 1)

How did I find it? The remaining examples trace my learning curve. First I tried a two state machine. The machine starts in state 0. This incorrect logic seemed reasonable for my first try.

In state 0, if an item is of class 0 remain in state 0 and advance i . Otherwise, transition to state 1 , emit word, and advance i . State 1 would work in opposition.

   STATE=: (0 1 ,: 1 2) ,: (0 2 ,: 1 1)
   ((>;:'Next state and action') ; ;:'Class0 Class1'),(;: 'State0 State1'),.<"1 STATE
┌──────┬──────┬──────┐
│Next  │Class0│Class1│
│state │      │      │
│and   │      │      │
│action│      │      │
├──────┼──────┼──────┤
│State0│0 1   │1 2   │
├──────┼──────┼──────┤
│State1│0 2   │1 1   │
└──────┴──────┴──────┘
   
   assert 0 1 0 -: (1 ; STATE) ;:  0 0 0 1 1 0
   assert 1 0 1 -: (1 ; STATE) ;: 1 1 1 0 0 1
|index error
|   assert 1 0 1-:(1;STATE)    ;: 1 1 1 0 0 1

The first item, class 1 in state 0 causes transition to state 1 AND emit word. Variable j started as _1 , emit word signaled index error. The ijrd parameter won't help because i must be greater than j.

I decided the project was too difficult. Let's start with a machine that stops.

   (0 ;  (,: ,: 0 6) ) ;: 0   NB. by default, an integer is its class

   (0 ;  (1 2 2 $ 0 6) ; <<0 ) ;: 5 32 4  NB. stop on any integer

   stop=: ;:~ 0 ; (1 2 2$0 6) ; [: < [: < {.  NB. stop as above, take data type from y

   stop 'abc'

SUCCESS! Next baby step: define a machine that returns text up to 'b' then stops. State table proposal:

In state 0

In state 1

In state 2 stop.

   FUNCTION=: 1
   STATE=: (1 1 ,: 0 6) , (1 0 ,: 2 3) ,: (0 6 ,: 0 6)
   MAP=: a. = 'b'
   (FUNCTION ; STATE ; MAP)&;:&.>;:'Xa Xab Xabc b ba cba'
┌──┬──┬──┬┬┬─┐
│Xa│Xa│Xa│││c│
└──┴──┴──┴┴┴─┘
   (FUNCTION;STATE;MAP);:'my-cat-bart'
my-cat-
   (FUNCTION;STATE;MAP);:'buy-my-cat-bart'

SUCCESS! Changing the map would change the stop character set.

Next baby step: duplicate the transliteration deletion action of the unix command tr -d [characterSet] .

In state 0, the beginning of the items to analyze

In state 1, retention

In state 2, elision

   FUNCTION=: 1
   STATE=: 3 2 2 $ 1 1  0 0   1 0  2 3   1 1  2 0
   MAP=: a. = 'b'
   (FUNCTION;STATE;MAP)&;:&.>'buy my cat bart';'bb0123bb45bbb';'666'
┌─────────────┬──────┬───┐
│uy my cat art│012345│666│
└─────────────┴──────┴───┘

   NB. same STATE table removes 'a' and 'b' .

   ELISIONS=: 'ab'
   (FUNCTION ; STATE ; <<a.-.ELISIONS) ;: 'buy my cat bart'
uy my ct rt
      
   NB. Challenge:
   NB. Rewrite the state table to eliminate unnecessary state 2.

SUCCESS! Let's finish the original project. Item classes are the integers 0 and 1 . Plan: report previous item when it differs from the current item. Solution: use 3 states, advance the start of word at every item. Emit word at each state change.

   STATE=: (1 1 ,: 2 1) , (1 1 ,: 2 2) ,: (1 2 ,: 2 1)

   [A=: (,: -.) 0 0 0 1 1 0
0 0 0 1 1 0
1 1 1 0 0 1

   (1 ; STATE) ;:"_ 1 A
0 1 0
1 0 1

   NB. insert the explicit item map
   (1 ; STATE ; < < 0 ) ;:"_ 1 A
0 1 0
1 0 1

   NB. group 'a' with 'b'
   (1 ; STATE ; <<'ab' ) ;: 'abbacdeab'
aeb

SUCCESS!

Suppose you think, as I thought, to specify both items in the map. Index error arises because there are now 3 item classes, the 0 and 1 specified, and the implicit set of all else. Use smcheck defined in the Sequential Machine lab.

   (1 ; STATE ; < 0 ; 1 ) ;:"_ 1 A
|index error
|   (1;STATE;<0;1)    ;:"_ 1 A

   [X=: 1 ; STATE ; < 0 ; 1
+-+---+-----+
|1|1 1|+-+-+|
| |2 1||0|1||
| |   |+-+-+|
| |1 1|     |
| |2 2|     |
| |   |     |
| |1 2|     |
| |2 1|     |
+-+---+-----+
   X smcheck 1 1 1 0 0 0 0 1
|assertion failure
|   q>#m

Quick reference for common problems

Use smcheck , defined in the Sequential Machine lab.

Index error causes

   STATE=: (1 1 ,: 2 1) , (1 1 ,: 2 2) ,: (1 2 ,: 2 1)
   (1 ; STATE ; <0;1) ;: 0 0 0 1 1 0   NB. WRONG, 3 item classes in map, 2 = q in STATE
|index error
|   (1;STATE;<0;1)    ;:0 0 0 1 1 0

   (1 ; STATE ; <<0) ;: 0 0 0 1 1 0    NB. RIGHT
0 1 0

   (1 ; ,:^:2 ] 0 6);:0    NB. OK, state 0 exists

   (1 ; ,:^:2 ] 99 6);:0   NB. NO!  state 99 undefined
|index error
|   (1;,:^:2]99 6)    ;:0

Rank error causes

Length error causes

   NB. curtailed map is too short
   (1 ; STATE ; }: 'b' = a.) ;: 'abcdebf'
|length error
|   (1;STATE;}:'b'=a.)    ;:'abcdebf'

Common uses

See: Generate useful tokens from input

See Also


CategoryVoc

Vocabulary/semico (last edited 2012-03-10 04:35:24 by DavidLambert)