x E. y finds all occurrences of x as a substring in y . For example:
x=: 'our'
y=: 'fourscore and ten years ago, our fathers'
x E. y
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
y ,: ' 1' {~ x E. y
fourscore and ten years ago, our fathers
1 1 Substrings can overlap if a non-trivial prefix of x occurs as a suffix of x . (A trivial prefix is the empty string or the entire of x .) Suppose only the non-overlapping substrings (NOS) are to be found? In the following example, the only 1s we want are those at indices 2 11 24 33 46 .
x=: 'abcabcabc'
y=: '--',60$(20$x),'--'
(|:":,.i.#y) , y ,: ' 1' {~ x E. y
1111111111222222222233333333334444444444555555555566
01234567890123456789012345678901234567890123456789012345678901
--abcabcabcabcabcabcab--abcabcabcabcabcabcab--abcabcabcabcabca
1 1 1 1 1 1 1 1 1 1 1 Let s=: x I.@E. y be the indices of the (possibly overlapping) occurrences of x in y . If j in s is the index of an NOS, the index of the next NOS can be found through the use of the dyad I. (interval index).
] s=: x I.@E. y
2 5 8 11 24 27 30 33 46 49 52
] i=: s I. s+#x
3 4 4 4 7 8 8 8 11 11 11
s ,: i{s,_1
2 5 8 11 24 27 30 33 46 49 52
11 24 24 24 33 46 46 46 _1 _1 _1If 2 is (the index of) an NOS, then the next NOS is 11; if 5 is an NOS, the next NOS is 24; if 8 is an NOS, the next NOS is 24; etc. Since the first element of s is the index of an NOS, the indices of all the NOSes can be found by following the path that starts there; that is, by computing the transitive closure of that first element.
2 |
5 |
8 |
11 |
24 |
27 |
30 |
33 |
46 |
49 |
52 |
_1 |
11 |
24 |
24 |
24 |
33 |
46 |
46 |
46 |
_1 |
_1 |
_1 |
_1 |
nos=: 4 : 0
s=. x I.@E. y
i=. s I. s+#x
(i.#y) e. (s,_1) {~ (i,_1) {~^:a: 0
)
y ,: ' 1'{~ x nos y
--abcabcabcabcabcabcab--abcabcabcabcabcabcab--abcabcabcabcabca
1 1 1 1 1
t ,: ' 1'{~ 'aaa' nos t=: 59$'a'
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
t ,: ' 1'{~ 'no overlaps' nos t=: 59$'_no overlaps '
_no overlaps _no overlaps _no overlaps _no overlaps _no ove
1 1 1 1 The function can be written tacitly as follows:
tc =: ,&_1 {~^:a: 0:
nos1=: i.@#@] e. #@[ (tc@(]I.+) { _1,~]) I.@E.
See also
Contributed by RogerHui.
