Originally published at http://olegykj.sourceforge.net/ as regexs.ijs
Regular expressions extended for Perl/awk/sed-like substitution. Features an option to process executable replacements.
The Unix tools sed and perl provide a mechanism for describing a search pattern and its substitution in a single operation. Often both the pattern and the replacement use sub-patterns to manipulate string fragments. This makes frequently used text transformations, such as reordering words or removing subparts, convenient.
Although the same result can be achieved programmatically with the existing regex operations, doing so involves additional low-level logic and familiarity with the many verbs of the J regex API. The proposed rxs verb provides a high-level operation that does not require going into J implementation details.
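As a rough illustration of the low-level route, the sketch below swaps two names using rxmatch, rxfrom and rxmerge from the standard regex script directly. The names str, pat, mat, whole, given, family, new and res are chosen for this illustration only and are not part of the script on this page.

```j
require 'regex'

str=: 'Henry Rich xxx'
pat=: '(\w+) (\w+)'

NB. rxmatch gives a table of (start,length) rows:
NB. row 0 is the whole match, rows 1 and 2 are the parenthesized groups
mat=: pat rxmatch str

NB. rxfrom extracts the matched substrings as boxes, one per row
'whole given family'=: mat rxfrom str

NB. build the replacement by hand and splice it over the whole match only
new=: family,', ',given
res=: (<new) (,: {. mat) rxmerge str
```

The examples on this page express a similar transformation as a single rxs call with the replacement \2, \1.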
The rxs verb also features a powerful e (execute) option, which applies the specified J expression to each match and merges the results.
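The idea behind the e option can be sketched with standard verbs alone: the matched text is substituted into the replacement as a J string literal, the resulting sentence is executed, and its result is formatted as text. This is a simplified reading of the script, not a replacement for it; the names t, expr, re and out are illustrative.

```j
require 'strings'

t=: '10'                          NB. text of one match (illustrative value)
expr=: '2*".\0'                   NB. replacement expression; \0 stands for the whole match
re=: expr rplc '\0';5!:5<'t'      NB. insert the match as a literal: 2*".'10'
out=: (,@":@:".) :: ('__'"_) re   NB. execute and format; gives '__' if the sentence fails
```

With the script on this page loaded, the corresponding call would be '/[0-9]+/2*".\0/ge' rxs 'a 10 b 21', which under this reading should give a 20 b 42.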
Used appropriately, the rxs verb can cover most regex use cases and replace direct use of the low-level verbs.
NB. Regular expressions extended for Perl-like substitution
NB. Version 3 for j601+.
NB. Author Oleg Kobchenko. Originally http://olegykj.sourceforge.net/
NB. to do: \xHH
require'strings regex'
coclass 'jregex'
NB. =========================================================
NB.*rxmain v return ()-less mat from ()-ful pattern
rxmain=: ,:"1@:({."2)
NB. =========================================================
NB.*rxs v make Perl-like s/PAT/REPL/OPT substitution
NB. use:
NB.   '/PAT/REPL/OPT' rxs str
NB.   PAT  - the usual POSIX pattern used in J regex
NB.   REPL - the POSIX sed-like replacement string
NB.            \1-\9    corresponding parens content
NB.            \0 or &  whole match
NB.            \_       boxed list of the whole match and all parens
NB.                     content, in linear representation (for 'e')
NB.            \t TAB   \n LF
NB.            \r CR    \f FF
NB.            \other   other
NB.   OPT  - any of 'ige' for ignore case, global, execute
NB. see: examples
RBEGE=: <;._1' \n LF \r CR \t TAB \f FF'
RBEGX=: '\n';LF;'\r';CR;'\t';TAB;'\f';FF
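rxs=: 4 : 0
  NB. first character of x is the delimiter; it is also reused below
  NB. as a temporary placeholder for an escaped backslash (\\)
  esc=. {.x
  'pat rpl opt'=. 3{. <;._1 x
  NB. 'i': match lowercased pattern against lowercased text
  str=. tolower^:('i'e. opt) y
  pat=. tolower^:('i'e. opt) pat
  NB. 'g': all matches, otherwise the first match only
  mat=. pat rxmatch`rxmatches@.('g'e. opt) str
  NB. no match: return y unchanged
  if. (0=#mat) +. _1=1{.,mat do. y return. end.
  NB. boxed substrings taken from the original y, one row per match
  subs=. ,:^:(2: > #@$) mat rxfrom y
  NB. keep only the whole-match (start,length) row of each match
  mat=. rxmain mat
  newr=. ''
  if. 'e' e. opt do.
    NB. protect \\ and map \n, \r, \t, \f to the J names LF, CR, TAB, FF
    r=. rpl rplc '\\';esc;RBEGE
    for_i. i.#mat do.
      NB. & becomes the whole match as a J string literal
      pairs=. '&';5!:5<'t' [ t=. >(<i,0){subs
      NB. \_ becomes the parenthesized representation of all boxed substrings
      pairs=. pairs,'\_';'('&,@(,&')')@(5!:5) <'t' [ t=. i{subs
      for_j. i.{:$subs do.
        pairs=. pairs, ('\',":j);5!:5<'t' [ t=. >(<i,j){subs
      end.
      NB. drop remaining backslashes, then restore the protected one
      pairs=. pairs,'\';'';esc;'\'
      re=. r rplc pairs
      NB. execute once per 'e' in opt; a failing sentence yields '__'
      for_j. i.+/'e'E.opt do.
        re=. (,@":@:".) :: ('__'"_) re
      end.
      newr=. newr,<re
    end.
  else.
    NB. protect \\ and map \n, \r, \t, \f to the actual characters
    r=. rpl rplc '\\';esc;RBEGX
    for_i. i.#mat do.
      pairs=. '&';>(<i,0){subs
      for_j. i.{:$subs do.
        pairs=. pairs, ('\',":j);>(<i,j){subs
      end.
      pairs=. pairs,'\';'';esc;'\'
      newr=. newr,<r rplc pairs
    end.
  end.
  NB. splice the replacements over the whole matches in y
  newr mat rxmerge y
)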
rxs_z_=: rxs_jregex_
Note 'Examples' NB. run indented lines and compare results
«examples»
)
   str=. 'hello Mr John Dow hi miz Sarah Bernard hi mr none'
   '/(mr|miz) ([a-z]+) ([a-z]+) */\3, \2 (\1) -- was: \0\n' rxs str
hello Mr John Dow hi miz Sarah Bernard hi mr none
   '/(mr|miz) ([a-z]+) ([a-z]+) */\3, \2 (\1) -- was: \0\n/i' rxs str
hello Dow, John (Mr) -- was: Mr John Dow
hi miz Sarah Bernard hi mr none
   '/(mr|miz) ([a-z]+) ([a-z]+) */\3, \2 (\1) -- was: \0\n/ig' rxs str
hello Dow, John (Mr) -- was: Mr John Dow
hi Bernard, Sarah (miz) -- was: miz Sarah Bernard
hi mr none
   p1=. '!(mr|miz) (([a-z]+) )?([a-z]+) *'
   r1=. '!\4,s,(":#\4),s, \3, s,\1,s,'' used: '',(":+/a:~:\_),\n [ s=.''/'''
   o1=. '!gie'
   (p1,r1,o1) rxs str
hello Dow/3/John/Mr/ used: 5
hi Bernard/7/Sarah/miz/ used: 5
hi none/4//mr/ used: 3
   '/([^ ]+) ([^ ]+)/\2,''-'',\1/e' rxs 'q''123 z456'
z456-q'123
   '/([^ ]+) ([^ ]+)/\2,''-'',\1/ee' rxs '123 456' NB. multiple /e
333
   '/(\w+) (\w+) (.*)/\2, \1 \3' rxs 'Henry Rich xxx'
Rich, Henry xxx
See Also
Regex links in Parsing Guide
perl s/// operator Google search
Regexp Quote-Like Operators in perlop
Swap first/last names user feedback
Contributed by OlegKobchenko
