Addons/xml/sax

xml/sax - XML parser based on the Expat library

SAX (Simple API for XML) parser addon. It provides both a flat API and an object-oriented, SAX-like interface. Binaries for Windows, Linux x86 and Darwin PPC are included.
Based on Expat 2.0.0, see http://expat.sourceforge.net/
See also: the examples in the test folder of the JSvnAddons SVN repository, and the change history there.

  1. Installation
  2. Usage
  3. Examples
    1. sax_test2.ijs
    2. sax_test3.ijs
    3. table.ijs
    4. rss.ijs
    5. chess.ijs
    6. stop.ijs
    7. prajg.ijs
  4. See Also
  5. Authors

Installation

Use the JAL Package Manager, or download the xml_sax archive from the JAL j601/addons area and extract it into the ~addons/xml/sax folder (or ~addons/xml for j504).

Usage

SAX (Simple API for XML) was originally a Java framework by David Megginson, derived from the Expat processing model. Because SAX handles the document as a stream of events instead of building an in-memory tree, it is typically faster than DOM and has a much smaller memory footprint. See http://www.saxproject.org/.

SAX parsing follows the push model: the parser calls your code, not the other way round. You provide the callback functions by overriding those of the base class (see the saxclass definition), and the parser calls them for each XML node event.

A higher-level visitor design pattern can be obtained by defining verbs named after the elements of interest, with a common prefix, and calling them from startElement and endElement. This is similar to the way wd calls event-handler verbs.
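A minimal sketch of such a dispatcher follows. It assumes a hypothetical naming convention in which element <e> is handled by a verb named on<e>; the class name pvisit and the handler onp are illustrative only. The name class is checked with 4!:0 and the handler is evoked with ~. Element names containing '_' or other characters not valid in J names would not map cleanly onto verb names under this scheme. As in stop.ijs, getAttribute is assumed to return _1 for a missing attribute, and process is assumed to return the result of endDocument.

```j
NB. visitor-style dispatch (sketch): element <e> is handled by verb on<e>
NB. the 'on' prefix, class pvisit and handler onp are hypothetical names
require 'xml/sax'

saxclass 'pvisit'

startDocument=: 3 : 'R=: i.0'     NB. results collected by handlers
endDocument=: 3 : 'R'             NB. becomes the result of process

startElement=: 4 : 0
  v=. 'on',y                      NB. handler name, e.g. 'onp' for <p>
  NB. name class 3 means a verb is defined; other elements are ignored
  if. 3 = 4!:0 <v do. x (v~) y end.
)

NB. handler for <p>: record the value of attribute n (_1 if missing)
onp=: 4 : 'R=: R,<x getAttribute ''n'''

NB. =========================================================
cocurrent 'base'
```

With this class, process_pvisit_ '<body><p n="a"/><q/><p n="b"/></body>' should return the boxed values of n for the two p elements, since body and q have no handler. An end-element counterpart would follow the same pattern inside endElement.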

Your class maintains the state and processes only the events it needs. Text between tags is delivered through the characters event, as demonstrated in the table and rss examples.

In the rss example, a simple stack of nested element names is maintained in the list S. The characters verb then processes the text according to the current context.
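The same stack technique can produce the full path of every element, similar to an XPath-style listing. The sketch below uses the illustrative class name ppath and, as above, assumes that process returns the result of endDocument.

```j
NB. element paths from a stack of open elements (sketch; ppath is hypothetical)
require 'xml/sax'

saxclass 'ppath'

startDocument=: 3 : 0
  S=: ''       NB. stack of open element names
  P=: i.0      NB. collected paths
)

startElement=: 4 : 0
  S=: S,<y                          NB. push element name
  P=: P,< }. ; (<'/') ,&.> S        NB. join stack as 'root/a/b'
)

endElement=: 3 : 'S=: }:S'          NB. pop element name
endDocument=: 3 : 'P'               NB. list of boxed paths

NB. =========================================================
cocurrent 'base'
```

For '<root><a><b/></a><c/></root>' this should yield the paths root, root/a, root/a/b and root/c in document order.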

To return a result from process, make it the result of endDocument, which is the last event called.
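For example, here is a sketch of a class (hypothetical name pcount) that counts elements and hands the count back through endDocument. It relies on process returning that value, as described above.

```j
NB. count elements and return the count from process (sketch)
require 'xml/sax'

saxclass 'pcount'

startDocument=: 3 : 'N=: 0'         NB. reset the counter
startElement=: 4 : 'N=: N+1'        NB. one call per element, root included
endDocument=: 3 : 'N'               NB. becomes the result of process

NB. =========================================================
cocurrent 'base'
```

Here process_pcount_ '<root><a/><b/></root>' should give 3, counting root, a and b.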

Examples

These are listings of some of the examples found in the test folder.

sax_test2.ijs

NB. object oriented sax parser specialization
NB. extended to use attributes and levels

require '~addons/xml/sax/sax.ijs'

saxclass 'psax2'

showattrs=: (''"_)`(;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)

startDocument=: 3 : 0
  L=: 0
)

startElement=: 4 : 0
  smoutput (L#'  '),'[',y.,' ',(showattrs attributes x.),']'
  L=: L+1
)

endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'[/',y.,']'
)

NB. =========================================================
cocurrent 'base'

TEST1=: 0 : 0
<root><test a="11"/><test b="12"/></root>
)

0 : 0  NB. Test
process_psax2_ TEST1
process_psax2_ fread jpath '~addons/xml/sax/test/chess.xml'
)

sax_test3.ijs

NB. object oriented sax parser specialization
NB. extended to use text characters

require '~addons/xml/sax/sax.ijs'

saxclass 'psax3'

showattrs=: (''"_)`(}.@;@:((',' , [ , '='"_ , ])&.>/"1))@.(*@#)

startDocument=: 3 : 0
  L=: 0
  IGNOREWS=: 1
)

startElement=: 4 : 0
  smoutput (L#'  '),'',y,'(',(showattrs attributes x),') {'
  L=: L+1
)

endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'}'
)

characters=: 3 : 0
  smoutput (L#'  '),y
)

NB. =========================================================
cocurrent 'base'

TEST3=: 0 : 0
<body><p a="11">s123</p>Between<q b="12" c="3">z456</q></body>
)

0 : 0  NB. Test
process_psax3_ TEST3
process_psax3_ fread jpath '~addons/xml/sax/test/table.xml'
)

table.ijs

NB. using element character content
NB. inter-tag and surrounding whitespace is ignored

require '~addons/xml/sax/sax.ijs format'

saxclass 'ptable'

endElement=: 3 : 0
  if. y.-:'tr' do. TD=: '' [ TR=: TR,TD end.
)

characters=: 3 : 'TD=: TD,<y.'

startDocument=: 3 : 'TR=: empty TD=: i.0 [ IGNOREWS=: 1'
endDocument=: 3 : 'TR'

NB. =========================================================
cocurrent 'base'

TEST4=: 0 : 0
<table><tr>  <td>0 0 </td>  <td> 0 1</td>  </tr>
      <tr>   <td>1 0 </td>  <td> 1 1</td>  </tr></table>
)

0 : 0  NB. Test
process_ptable_ TEST4
process_ptable_ fread jpath '~addons/xml/sax/test/table.xml'
)

rss.ijs

NB. using element character content
NB. selective processing based on element hierarchy position

require '~addons/xml/sax/sax.ijs format'

saxclass 'prss'

startDocument=: 3 : 0
  S=: ''
)

startElement=: 4 : 0
  S=: S,<y.
  if. y.-:'item' do. smoutput '' end.
)

endElement=: 3 : 0
  S=: }:S
)

characters=: 3 : 0
  s2=. _2{.S
  if. s2-:;:'channel title'       do. smoutput 'Channel: ',y. elseif.
      s2-:;:'channel description' do. smoutput fold y. elseif.
      s2-:;:'channel pubDate'     do. smoutput 'Date: ',y. elseif.
      s2-:;:'item title'          do. smoutput 'Topic: ',y. elseif.
      s2-:;:'item description'    do. smoutput fold y. elseif.
      s2-:;:'item link'           do. smoutput 'URL: ',y. end.
)

NB. =========================================================
cocurrent 'base'

TEST3=: 0 : 0
<channel><title>qq</title><pubDate>1/1/2006</pubDate></channel>
)

0 : 0  NB. Test
process_prss_ TEST3
process_prss_ fread jpath '~addons/xml/sax/test/cnn.rss'
)

chess.ijs

NB. chess -- a more complete example of custom parser
NB. transforms XML chess board into a J character matrix

require '~addons/xml/sax/sax.ijs viewmat'

saxclass 'pchess'

COLORS=: ;:'whitepieces blackpieces'
PIECES=: ;:'pawn rook night bishop queen king'
SYMBOLS=: 'PRNBQKprnbqk'

startElement=: 4 : 0
  e=. <y.
  if. 2>C=. COLORS i.e do. COLOR=: C*6 return. end.
  if. 6>P=. PIECES i.e do. PIECE=: SYMBOLS{~COLOR+P return. end.
  if. -.'position'-:y. do. return. end.

  r=. <:0".       x.getAttribute 'row'
  c=. 'abcdefgh'i.x.getAttribute 'column'
  empty BOARD=: PIECE (<r,c) } BOARD
)

startDocument=: 3 : 0
  BOARD=: '. '{~ ~:/~2|i.8
)

endDocument=: 3 : 0
  |.BOARD
)

NB. =========================================================
cocurrent 'base'

0 : 0  NB. Test
process_pchess_ fread jpath '~addons/xml/sax/test/chess.xml'
viewbmp jpath'~addons/xml/sax/test/chess.bmp'
)

stop.ijs

NB. interrupt on found data or error
NB. sax_test2 extended to stop parsing.
NB. Note: end element event is still handled

require '~addons/xml/sax/sax.ijs'

saxclass 'pstop'

showattrs=: (''"_)`(' ' , ;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)

startDocument=: 3 : 0
  L=: 0
  V=: 'not found'
)

startElement=: 4 : 0
  smoutput (L#'  '),'[',y,(showattrs attributes x),']'
  if. y-:,'p' do.
    select. x getAttribute 'n'
    case. ,'b' do. stop '' [ V=: x getAttribute 'v'
    case. _1   do. stop 1001;'Attribute "n" missing'
    end.
  end.
  L=: L+1
)

endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'[/',y,']'
)

endDocument=: 3 : 0
  smoutput 'Value of n=b is ',":V
)

NB. =========================================================
cocurrent 'base'

TEST4=: 0 : 0
<body><p n="a" v="11"/><p n="b" v="22"/><p n="c" v="33"/></body>
)
TEST4a=: 0 : 0
<body><p n="a" v="11"/><p n="c" v="33"/></body>
)
TEST4b=: 0 : 0
<body><p n="a" v="11"/><p v="22"/><p n="c" v="33"/></body>
)

0 : 0  NB. Test
process_pstop_ TEST4
process_pstop_ TEST4a
process_pstop_ TEST4b
)

prajg.ijs

I would like to add to Oleg's excellent examples a bit of code I recently used to process large XML namespace documents generated by a Cognos namespace utility. The following script blows through large namespace documents and builds a parent-child symbol table. The simplicity of this code is in stark contrast to the ugly industrial XML it processes. Don't be deceived by the terseness of Oleg's examples: this is a very powerful and useful utility. JohnBaker

NB. Finds all user superclasses to root in Cognos namespace report XML.
NB. John Baker J6.01 2007/06/07 uses Oleg's SAX addon

require 'xml/sax format'

saxclass 'prajg'

startDocument=: 3 : 0
  S=: ''          NB. element path
  PCTAB=: 0 2$''  NB. parent child table
  P=: ''          NB. parents
  CHILDUC=: ;: 'ChildrenUserClasses Userclass'
  NSUC=:    ;: 'NamespaceReport Userclass'   
  MBRU=:    ;: 'Members User' 
)

startElement=: 4 : 0
  S=:   S,<y
  s2=.  _2{.S
  if.  s2 -: CHILDUC   do.
    class=. x getAttribute 'name'
    PCTAB=: PCTAB,({:P),<class
    P=: P,<class
  elseif. s2 -: MBRU do.
    user=. '**user: ',x getAttribute 'name'
    PCTAB=: PCTAB,({:P),<user   
  elseif. s2 -: NSUC   do.
    class=. x getAttribute 'name'
    P=: P,<class
  end.
)


endElement=: 3 : 0
  S=: }:S
  NB. pop parent when ChildrenUserClasses ends
  if. y-:'ChildrenUserClasses' do. P=: }:P end.
)


NB. return parent-child table as symbols
endDocument=: 3 : 0
  s: PCTAB
)

NB.===================================
cocurrent 'base'

See Also

Authors

last edited 2008-04-24 20:17:16 by DevonMcCormick