The SAX XML parser exhibits an annoying, though acceptable, behavior when it works on a text node containing an embedded ampersand character, written in the XML as &amp;. The text of such a node may be delivered through several successive calls to the "characters" callback instead of a single one. This behavior is documented on this page, which, ironically, is rendered very poorly.
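The splitting is easy to observe with a class that does nothing but log its "characters" calls. The sketch below reuses the saxclass / process_..._ pattern from the script further down. The class name pchunks and the global C are made up for illustration. The element callbacks are stubbed out explicitly, in case the base class does not supply defaults.

```j
require 'xml/sax format'

NB. Hypothetical logging class: keeps one box per characters call.
saxclass 'pchunks'
startDocument=: 3 : 'C=: 0$a:'   NB. Start with an empty list of boxes.
startElement=: 4 : '0'           NB. Element events are ignored here.
endElement=: 3 : '0'
characters=: 3 : 'C=: C,<y'      NB. Record each chunk of text separately.
endDocument=: 3 : 'C'            NB. Becomes the result of process_pchunks_.
cocurrent 'base'

r=: process_pchunks_ '<title>Getting Started &amp; Then Some</title>'
r      NB. One box per characters call.
; r    NB. The chunks razed back into the full title.
```

If the parser behaves as described above, r holds more than one box for this single title. Razing the boxes recovers the complete text, and that is what any handler has to reproduce.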
Here's a method for dealing with this, adapted from some example code written by Oleg.
NB.* xmlEGboxed.ijs: use elements or attributes to fill boxed table
NB. http://www.jsoftware.com/pipermail/programming/2008-December/013300.html
require 'xml/sax format'
saxclass 'pboxed'
startDocument=: 3 : 0
LASTL=: L=: 0 [ S=: '' NB. Level counter L, its value at the last characters call LASTL, open-element path S.
HREF=: '' NB. href attribute of the current bookmark.
Z=: i.0 2 NB. Will contain final result.
)
endDocument=: 3 : 'Z'
startElement=: 4 : 0
L=: >:L [ S=: S,<y
if. y-:'bookmark' do.
HREF=: x getAttribute 'href' end.
)
endElement=: 3 : 0
L=: <:L [ S=: }:S
)
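characters=: 3 : 0
s2=. _2{.S NB. The two innermost open elements.
if. s2 -: ;:'bookmark title' do. NB. Text of a bookmark's title?
if. L~:LASTL do. Z=: Z,y;HREF NB. First chunk: start a new row,
else. Z=: (<y,~>(<_1 0){Z) (<_1 0)}Z end. NB. later chunks: extend the last title.
end.
LASTL=: L NB. Remember this level for the next characters call.
)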
NB. =========================================================
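cocurrent 'base'

This code collects bookmarked URLs together with their titles. A title containing an ampersand can arrive in several pieces. For that reason, the characters verb starts a new row only when the nesting level L differs from the level recorded at the previous characters call. Otherwise, it appends the new piece to the title in the last row.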
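The else. branch of characters relies on one amend idiom. It glues the next chunk onto the title cell (row _1, column 0) of the result table. The sketch below replays that idiom by hand on a hypothetical sequence of chunks, with no parser involved, so it can be inspected in isolation. The verb name addchunk is made up for illustration and is not part of the xml/sax addon.

```j
NB. Hypothetical helper: append chunk y to the title cell (row _1,
NB. column 0) of table x, as the else. branch of characters does.
addchunk=: 4 : 0
(<(>(<_1 0){x),y) (<_1 0)} x
)

Z=: i.0 2   NB. Empty 0-by-2 table, as in startDocument.
Z=: Z,'Getting Started ';'http://www.bogus.org/HeyHo/LetsGo.html' NB. First chunk opens a row.
Z=: Z addchunk '&'          NB. Later chunks extend the title...
Z=: Z addchunk ' Then Some' NB. ...without adding rows.
Z
```

Appending a boxed row to the empty numeric table i.0 2 works because J ignores the type of an empty argument. The original startDocument relies on the same rule.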
Here's some sample XML with embedded ampersands.
egSmall=: 0 : 0
<?xml version="1.0"?>
<!DOCTYPE xbel PUBLIC "+//IDN python.org//DTD XML Bookmark Exchange Language 1.1//EN//XML" "http://pyxml.sourceforge.net/topics/dtds/xbel-1.1.dtd">
<xbel>
<title>Bookmarks</title>
<desc>Bookmarks</desc>
<folder id="rdf:#$FvPhC3" folded="no">
<title>Bookmarks Toolbar Folder</title>
<desc>Add bookmarks to this folder &amp; see them displayed on the Bookmarks Toolbar
</desc>
<bookmark href="http://www.bogus.org/HeyHo/LetsGo.html">
<title>Getting Started &amp; Then Some</title>
</bookmark>
<bookmark href="http://fxfeeds.mozilla.com/" modified="1209052290">
<title>Headlines &amp; Deadlines</title>
</bookmark>
</folder>
<bookmark href="http://www.jsoftware.com/" added="1146880810" visited="1209017433">
<title>J Home &amp; Homeboys</title>
</bookmark>
</xbel>
)

Here's the result of using the code on this example:
   load 'xmlEGboxed.ijs'
   process_pboxed_ egSmall
+---------------------------+--------------------------------------+
|Getting Started & Then Some|http://www.bogus.org/HeyHo/LetsGo.html|
+---------------------------+--------------------------------------+
|Headlines & Deadlines      |http://fxfeeds.mozilla.com/           |
+---------------------------+--------------------------------------+
|J Home & Homeboys          |http://www.jsoftware.com/             |
+---------------------------+--------------------------------------+
