Trees
Contents
the question
We know J is wonderful for handling arrays. If there were a prize for array handling, only J and APL would be in the running, and you wouldn't see J sweating.
But what about other data structures? What about trees? Trees are ubiquitous in computer science. They're fundamental to many algorithms. Can J handle them?
That is, how convenient and efficient is it to construct (represent) and manipulate (process) tree data structures in J?
representation
First, we should understand that J can represent trees. The most obvious ways is with nested boxes:
5!:2{.;:'toJ'
+-------------------------------+--+------------------------------+
|+-+-------------------------+-+|@:|+-----+----------------------+|
|| |+---------------------+-+|]|| ||+-+-+|+--+-+---------------+||
|| ||+---------------+-+-+|}|| || |||#|~|||-.|@|+---------+-+-+|||
|| |||+--+-+--------+|@|]|| || || ||+-+-+|| | ||+--+-+--+|@|,||||
|| ||||I.|@|+--+-+-+|| | || || || || || | ||| |&|E.|| | ||||
|| |||| | ||e.|&| ||| | || || || || || | ||+--+-+--+| | ||||
|| |||| | |+--+-+-+|| | || || || || || | |+---------+-+-+|||
|| |||+--+-+--------+| | || || || || |+--+-+---------------+||
|| ||+---------------+-+-+| || || |+-----+----------------------+|
|| |+---------------------+-+| || | |
|+-+-------------------------+-+| | |
+-------------------------------+--+------------------------------+
5!:4{.;:'toJ'
+- 10{a.
| +- I.
| +- @ -+ +- e.
+-+- } ----- @ -+ +- & ------+- 13{a.
| | +- ]
| +- ]
-- @: -+
| +- ~ ----- #
| |
+-+ +- -.
| | +- 13 10{a.
+- @ ---+ +- & -+- E.
+- @ -+- ,
We can see here that trees and boxes are just two ways to represent the same thing. But boxes may not be the best representation, and we'll see why.
Another possible representation may be storing the tree as a list. For example, a binary tree can be stored in an list which maintains certain relationships among indices. For example, see Wikipedia's entry on binary trees. More than binary, arbitrary trees can be stored in lists, and J's sparse arrays can probably alleviate the memory problems normally associated with this representation.
processing
So it's possible to represent trees in J. But what about processing trees? How easy would it be, say, to rebalance a heap? Is it efficient?
Trees are different from arrays, whatever their representation. They normally require small granularity of focus and lots of small decisions. Whereas arrays reward thinking big; processing as much data as possible at once. J allows you to express big thoughts concisely, and execute them efficiently. But once you start thinking small, J punishes you. After all, it takes 5 machine words to store one byte of data in J.
But that's not the only way J can punish you. Of course we're interested in whether tree processing is efficient, but, more importantly, we want to know "is it convenient?". Ay, there's the rub.
obstacles
A while ago, I indicated 4 major obstacles to tree processing in J. Three of them address the "efficient" part of the question, and one addressed the "convenient" part. Most of these were obstacles were removed in subsequent versions of J (released after I wrote my message); see the next section.
At the time, the obstacles were:
J is horrifically slow at processing arrays of boxes. Note, in this example, taking the shape of an array of boxes is about 5 orders of magnitude slower than doing so for an open array.
a =. 4000 4000 $ 0 b =. 4000 4000 $ < 0 $a NB. This returns in no time at all 4000 4000 $b NB. You, the human typing this, will notice this delay. 4000 4000 6!:2 '$ a' 1.64825e_5 6!:2 '$ b' NB. 4 _SECONDS_, to take the _SHAPE_? 4.1239 load 'numeric' 1 round -&:(10&^.@:(6!:2)@:('$'&,))/ 'ba' NB. 5 _orders of magnitude_ 5There is a dearth of tree-processing primitives. For example, we still don't have }::.
'a' 2 }:: ;/'aab' |nonce error | 'a' 2}::;/'aab'
The box depth is limited to 270, i.e.
<:@:(>:@:L.@:(<^:]) ::]^:_) 0 270
You can work around this with L:, but you don't want to due to #0.
The stack depth is limited to 510, i.e.
$:@:>: :: <: 0 510
Which makes recursion untrustworthy.
recent improvements
In recent versions of J, most of these problems were fixed, as I laid out in this message:
Boxed references were improved in J6.01.
First, the original objection didn't mention that we already had {::,L.,L:, and S:. Now, we still don't have }::, but J5.03 introduced the finite state machine, dyadic ;:.
FSM is useful for problems which historically been difficult or slow in J (i.e. scalar, "loopy" algorithms that make lots of small decisions). For example, the Essay on Huffman coding and corresponding Lab demonstrate efficient tree processing in J.
Additionally, J6.02 introduced M., a memoizing adverb (though its utility is currently limited). But there's more to this story; see the next section.
I didn't mention this in the message, but somewhere along the line, the box depth limit was raised (or eliminated?). I can't pinpoint which version from the release notes, but at least in J6.01, the sentence in obstacle #2 never bottoms out (i.e. it doesn't seem to have a limit, questions of efficiency aside).
The function call depth was changed to 10,000 in J5.04 and the recursion limit was substantially increased in J6.01.
reconsideration
But maybe my message missed the point. These improvements made tree processing in J possible, but not easy. That is, they improved efficiency, not convenience. And future enhacements are likely to continue that pattern.
J is a notation for manipulating rectangular arrays. It is a tool of thought which encourages you to think in arrays, and solve problems with arrays. Its whole philosophy and all its tools are oriented towards that structure; consequently, using a different data structure will be painful.
I doubt even Roger found it trivial to write the Huffman verbs. So, even with even with FSM, obstacle #1 is still outstanding. J gives you the tools to process arrays; if you want to process trees, you'll have to write your own (or wait for a good samaritan to write an addon).
Which is just a long way to say that J does not encourage tree-thinking.
conclusion
Every programming language has a philosophy, and none of them can be everything to everyone.
If you're already an array-thinker, and you're familiar with J, and still conclude your program (primarily) needs to manipulate trees, perhaps you should write it in another language. One of J's primary advantages is that it makes (array) programming easy. If its notation is inapplicable to your problem (more or less), there is not much reason left to use it.
I hear Lisp is pretty good with trees. Don't know, myself.
see also
With all that said, it is certainly possible to process trees in J, and there are many published examples ranging from simple to complex.
Here is a selection of J resources dealing with trees and implementing tree algorithms in J:
David Steinbrook & Eugene McDonnell's paper From Trees into Boxes.
Oleg's Essay on Dendrites and Roger's on minimal spanning trees.
Chris and Oleg on Treemaps and related algorithms.
Oleg's Essay on LL parsers.
BJonas' Scheme parsing script (in fact, the guide to parsing in J contains numerous examples).
Ken Iverson's treatment of trees in his seminal paper, "A Programming Language" which Devon has helpfully summarized on this Wiki.
The Dictionary on directed graphs; in particular transitive closure and distance.
This last, in particular, demonstrates that when a datastructure can be represented as an array, J will work just as heroically upon it as any other array.
