Chapter 30: Sparse Arrays
30.1 Introduction
The sparse array facility of J allows a large array
to be stored in the computer in a moderate amount of
memory if many of the array's elements
are all the same. In this case a value which occurs many times
need be stored only once.
For an example, sparse representation might be considered for
a connection matrix describing a network. In this chapter
we will look at the J machinery for handling sparse arrays.
Suppose that D is a matrix with most of its elements
the same:
] D =: 2 3 4 (2 2; 3 6; 4 4) } 16 16 $ 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0
0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This array can be stored in a compact form, called a "sparse array",
where only its non-zero elements occupy storage.
An ordinary array which is not sparse may be called a "dense" array.
There is a built-in function, $. (dollar dot)
to compute a sparse array from a dense.
S =: $. D
For many purposes dense matrix D and sparse matrix S are equivalent:
S matches D, and therefore it has the same dimensions, and
gives the same result on indexing:
S -: D |
($S) -: ($D) |
((< 0 0){ S) -: (<0 0) { D |
1 |
1 |
1 |
30.2 Sparse Array is Compact
Compared to matrix D, matrix S is economical in storage because the value
which occurs many times in D is stored only once in S.
This value is known as the "sparse element" of S, or the "zero" element
of S.
It happens to be 0 in the case of S, but need not be 0 always.
We can measure the size of the storage occupied by an array
with the built-in 7!:5.
We see that the size of S (which the sparse form of D)
is smaller than the size D itself:
7!:5 <'S' |
7!:5 <'D' |
384 |
2048 |
30.3 Inspecting A Sparse Array
There is a useful function datatype in the standard library.
It shows the type of its argument.
datatype D |
datatype S |
integer |
sparse integer |
Recall that the built verb 3!:0 also gives the type of its argument.
For a sparse array, the possible types reported by 3!:0 are
1024 | sparse boolean |
2048 | sparse character |
4096 | sparse integer |
8192 | sparse floating point |
16384 | sparse complex |
32768 | sparse boxed |
If we display S in the usual way , we see, not the familiar representation of a matrix,
but instead
a list of index-value pairs, one pair for each (in this example)
non-zero element.
S
2 2 | 2
3 6 | 3
4 4 | 4
This display does not show that the sparse element of S is in fact integer zero.
To show this, we can extract the sparse element with the verb 3 & $. .
se =: 3 $. S |
datatype se |
0 |
integer |
If we now compute a new matrix from S
T =: S + 5
we see that T is sparse, and the sparse element of T is not zero but 5
T |
3 $. T |
2 2 | 7
3 6 | 8
4 4 | 9 |
5 |
Another way to view a sparse array
is simply to convert it to dense with 0 & $.
view =: 0 & $.
T |
view T |
2 2 | 7
3 6 | 8
4 4 | 9 |
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 7 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 8 5 5 5 5 5 5 5 5 5
5 5 5 5 9 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 |
30.4 Computing with Sparse Arrays
Computations with sparse arrays are pretty much the same as
with dense arrays, except that they tend to produce sparse arrays
as results. We saw this with S+5 above. Here is another example.
Summing over T produces a vector of column-sums which is sparse
] V =: +/ T
2 | 82
4 | 84
6 | 83
but the "zero" element of V is the sum of a column of "zero" elements
of T
3 $. V
80
At the time of writing, there are still some limitations on what
can be done with sparse arrays compared with dense arrays.
See the Dictionary under $. for more information.
30.5 Constructing A Sparse Array
At this point it will be helpful to define a few terms.
First note that, according to context,
the numerals 0 or 1 or 0.0 or 1.0
could be valid as boolean or integer or real. However
in the absence of any context the J system takes them all to be in fact
boolean.
datatype 0 |
datatype 1 |
datatype 0.0 |
datatype 1.0 |
boolean |
boolean |
boolean |
boolean |
It will be useful to define some values of unambiguous type.
INTEGERZERO =: 3 - 3 |
datatype INTEGERZERO |
0 |
integer |
INTEGERONE =: 3 - 2 |
datatype INTEGERONE |
1 |
integer |
REALZERO =: 0.0*0.1 |
datatype REALZERO |
0 |
floating |
REALONE =: ^ 0 |
datatype REALONE |
1 |
floating |
Returning now to sparse arrays, the recommended method of constructing them
is to begin by making an empty array of
the required shape and type, but with no actual data.
An empty array is built by evaluating the expression
1 $. shape;axes;zero
where
If zero is omitted the default is REALZERO.
If both axes and zero are omitted, the default is all axes sparse
and REALZERO.
So to build a 6 by 6 matrix, sparse in all dimensions (that is, on axis 0 and axis 1),
of type integer with "zero" element of 0 we can write:
U =: 1 $. 6 6 ; 0 1; INTEGERZERO
At this point, U is empty, that is, all "zero",
so displays as nothing:
U
Populate it by inserting a few non-zero elements into it
U =: 4 5 6 7 ( 0 0 ; 1 1; 2 2; 3 3) } U
and check that U is what we expect by viewing it:
view U
4 0 0 0 0 0
0 5 0 0 0 0
0 0 6 0 0 0
0 0 0 7 0 0
0 0 0 0 0 0
0 0 0 0 0 0
30.6 Sparse and Dense Axes
An array may be sparse on some axes and dense on others.
In the following example W is sparse on its first axis
and dense on its second, because
its list of sparse axes is just 0
saw =: ,0 NB. sparse axes for W
W =: 1 $. 3 5; saw ; INTEGERZERO
W =: 4 5 6 (0 1; 0 2; 1 3) } W
It looks as expected:
view W
0 4 5 0 0
0 0 0 6 0
0 0 0 0 0
but we see that it is stored as two dense rows only:
W
0 | 0 4 5 0 0
1 | 0 0 0 6 0
Compare with an array sparse on second axis axis only,
because its list of sparse axes is 1
saz=: ,1 NB. sparse axes for Z
Z =: 1 $. 3 5; saz; INTEGERZERO
Z =: 4 5 6 (0 1; 0 2; 1 3) } Z
Z looks just like W
view Z
0 4 5 0 0
0 0 0 6 0
0 0 0 0 0
but we see it is stored as three dense colums.
Z
1 | 4 0 0
2 | 5 0 0
3 | 0 6 0
30.7 Deconstructing a Sparse Array
As we noted above, if we display U itself, we see, not the familiar representation of a matrix,
but instead
a list of index-value pairs, one pair for each non-zero
element.
U
0 0 | 4
1 1 | 5
2 2 | 6
3 3 | 7
We can extract the index from each pair to get what is
called the index-matrix of U. This is an ordinary dense array
4 $. U
0 0
1 1
2 2
3 3
To extract the value from each pair
5 $. U
4 5 6 7
As we noted above, 0 & $. will produce a dense array
from a sparse:
0 $. U
4 0 0 0 0 0
0 5 0 0 0 0
0 0 6 0 0 0
0 0 0 7 0 0
0 0 0 0 0 0
0 0 0 0 0 0
30.8 Sparse Array From Relation
Next we look at representing data as a sparse array
as an alternative to representing data as a relation (that is,
a table).
The point is that the sparse array may be more convenient
than the relation for some computations with the data.
Thus we are interested in converting between sparse arrays and relations.
For example, suppose that a given relation R represents
sales of various commodities in various cities
'Pa Qu Ro Sy' =: s: ' Paris Quebec Rome Sydney'
'Ap Ba Ch Da' =: s: ' Apples Bananas Cherries Damsons'
R =: (". ;. _2) 0 : 0
Ap ; Pa; 99
Ap ; Qu ; 50
Ba ; Qu ; 10
Ch ; Ro ; 19
Da ; Sy ; 110
Da ; Pa ; 88
)
R
+---------+-------+---+
|`Apples |`Paris |99 |
+---------+-------+---+
|`Apples |`Quebec|50 |
+---------+-------+---+
|`Bananas |`Quebec|10 |
+---------+-------+---+
|`Cherries|`Rome |19 |
+---------+-------+---+
|`Damsons |`Sydney|110|
+---------+-------+---+
|`Damsons |`Paris |88 |
+---------+-------+---+
We can convert the relation R to a sparse array as follows.
Firstly, we need to establish the domain -the set of all possible values -
of the first column. It can be computed from R :
] Fru =: > ~. 0 { |: R
`Apples `Bananas `Cherries `Damsons
Similarly for the domain of the second column:
] Cit =: > ~. 1 { |: R
`Paris `Quebec `Rome `Sydney
Now the first column converted to indices into its domain:
] r =: Fru i. > 0 { |: R
0 0 1 2 3 3
Similarly for the second column:
] c =: Cit i. > 1 { |: R
0 1 1 2 3 0
and the values from the third
] v =: > 2 { |: R
99 50 10 19 110 88
Now we build an empty sparse array of dimensions #Fru by #Cit .
By default the sparse axes will be 0 and 1 and the "zero" element will be REALZERO .
The function 1&$. produces the empty array.
A =: (1 & $.) (#Fru) , (#Cit)
Insert the values by amending in the ordinary way:
A =: v (<"1 r,.c) } A
and check we have what we expect:
view A
99 50 0 0
0 10 0 0
0 0 19 0
88 0 0 110
To display A with labelling of rows and columns,
the list of row-labels is Fru computed above,
and the list of column-labels is Cit :
(a:, <"0 Cit), (<"0 Fru) ,. (<"0 view A)
+---------+------+-------+-----+-------+
| |`Paris|`Quebec|`Rome|`Sydney|
+---------+------+-------+-----+-------+
|`Apples |99 |50 |0 |0 |
+---------+------+-------+-----+-------+
|`Bananas |0 |10 |0 |0 |
+---------+------+-------+-----+-------+
|`Cherries|0 |0 |19 |0 |
+---------+------+-------+-----+-------+
|`Damsons |88 |0 |0 |110 |
+---------+------+-------+-----+-------+
Now we have finished producing the sparse array
from the original relation, so we can can compute with our
data as an array.
For example, total value of sales for each city is given by:
+/ A
0 | 187
1 | 60
2 | 19
3 | 110
This is sparse, so taking the usual view :
view +/ A
187 60 19 110
30.9 Relation from Sparse Array
To complete the circle, we look next at how
to produce a relation from a sparse array, A for example.
A
0 0 | 99
0 1 | 50
1 1 | 10
2 2 | 19
3 0 | 88
3 3 | 110
The first step is to get the index-matrix for the non-zero elements.
] INDS =: 4 $. A
0 0
0 1
1 1
2 2
3 0
3 3
and next the values.
] VALS =: 5 $. A
99 50 10 19 88 110
The first column of the relation we produce by indexing
the domain Fru which we computed above.
The second column is produced similarly from Cit.
] c0 =: (0 { |: INDS) { Fru
`Apples `Apples `Bananas `Cherries `Damsons `Damsons
] c1 =: (1 { |: INDS) { Cit
`Paris `Quebec `Quebec `Rome `Paris `Sydney
So finally we see that the relation recovered from the sparse array is
(<"0 c0) ,. (<"0 c1) ,. (<"0 VALS)
+---------+-------+---+
|`Apples |`Paris |99 |
+---------+-------+---+
|`Apples |`Quebec|50 |
+---------+-------+---+
|`Bananas |`Quebec|10 |
+---------+-------+---+
|`Cherries|`Rome |19 |
+---------+-------+---+
|`Damsons |`Paris |88 |
+---------+-------+---+
|`Damsons |`Sydney|110|
+---------+-------+---+
This is the end of Chapter 30
|