Sparse Arrays in J

Sparse Arrays in J

Introduction
Representation
Assertions
The Verb $.
Further Examples
Sparse Linear Algebra
Implementation Status

Introduction

We describe a sparse array extension to J using a representation that “does not store zeros”. One new verb $. is defined to create and manipulate sparse arrays, and existing primitives are extended to operate on such arrays. These ideas are illustrated in following examples:

   ] d=: (?. 3 4$2) * ?. 3 4$100
 0 75  0 53
 0  0 67 67
93  0 51 83

convert d to sparse and assign to s

the display of s gives the indices of the
“non-zero” cells and the corresponding values

   d -: s
1

d and s match

   o. s
0 1 | 235.619
0 3 | 166.504
1 2 | 210.487
1 3 | 210.487
2 0 | 292.168
2 2 | 160.221
2 3 | 260.752

π times s

   o. d
      0 235.619       0 166.504
      0       0 210.487 210.487
292.168       0 160.221 260.752

π times d

   (o. s) -: o. d
1

function results independent of representation

   0.5 + o. s
0 1 | 236.119
0 3 | 167.004
1 2 | 210.987
1 3 | 210.987
2 0 | 292.668
2 2 | 160.721
2 3 | 261.252

   <. 0.5 + o. s
0 1 | 236
0 3 | 167
1 2 | 210
1 3 | 210
2 0 | 292
2 2 | 160
2 3 | 261

   (<. 0.5 + o. s) -: <. 0.5 + o. d
1

function arguments can be dense or sparse

   (d + s) -: 2*s
1

   (d + s) -: 2*d
1

   +/ s
0 |  93
1 |  75
2 | 118
3 | 203

   +/"1 s
0 | 128
1 | 134
2 | 227

familiar algebraic properties are preserved

reverse

transpose

   $.^:_1 |: s
 0  0 93
75  0  0
 0 67 51
53 67 83

   (|:s) -: |:d
1

$.^:_1 converts a sparse array to dense

ravel; a sparse vector

To the top.

Representation

A sparse array y may be boolean, integer, floating point, complex, literal, or boxed, and has the (internal) parts sh;a;e;i;x;flag where:

`sh`	Shape, `$y` . Elements of the shape must be less than `2^31` , but the product over the shape may be larger than `2^31` .
`sh`	Axe(s), a vector of the sorted sparse (indexed) axes.
`e`	Sparse element (“zero”). `e` is also used as the fill in any overtake of the array.
`i`	Indices, an integer matrix of indices for the sparse axes.
`x`	Values, a (dense) array of usually non-zero cells for the non-sparse axes corresponding to the index matrix `i` .
`flag`	Various bit flags.

For the sparse matrix s used in the introduction,

   ] d=: (?. 3 4$2) * ?. 3 4$100
 0 75  0 53
 0  0 67 67
93  0 51 83

   ] s=: $. d
0 1 | 75
0 3 | 53
1 2 | 67
1 3 | 67
2 0 | 93
2 2 | 51
2 3 | 83

The shape is 3 4 ; the sparse axes are 0 1 ; the sparse element is 0 ; the indices are the first two columns of numbers in the display of s ; and the values are the last column.

Scalars continue to be represented as before (densely). All primitives accept sparse or dense arrays as arguments (e.g. sparse+dense or sparse$sparse). The display of a sparse array is a display of the index matrix (the i part), a blank column, a column of vertical lines, another blank column, and the corresponding value cells (the x part).

Letting the sparse element be variable rather than fixed at zero makes many more functions closed on sparse arrays (e.g. ^y or 10+y or -.y), and familiar results can be produced by familiar phrases (e.g. <.0.5+y for rounding to the nearest integer).

To the top.

Assertions

The following assertions hold for a sparse array, and displaying a sparse array invokes these consistency checks on it.

`imax =: _1+2^31`	the largest internal integer
`rank =: #@$`	rank
`type =: 3!:0`	internal type

`1 = rank sh`	vector
`sh -: <. sh`	integral
`imax >: #sh`	at most `imax` elements
`(0<:sh) *. (sh<:imax)`	bounded by `0` and `imax`

`1 = rank a`	vector
`a e. i.#sh`	bounded by `0` and rank-1
`a -: ~. a`	elements are unique
`a -: /:~ a`	sorted

`0 = rank e`	atomic
`(type e) = type x`	has the same internal type as `x`

`2 = rank i`	matrix
`4 = type i`	integral
`(#i) = #x`	as many rows as the number of items in `x`
`({:$i) = #a`	as many columns as there are sparse axes
`(#i) <: */a{sh`	# rows bounded by product over sparse axes lengths
`imax >: */$i`	# elements is bounded by `imax`
`(0<:i) *. (i <"1 a{sh)`	bounded by `0` and the lengths of the sparse axes
`i -: ~.i`	rows are unique
`i -: /:~ i`	rows are sorted

`(rank x) = 1+(#sh)-#a`	rank equals 1 plus the number of dense axes
`imax >: */$x`	# elements is bounded by `imax`
`(}.$x)-:((i.#sh)-.a){sh`	item shape is the dimensions of the dense axes
`(type x) e. 1 2 4 8 16 32`	internal type is boolean, character, integer, real, complex, or boxed

To the top.

The Verb $.

The ranks of $. are infinite. The inverse of n&$. is (-n)&$. .

$.y converts a dense array to sparse, and conversely $.^:_1 y converts a sparse array to dense. The identities f -: f&.$. andf -: f&.($.^:_1)hold for any function f, with the possible exception of those (like overtake {.) which use the sparse element as the fill.

0$.y applies $. or $.^:_1 as appropriate; that is, converts a dense array to sparse and a sparse array to dense.

1$.sh;a;e produces a sparse array. sh specifies the shape. a specifies the sparse axes; negative indexing may be used. e specifies the “zero” element, and its type determines the type of the array. The argument may also be sh;a (e is assumed to be a floating point 0) or just sh (a is assumed to be i.#sh — all axes are sparse — and e a floating point 0).

2$.y gives the sparse axes (the a part);
(2;a)$.y (re-)specifies the sparse axes; (2 1;a)$.ygives the number of bytes required for (2;a)$.y ; (2 2;a)$.ygives the number of items in the i part or the x part for the specified sparse axes a (that is, #4$.(2;a)$.y).

3$.y gives the sparse element (the e part); (3;e)$.y respecifies the sparse element.

4$.y gives the index matrix (the i part).

5$.y gives the value array (the x part).

6$.y gives the flag (the flag part).

7$.y gives the number of non-sparse entries in array y ; that is, #4$.y or #5$.y .

8$.y removes any completely “zero” value cells and the corresponding rows in the index matrix.

To the top.

Further Examples

   ] d=: (0=?. 2 3 4$3) * ?. 2 3 4$100
13 0 0 0
21 4 0 0
 0 0 0 0

 3 5 0 0
 0 0 6 0
 0 0 0 0

convert d to sparse and assign to s

   d -: s
1

match is independent of representation

   2 $. s
0 1 2

sparse axes

   3 $. s
0

sparse element

index matrix; columns correspond to the sparse axes

   5 $. s
13 21 4 3 5 6

corresponding values

   ] u=: (2;2)$.s
0 | 13 21 0
  |  3  0 0
  |
1 |  0  4 0
  |  5  0 0
  |
2 |  0  0 0
  |  0  6 0

make 2 be the sparse axis

index matrix

corresponding values

   ] t=: (2;0 1)$.s
0 0 | 13 0 0 0
0 1 | 21 4 0 0
1 0 |  3 5 0 0
1 1 |  0 0 6 0

make 0 1 be the sparse axes

   7 {. t
0 0 | 13 0 0 0
0 1 | 21 4 0 0
1 0 |  3 5 0 0
1 1 |  0 0 6 0

   $ 7 {. t
7 3 4

take

   7 {."1 t
0 0 | 13 0 0 0 0 0 0
0 1 | 21 4 0 0 0 0 0
1 0 |  3 5 0 0 0 0 0
1 1 |  0 0 6 0 0 0 0

   0 = t
0 0 | 0 1 1 1
0 1 | 0 0 1 1
1 0 | 0 0 1 1
1 1 | 1 1 0 1

take with rank

the sparse element of 0=t is 1

   +/ , 0 = d
18

answers are independent of representation

   0{t
0 | 13 0 0 0
1 | 21 4 0 0

from

   _2 (<1 2 3)}t
0 0 | 13 0 0  0
0 1 | 21 4 0  0
1 0 |  3 5 0  0
1 1 |  0 0 6  0
1 2 |  0 0 0 _2

amend

   ] p=: (i.!n) A. i.n=: 3
0 1 2
0 2 1
1 0 2
1 2 0
2 0 1
2 1 0

all permutations of order 3

   C.!.2 p
0 1 1 0 0 1

the parity of each permutation

   $.^:_1 (_1^C.!.2 p) (<"1 p)} 1 $. n$n
 0  0  0
 0  0  1
 0 _1  0

 0  0 _1
 0  0  0
 1  0  0

 0  1  0
_1  0  0
 0  0  0

The last expression computes the complete skewed tensor of order 3.

   s=: 1 $. 20 50 1000 75 366

   $s
20 50 1000 75 366

20 countries, 50 regions, 1000 salespersons,
75 products, 366 days in a year

   */ $s
2.745e10

the product over the shape can be greater than 2^31

   r=: ?. 1e5 $ 1e6
   i=: ?. 1e5 5 $ $ s
   s=: r (<"1 i)} s

revenues
corresponding locations
assign revenues to corresponding locations

   7 {. ": s
0 0  5 30 267 | 128133
0 0 26 20 162 | 319804
0 0 31 37 211 | 349445
0 0 37 10 351 | 765935
0 0 56  6  67 | 457449
0 0 66 54 120 |  38186
0 0 71 74 246 | 515473

the first 7 rows in the display of s ;
the first row says that for country 0, region 0,
salesperson 5, product 30, day 267,
the revenue was 128133

   +/ , s
|limit error
| +/    ,s

total revenue
the expression failed on ,s because it would have required a vector of length 2.745e10

   +/@, s                     
5.00289e10

total revenue
f/@, is supported by special code

   +/+/+/+/+/ s
5.00289e10

   +/^:5 s
5.00289e10

   +/^:_ s
5.00289e10

   +/ r
5.00289e10

total revenue

  +/"1^:4 s
 0 | 2.48411e9
 1 | 2.55592e9
 2 | 2.55103e9
 3 | 2.52089e9
 4 | 2.49225e9
 5 | 2.45682e9
 6 | 2.52786e9
 7 | 2.45425e9
 8 | 2.48729e9
 9 | 2.50094e9
10 | 2.51109e9
11 | 2.59601e9
12 | 2.49003e9
13 | 2.58199e9
14 | 2.44772e9
15 | 2.47863e9
16 | 2.46455e9
17 | 2.5568e9
18 | 2.43492e9
19 | 2.43582e9

total revenue by country

   t=: +/^:2 +/"1^:2 s

   $t
1000

   7{.t
0 | 4.58962e7
1 | 4.81548e7
2 | 3.97248e7
3 | 4.89981e7
4 | 4.85948e7
5 | 4.69227e7
6 | 4.22094e7

total revenue by salesperson

To the top.

Sparse Linear Algebra

Currently, only sparse matrix multiplication and the solutions of tri-diagonal linear system are implemented. For example:

   f=: }. @ }: @ (,/) @ (,."_1 +/&_1 0 1) @ i.

indices for a 5 by 5 tri-diagonal matrix

   s=: (?. 13$100) (<"1 f 5)} 1 $. 5 5;0 1

   $s
5 5

The phrase 1$.5 5;0 1 makes a sparse array with shape 5 5 and sparse axes 0 1 (sparse in both dimensions); <"1 f 5 makes boxed indices; and x (<"1 f 5)}y amends by x the locations in y indicated by the indices (scattered amendment).

   ] d=: $.^:_1 s
13 75  0  0  0
45 53 21  0  0
 0  4 67 67  0
 0  0 93 38 51
 0  0  0 83  3

the dense representation of s

   ] y=: ?. 5$80
10 60 36 42 17

   y %. s
1.27885 _0.0883347 0.339681 0.202906 0.0529263

y %. d	answers are independent of representation
`1.27885 _0.0883347 0.339681 0.202906 0.0529263`

   s=: (?. (_2+3*1e5)$1000) (<"1 f 1e5)} 1 $. 1e5 1e5;0 1

   $s
100000 100000

   y=: ?. 1e5$1000

s is a 1e5 by 1e5 matrix

   ts=: 6!:2 , 7!:2@]

time and space for execution

   ts 'y %. s'
0.28 5.2439e6

0.28 seconds; 5.2 megabytes (Pentium 266 Mhz)

To the top.

Implementation Status

As of 1999-09-03 12:00 , the following facilities support sparse arrays:

`= d`	`=.`	`=:`
`<`	`<.`	`<:`
`>`	`>.`	`>:`
`_`	`_.`	`_:`

`+`	`+.`	`+:`
`*`	`*.`	`*:`
`-`	`-. m`	`-:`
`%`	`%. d`	`%:`

`^`	`^.`
`$ m`	`$.`	`$:`
`~`		`~: d`
`\|`	`\|.`	`\|:`

	`..`	`.:`
`:`	`:.`	`::`
`,`	`,.`	`,:`

`#`
`!`	`!.`	`!:`
`/ m`
`\ m`	`\. m`

`[`	`[.`	`[:`
`]`	`].`
`{ d`	`{.`	`{:`
`} d`	`}.`	`}:`

`"`	`".`	`": m`
`		`:
`@`	`@.`	`@:`
`&`	`&.`	`&:`

`j. m`
`o.`
`r. m`
`_9: to 9:`

`3!:0`
`3!:1`
`3!:2`
`3!:3`
`4!:55`

Notes:

•	Sparse literal and boxed arrays not yet implemented.
•	The dyad `%.` only implements the case of triadiagonal matrices.
•	Boxed left arguments for `\|:` (diagonal slices) not yet implemented.
•	The monads `f/` and `f/"r` are only implemented for `+ * >. <. +. *. = ~: ,` (and only boolean arguments for `=` and `~:`); on an axis of length 2, the monads `f/` and `f/"r` are implemented for any function.
•	`{` and `}` only accept the following index arguments: integer arrays, `<"1` on integer arrays, and scalar boxed indices (respectively, item indexing, scattered indexing, and index lists `a0;a1;a2;…`).

To the top.

created:	1998-11-13 07:10
updated:	2019-10-22 23:00