Differences between revisions 22 and 23
 ⇤ ← Revision 22 as of 2008-11-15 16:50:29 → Size: 8415 Editor: RogerHui Comment: improved wording ← Revision 23 as of 2008-12-08 10:45:33 → ⇥ Size: 8417 Editor: anonymous Comment: converted to 1.6 markup Deletions are marked like this. Additions are marked like this. Line 5: Line 5: [http://groups.google.com/group/comp.lang.apl/browse_thread/thread/1836e834b63cfb29/f2a9ba479d6419b0#f2a9ba479d6419b0 a post to comp.lang.apl] by Ajay Askoolum [[http://groups.google.com/group/comp.lang.apl/browse_thread/thread/1836e834b63cfb29/f2a9ba479d6419b0#f2a9ba479d6419b0|a post to comp.lang.apl]] by Ajay Askoolum Line 10: Line 10: [[TableOfContents(2)]] <> Line 15: Line 15: From the [wiki:JDic:dmcapdot M. page] of the dictionary: From the [[JDic:dmcapdot|M. page]] of the dictionary: Line 366: Line 366: [[BR]] <
> Line 369: Line 369: * [:../Combinations:Combinations] * [:../Combination Index:Combination Index] * [:../Next Binary String:Next Binary String][[BR]] * [[../Combinations|Combinations]] * [[../Combination Index|Combination Index]] * [[../Next Binary String|Next Binary String]]<
>

The following problem was described in a post to comp.lang.apl by Ajay Askoolum on 2008-04-23:

Take the UK national lottery -- it has 6!49 combinations where the first (1,2,3,4,5,6) and the last (44,45,46,47,48,49) have each got a sum and the number of combinations that have the same respective sum is exactly 1. The objective was to list each sum between the two with the number of combinations ... quickly.

## Solution

From the M. page of the dictionary:

```comb=: 4 : 0 M.   NB. All size x combinations of i.y
if. (x>:y)+.0=x do. i.(x<:y),x else. (0,.x comb&.<: y),1+x comb y-1 end.
)```

A direct method:

```cs=: 4 : '({.,#)/.~ +/"1 x comb y'

3 comb 5
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
3 cs 5
3 1
4 1
5 2
6 2
7 2
8 1
9 1```

6 cs 49 would be a non-trivial computation. The intermediate array 6 comb 49 requires 335611584 = */4 6,6!49 bytes. There is a much more efficient method that derives from the doubly-recursive definition of comb : To compute x cs1 y , compute (x-1) cs1 y-1  and x cs1 y-1 , then combine them appropriately. Thus:

```cs1=: 4 : 0 M.
if. (x>:y)+.0=x do. (x<:y)#,:2!x,2
else. (~.@[ ,. +//.)/ |: (((x-1),0)+"1 x cs1&<: y),(x,0)+"1 x cs1 y-1 end.
)

3 (cs -: cs1) 5
1
6 (cs -: cs1) 10
1
6 (cs -: cs1) 11
1```

To correct for the 1-origin indexing in the original problem, just add x to column 0 of the result.

```   6!:2 't=. 6 0 (+"1) 6 cs1 49'
0.0252536
\$t
259 2
3{.t
21 1
22 1
23 2
_3{.t
277 2
278 1
279 1```

Due to the use of M. , reload the script or otherwise redefine cs1 to get the true timing (without this the reported time would be lower).

## Expanded Commentary

The problem is to find the distinct sums of all combinations and their counts. For example:

```   3 comb 5          NB. all size 3 combinations of i.5
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4

+/"1 ] 3 comb 5   NB. sum of each combination
3 4 5 5 6 7 6 7 8 9

3 cs 5            NB. distinct sums and counts
3 1
4 1
5 2
6 2
7 2
8 1
9 1```

cs can be defined succinctly:

`cs=: 4 : '({.,#)/.~ +/"1 x comb y'`

The key is the use of /. , the key adverb. In x u/. y , items of x specify keys for corresponding items of y and u is applied to each collection of y having identical keys. Here, x and y are both the combination sums, and u is the verb ({.,#) , first catenated with count. Thus:

```   s=: +/"1 ] 3 comb 5
s
3 4 5 5 6 7 6 7 8 9
s ({.,#)/. s
3 1
4 1
5 2
6 2
7 2
8 1
9 1```

Use of < (box) elucidates the action of the adverb:

```   s </. s
┌─┬─┬───┬───┬───┬─┬─┐
│3│4│5 5│6 6│7 7│8│9│
└─┴─┴───┴───┴───┴─┴─┘
</.~ s
┌─┬─┬───┬───┬───┬─┬─┐
│3│4│5 5│6 6│7 7│8│9│
└─┴─┴───┴───┴───┴─┴─┘```

cs , while direct, is not very efficient. The intermediate array x comb y has shape (x!y),x , requiring */4,x,x!y bytes. In the original "national lottery" problem, x=:6 and y=:49 , requiring */4,6,6!49 or 335611584 bytes. We derive an alternative computation cs1 which is sufficiently efficient to calculate 50 cs1 100 in a couple of seconds.

The method derives from the classic doubly-recursive definition of comb , "classic" because the algorithm can be found in Gilman & Rose, APL\360: An Interactive Approach, 1969.

```comb=: 4 : 0 M.   NB. All size x combinations of i.y
if. (x>:y)+.0=x do. i.(x<:y),x else. (0,.x comb&.<: y),1+x comb y-1 end.
)```

x comb y is 1+(x-1) comb y-1 prefaced with an initial column of 0 , catenated to 1+x comb y-1 . M. is the memo adverb; u M. is the same as u but but may keep a record of the arguments and results for reuse. The use of M. makes this double-recursive definition competitive with more complicated statements of comb .

The result of cs1 is a 2-column matrix of the sums and counts. Analogs are needed for the two recursive steps 0,.1+(x-1) comb y-1  and  1+x comb y-1 . For the first step, prefacing with a column of 0 changes neither the sums nor the counts, and adding by 1 increases each sum by x-1 (because there are x-1 entries in each combination) and changes the counts not at all. Thus: ((x-1),0)+"1 (x-1) cs1 y-1 . For the second step, adding by 1 increases each sum by x and does not change the counts. Thus: (x,0)+"1 x cs1 y-1 . It remains to merge these parts together to form a single (sums,counts) result.

If p=.p0,p1 is the catenation of the two parts, whose column 0 s is the catenated sums and whose column 1 c is the catenated counts, then the overall list of sums is just the nub of the sums, i.e. ~.s , and the overall list of counts is s+//.c , the counts accumulated according to the sums. This is accomplished by (~.@[ ,. +//.)/ |:p , inserting the verb (~.@[ ,. +//.) between the rows of the transpose of p . Putting it all together:

```cs1=: 4 : 0 M.
if. (x>:y)+.0=x do. (x<:y)#,:2!x,2
else. (~.@[ ,. +//.)/ |: (((x-1),0)+"1 x cs1&<: y),(x,0)+"1 x cs1 y-1 end.
)```

The original "national lottery" problem uses 1-origin (the first combination is 1 2 3 4 5 6 instead of 0 1 2 3 4 5 ); just add 6 to each sum to account for that. Thus:

```   6!:2 't=: 6 0 (+"1) 6 cs1 49'
0.0250315```

That is, the result is computed in 0.025 seconds (2.2 GHz Athlon 3200+). The shape of t and its first and last 4 rows:

```   \$ t
259 2

4 {. t
21 1
22 1
23 2
24 3

_4 {. t
276 3
277 2
278 1
279 1```

The sum of the counts should be 6!49 :

```   +/ t
38850 13983816

6!49
13983816```

As promised:

```   6!:2 't=: 50 cs1 100'
2.05181

\$ t
2501 2

+/ t
6.18998e6 1.00891e29

50!100
1.00891e29```

## Further Improvements

The computation can be simplified and made more efficient by observing that column 0 of x cs1 y is always (2!x)+i.1+x*y-x , and that the counts for 1=x are y\$1 . Thus:

```cs2=: 4 : '((2!x)+i.0>.1+x*y-x) ,. x cs2a y'

cs2a=: 4 : 0 M.
if. (x>:y)+.1>:x do. ((y*1=x)>.x<:y)\$1
else. (m{.x cs2a&<:y) + (-m){.x cs2a y-1 [ m=. 1+x*y-x end.
)

(cs1 -: cs2)"0/~ i.10
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

6!:2 '50 cs2 100'
0.743049```

cscheck is like cs2 but incorporates checks.

```cscheck=: 4 : 0
z=. x cs2 y
assert. (-:<.) z
assert. (\$z) -: 2,~m=. 1+x*y-x
assert. ({."1 z) -: (2!x) (+i.) m
assert. (+/{:"1 z) = x!y
assert. (-:|.) {:"1 z
assert. ({.z) -: (2!x),1
assert. ({:z) -: ((x*y-x)+2!x),1
)

\$ 50 cscheck 100
2501 2```

## Collected Definitions

```comb=: 4 : 0 M.   NB. All size x combinations of i.y
if. (x>:y)+.0=x do. i.(x<:y),x else. (0,.x comb&.<: y),1+x comb y-1 end.
)

cs=: 4 : '({.,#)/.~ +/"1 x comb y'

cs1=: 4 : 0 M.
if. (x>:y)+.0=x do. (x<:y)#,:2!x,2
else. (~.@[ ,. +//.)/ |: (((x-1),0)+"1 x cs1&<: y),(x,0)+"1 x cs1 y-1 end.
)

cs2=: 4 : '((2!x)+i.0>.1+x*y-x) ,. x cs2a y'

cs2a=: 4 : 0 M.
if. (x>:y)+.1>:x do. ((y*1=x)>.x<:y)\$1
else. (m{.x cs2a&<:y) + (-m){.x cs2a y-1 [ m=. 1+x*y-x end.
)

cscheck=: 4 : 0
z=. x cs1 y
assert. (-:<.) z
assert. (\$z) -: 2,~m=. 1+x*y-x
assert. ({."1 z) -: (2!x) (+i.) m
assert. (+/{:"1 z) = x!y
assert. (-:|.) {:"1 z
assert. ({.z) -: (2!x),1
assert. ({:z) -: ((x*y-x)+2!x),1
)```