Overview of J Wiki introductory material, notation FAQ, linear Diophantine equations

## NYC JUG Meeting of 20070313

In this meeting, we discussed how the introductory material on the J Wiki might be better organized and what could be added to provide more motivation for learning J, the idea of a FAQ to address recurring questions about "notation", solving linear Diophantine equations, and improving the performance of a long-running adverb.

## Meeting Agenda for NYC JUG 20070313

```
             Meeting Agenda for NYC JUG 20070313
-----------------------------------

Working with the J Wiki
-----------------------
1. Beginner's regatta: overview of intro material on J Wiki -
how might it be better organized?  Where is the motivational material?
See section 4 below.

2. Show-and-tell: in progress on J Wiki: notationFAQ,
linearDiophantineEquations.

from "getVarInfo" to "gavInfo": from 150 hours to 12 hours.

Possible enhancements: how to divide more evenly in 2 dimensions?
Alternately, how to represent subdivisions within existing divisions?

4. Learning and teaching J: what motivational examples can we put
on the J Wiki?  How can we make FAQs more transparent, more geared
toward actual beginner questions?

+.--------------------+.

Like waves of the sea,
events have two sides:
either you ride them out
or they ride you down.
- Arabic proverb
```

## Proceedings

We debated the layout of the J Wiki in the "Beginner's regatta", discussed what should be in a J language FAQ specifically in relation to notational "oddities", looked at some work on solving linear Diophantine equations, briefly discussed performance improvement of an adverb used in working on the Netflix Challenge, and discussed what examples we could provide to motivate people to learn J.

### Beginner's regatta

The consensus about the J Wiki was that it was a bit crowded and not sufficiently visually interesting. The initial page does not make one think "Really? Tell me more!"

We debated the idea of a splash page, common on many sites, but decided this was more annoying than compelling. One example of compelling J that some of us like is the cover of Howard Peelle's J book: it's festooned with short J definitions having familiar names like "pascal" and "pythagoras".

### Show and Tell

See [NYCJUG/notationFAQ] for the answer to a commonly asked question about one of J's prominent notational conventions: why is the default order of operation right-to-left?
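As a quick illustration of the convention in question (a minimal session sketch, not taken from the FAQ page itself): J has no operator precedence, so each verb takes as its right argument everything to its right. The same rule makes insert (`/`) with `-` produce an alternating sum.

```j
   2 * 3 + 4       NB. read as 2 * (3 + 4)
14
   (2 * 3) + 4     NB. parentheses give the conventional grouping
10
   10 - 4 - 1      NB. read as 10 - (4 - 1)
7
   -/ 1 2 3 4      NB. 1 - (2 - (3 - 4)), i.e. the alternating sum 1-2+3-4
_2
```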

See [NYCJUG/linearDiophantineEquations] for the discussion on these equations.
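The linked page has the details. As a self-contained sketch of one standard approach (the extended Euclidean algorithm, which may or may not be the method used on that page), the hypothetical verbs "egcd" and "solveLDE" below find one integer solution of (a*x) + b*y = c, assuming nonnegative integers a and b that are not both zero.

```j
NB. egcd: hypothetical helper (extended Euclidean algorithm).
NB. x egcd y returns g,s,t where g is the gcd of x and y, and g = (x*s) + y*t.
NB. Assumes nonnegative integer arguments, not both zero.
egcd=: 4 : 0
  'r0 r1'=. x , y               NB. successive remainders
  's0 s1'=. 1 0                 NB. running coefficients of x
  't0 t1'=. 0 1                 NB. running coefficients of y
  while. r1 ~: 0 do.
    q=. <. r0 % r1              NB. integer quotient
    'r0 r1'=. r1 , r0 - q * r1
    's0 s1'=. s1 , s0 - q * s1
    't0 t1'=. t1 , t0 - q * t1
  end.
  r0 , s0 , t0
)

NB. solveLDE: hypothetical; y is a,b,c for the equation (a*x) + b*y = c.
NB. Returns one integer solution x0,y0, or an empty list if none exists.
solveLDE=: 3 : 0
  'a b c'=. y
  'g s t'=. a egcd b
  if. 0 ~: g | c do. i. 0 return. end.   NB. solvable only if g divides c
  (c % g) * s , t
)
```

For example, `solveLDE 240 46 4` gives `_18 94`, since (240*_18) + 46*94 is 4. Every other solution is x0 + k*b%g, y0 - k*a%g for integer k.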

I developed the adverb "getVarInfo" to apply an arbitrary function (verb) to each of a set of variables stored on file: the 99 variable names are in the vector "UVN" and the directory is specified by "VDIR". This lets us handle a very large array by breaking it into a hundred or so pieces. Instead of applying a function to the array directly, we apply it indirectly through this adverb, which applies its left operand "u" to each of the pieces on file in turn.

So, an expression like dts=: (3&{) getVarInfo&.>(<VDIR);&.>UVN applies 3&{ to get the date row from each matrix on file.
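To see how this calling pattern works without any files, here is a minimal in-memory sketch. The adverb "applyNamed" and the nouns "piece1" and "piece2" are hypothetical stand-ins: "applyNamed" keeps getVarInfo's argument convention (a directory paired with a variable name) but skips the file I/O, and `(<'unused');&.>` pairs the directory with each name just as `(<VDIR);&.>UVN` does.

```j
NB. Hypothetical in-memory stand-ins for two filed pieces (4 rows x 3 columns)
piece1=: 4 3 $ i. 12
piece2=: 4 3 $ 100 + i. 12

NB. applyNamed: mimics getVarInfo's calling convention but skips the file I/O
applyNamed=: 1 : 0
  'dd varnm'=. y        NB. directory (unused in this sketch), variable name
  1 ; u ". varnm        NB. success flag 1, then u applied to the named noun
)

NB. Pair the directory with each name, then apply 3&{ (row 3) to each piece under open
res=: (3&{) applyNamed&.> (<'unused') ;&.> 'piece1';'piece2'
NB. Stack the row-3 results: 9 10 11 above 109 110 111
rows=: > {:&> res
```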

```
NB.* getVarInfo: apply arbitrary function to each (filed) var named.
getVarInfo=: 1 : 0
  'dd varnm'=. y                   NB. Vars dir, var name.
  rc=. dd unfileVar_WS_ varnm      NB. Get var from file
  if. >{.rc do.
    rc=. 1;u ".varnm               NB. Do something to it
    4!:55 <varnm                   NB. Erase when done to conserve space
  end.
  rc
NB.EG ({."1,.{:"1) getVarInfo &.>(<'C:\data\');&.>'var1';'var2';'var3'
NB.EG dts=: (3&{) getVarInfo&.>(<VDIR);&.>MVN
)
```

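The function "unfileVar_WS_" above comes from the script "WS.ijs" found at [Scripts/File J Variables]. That script lets us write a J variable to a file and read it back: "unfileVar_WS_" reads one, and its counterpart "fileVar_WS_" (used in "accumCMRatings" below) writes one.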
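The WS.ijs script itself is not reproduced here. As a rough sketch of what filing a variable can involve, the hypothetical verbs "fileVarSketch" and "unfileVarSketch" below (not the WS.ijs implementation, and not necessarily its file format) use J's binary-representation foreigns `3!:1` and `3!:2` together with the file foreigns `1!:2` (write) and `1!:1` (read) to save a named noun and restore it.

```j
NB. fileVarSketch: hypothetical, NOT the WS.ijs verb.
NB. x is a directory (ending in a path separator, or '' for the current one),
NB. y is the name of a global noun. Writes the noun's binary form to file x,y.
fileVarSketch=: 4 : 0
  (3!:1 ". y) 1!:2 < x , y     NB. 3!:1 gives the binary representation
  1 ; y                        NB. boxed success flag first, the form getVarInfo tests
)

NB. unfileVarSketch: hypothetical inverse; reads file x,y and redefines the name globally.
unfileVarSketch=: 4 : 0
  (y)=: 3!:2 (1!:1 < x , y)    NB. 3!:2 converts the binary form back to a noun
  1 ; y
)
```

A round trip is: define a noun, file it, erase it with `4!:55`, then unfile it and compare.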

The newer version, "gavInfo", is strictly a conjunction rather than an adverb, since it takes two verbs instead of one. The second verb lets us apply the function of interest (see "accumCMRatings" below) to a group of arrays at a time: the pieces are first joined with an appropriate concatenation verb, ",." in this case, and the function of interest is then applied once to the joined result. For some functions, it is much faster to work on larger pieces.

```
NB.* gavInfo: apply arbitrary fnc "u" to all file-vars joined by fnc "v".
gavInfo=: 2 : 0
  for_dv. y do.
    'dd varnm'=. >dv                NB. Vars dir, var names.
    rc=. dd unfileVar_WS_ varnm
    if. >{.rc do.
      if. -.nameExists 'cumvals' do. cumvals=. ".varnm
      else. cumvals=. cumvals v ".varnm end.
      4!:55 <varnm
    end.
  end.
  1;u cumvals
NB.EG (_2 ({."1,.{:"1) gavInfo ,.)\(<'C:\data\');&.>'var1';'var2';'var3';'var4'
)
```
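To make the join-then-apply idea concrete without any files, here is a hypothetical in-memory analogue, "joinThenApply" (not part of the original code). It joins the named nouns with the right operand v and applies the left operand u once to the joined result. Unlike gavInfo, it takes plain names and simply starts from the first one instead of testing with "nameExists". The nouns p1, p2 and p3 are made-up data.

```j
NB. Hypothetical in-memory stand-ins for filed pieces (3 rows x 2 columns each)
p1=: 3 2 $ 1 2 3 4 5 6
p2=: 3 2 $ 10 20 30 40 50 60
p3=: 3 2 $ 100 200 300 400 500 600

NB. joinThenApply: hypothetical analogue of gavInfo without file I/O.
NB. y is a boxed list of names; v joins the named nouns, u is applied once to the result.
joinThenApply=: 2 : 0
  cumvals=. ". > {. y               NB. start from the first named noun
  for_nm. }. y do.
    cumvals=. cumvals v ". > nm     NB. join each further noun on with v
  end.
  1 ; u cumvals                     NB. success flag, then u on the joined whole
)

NB. Column sums of p1 and p2 stitched side by side: 9 12 90 120
r=: (+/ joinThenApply ,.) 'p1';'p2'
NB. Blocks of 2 names via infix, as in the gavInfo NB.EG line; the last block is short
blocks=: _2 (+/ joinThenApply ,.)\ 'p1';'p2';'p3'
```

Here `blocks` is a 2 by 2 table of boxes: one row per block, each row holding the success flag and the column sums for that block.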

### Timings

The timings below, first for the original adverb "getVarInfo" and then for the newer conjunction "gavInfo", are misleading as presented. The first version was taking so long (it had been running for days and was only about halfway through the files) that I came up with the newer version while the older one was still running. I then moved the unprocessed files, 32 of the 99, to an alternate directory and ran the new version on those.

Once both versions finished, I combined the results.

#### Session Using Original Adverb "getVarInfo"

```
   CTMAT=: 0$~CMClassVars ''     NB. Initialize the global we'll be updating
   6!:2 'accumCMRatings getVarInfo&.>(<VDIR);&.>UVN'
409856.93
   0 60 60#:409856.93            NB. Number of seconds as hours, minutes, and seconds
113 50 56.93
```

Note that this timing is for the first 67 files, whereas the following one is for the remaining 32 files. A little forethought in the design of this code allowed it to fail gracefully when I pulled the rug out from under the first adverb by removing some of its files after it had started running: "getVarInfo" applies "u" only when "unfileVar_WS_" reports that it read the variable successfully, so a missing file is simply skipped. This graceful failure, combined with good modularization, also helps with coarse-grained parallelism: I can run multiple instances of this code on distinct sets of files on separate machines or on different cores of the same processor.
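As a rough sketch of such a split (the names below are placeholders, not the real contents of UVN), key (`/.`) can deal 99 file-variable names round-robin into four disjoint groups, one per machine or core. Each group would then be processed by its own J session.

```j
NB. Placeholder names piece0 ... piece98, standing in for the real UVN
names=: <@('piece' , ":)"0 i. 99
NB. Round-robin assignment: name i goes to group 4|i
groups=: (4 | i. # names) </. names
NB. Group sizes are 25 25 25 24, and together they cover every name exactly once
sizes=: #&> groups
```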

Now we finish the job with the newer conjunction "gavInfo", applying it to blocks of 8 file variables at a time by using the infix adverb "\" with a negative left argument, which specifies non-overlapping windows. The value 8 happens to divide the 32 remaining files evenly, but this does not matter for the result: a shorter block at the end would have been processed just as well.
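A short session shows the difference between negative and positive window sizes for infix, using box (`<`) as the verb so the windows can be counted. The 35-item case is hypothetical and just shows a short final block.

```j
   #&> _3 <\ i. 8     NB. non-overlapping windows of 3; the last one is short
3 3 2
   # _8 <\ i. 32      NB. 32 files in blocks of 8 gives 4 blocks
4
   #&> _8 <\ i. 35    NB. with 35 files, a short fifth block of 3 would also be processed
8 8 8 8 3
   # 3 <\ i. 8        NB. a positive size gives overlapping windows instead
6
```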

#### Separate Session Using Updated Conjunction "gavInfo"

```
   CTMAT=: 0$~CMClassVars ''         NB. Initialize the global
   VD2=: 'C:\Data\Netflix\AltVDir\'  NB. Specify alternate file variables' directory
   6!:2 '_8 (accumCMRatings gavInfo ,.)\(<VD2);&.>UVN'
14266
   0 60 60#:14266
3 57 46
```

We see that we needed almost 114 hours for the first 67 files, whereas we processed the final 32 files in less than 4 hours. Extrapolating to the full set of 99 files, the first version would have taken about 168 hours versus about 12 for the newer conjunction.
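The extrapolation is simple proportional scaling of each measured time from its own file count to all 99 files, which can be checked in a session using the timings above:

```j
   secs=: 409856.93 14266             NB. measured: getVarInfo on 67 files, gavInfo on 32
   hrs=: (99 * secs % 67 32) % 3600   NB. scale each per-file rate to 99 files, in hours
   <. hrs
168 12
   <. %/ hrs                          NB. per-file speedup of the newer version
13
```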

#### Verb and Sub-functions Used with Adverbs

```
NB.* accumCMRatings: group (cust,movie) ratings by averages partitions.
accumCMRatings=: 3 : 0
NB. VDIR unfileVar_WS_ 'umurd0'
  cn=. classifyCMRatings ptnVar y
  cr=. cn </. 2{y       NB. Customer-movie ratings by CM-class
  cn=. ~.cn             NB. Class number/partition
NB.   ctmat=. (NCC,NMC)$0   NB. Count # ratings/class
  nd=. >:<.10^.NCC*NMC  NB. Max # digits in total class number
  for_fnum. i.#cn do.   NB. Rating info into appropriate CM-class file-var
    vnm=. 'cmclass',(-nd){.(nd$'0'),":fnum{cn
    VDIR unfileVar_WS_ vnm
    (vnm)=: (".vnm),>fnum{cr
    VDIR fileVar_WS_ vnm
    4!:55 <vnm
    ix=. <(NCC,NMC)#:fnum{cn
    CTMAT=: ((#>fnum{cr)+ix{CTMAT) ix}CTMAT
  end.
NB. accumCMRatings getVarInfo&.>(<VDIR);&.>UVN[CTMAT=: 0$~CMClassVars ''
)

NB.* classifyCMRatings: find Cust-Movie class assuming "cbpv" and "mclass".
classifyCMRatings=: 3 : 0
  cc=. <:+/cbpv </ mean&>2{&.>y   NB. Customer class based on avg movie rating
  mc=. mclass{~(0{mvnums) i. ;0{&.>y
  cc=. cc#~(1{$)&>y               NB. Customer class/rating entry
  cn=. mc+NMC*cc                  NB. Class number: (cust, movie)
)

NB.* countBiClass: count # (cust,movie) per bidimensional equi-rating groups.
countBiClass=: 3 : 0
  if. 0=#y do. y=. 10 10 201 end.
  'cnb mnb bsz'=. y          NB. Cust # breakpoints, Movie # brkpts, block sz
  CMClassVars cnb,mnb        NB. Vars: NCC, NMC, cbpv, mbpv, cclass, mclass
  ctmat=. (cnb,mnb)$0
  for_cb. bsz*i.>.NCUST%bsz do.   NB. cb: index of the first customer in each block
    len=. bsz<.NCUST-cb
    'ct ix'=. <"1 |:frtab ,(mnb*cclass{~cb+i.len)+/mclass
    ix=. <"1](cnb,mnb)#:ix
    ctmat=. (ct+ix{ctmat) ix}ctmat
  end.
  ctmat
)

NB.* findEqualBinBreakpoints: find distinct values to partition vec equally.
findEqualBinBreakpoints=: 3 : 0
NB. avgbp=. /:~avgmr=. %/2{.MMR
  'nbp vals'=. y
  vals=. /:~vals
  anpb=. nbp%~#vals          NB. Average number per bin
NB. Locate index of 1st instance of breakpoint value
  bpi=. vals i. vals{~<.0.5+anpb*i.>.anpb%~#vals
  bp=. bpi{vals
  bp=. bp#~whUnq bp
  bp;(<./,>./,mean,stddev) 2-~/\bp     NB. Stats on breakpoint differences
NB.EG 'bpv stats'=. findEqualBinBreakpoints 100;%/2{.MUR
)
```

## Scan of Meeting Notes


NYCJUG/2007-03-13 (last edited 2011-04-27 15:36:59 by DevonMcCormick)