4. A First Look At J Programs

Before we get into learning the details of J, let's look at a couple of realistic, if simple, problems, comparing solutions in C to solutions in J. The J code will be utterly incomprehensible to you, but we will nevertheless be able to see some of the differences between J programs and C programs. If you stick with me through this book, you will be able to come back at the end and understand the J code presented here.

Average Daily Balance

Here is a program a bank might use. It calculates some information on accounts given the transactions that were performed during a month. We are given two files, each one containing numbers in lines ended by (CR,LF) and numeric fields separated by TAB characters (they could come from spreadsheets). Each line in the Accounts file contains an account number followed by the balance in the account at the beginning of the month. Each line in the Journal file contains an account number, the day of the month for a transaction, and the amount of the transaction (positive if money goes into the account, negative if money goes out). The records in the Journal file are in order of date, but not in order of account. We are to match each journal entry with its account, and print a line for each account giving the starting balance, ending balance, and average daily balance (which is the average of each day's closing balance). The number of days in the month is an input to the program, as are the filenames of the two files.
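To make the definition of average daily balance concrete, here is a minimal C sketch of the arithmetic for a single account. The type `Xact` and the function `avg_daily_balance` are names invented for this illustration; it assumes the transactions are already in date order and that days are numbered from 1.

```c
#include <stddef.h>

/* One journal entry for a single account (hypothetical layout). */
typedef struct {
    int day;        /* day of the month, starting at 1 */
    double amount;  /* positive = money in, negative = money out */
} Xact;

/* Average of each day's closing balance over a month of ndays days.
   xacts must be in date order. */
double avg_daily_balance(double openbal, const Xact *xacts, size_t n, int ndays)
{
    double bal = openbal;   /* balance after the last activity seen */
    double weighted = 0.0;  /* running sum of daily closing balances */
    int prevday = 1;        /* first day on which bal is the closing balance */
    size_t i;

    for (i = 0; i < n; ++i) {
        /* bal was the closing balance on days prevday .. day-1 */
        weighted += bal * (xacts[i].day - prevday);
        bal += xacts[i].amount;
        prevday = xacts[i].day;
    }
    /* bal is the closing balance on days prevday .. ndays, inclusive */
    weighted += bal * (ndays + 1 - prevday);
    return weighted / ndays;
}
```

For example, an account that opens a 30-day month at 100, receives 50 on day 11, and pays out 30 on day 21 closes at 100 for ten days, 150 for ten days, and 120 for ten days, so the function returns 3700/30, about 123.33.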

I will offer C code and J code to solve this problem. To keep things simple, I am not going to deal with file-I/O errors, or data with invalid format, or account numbers in the Journal that don't match anything in the Accounts file.

C code to perform this function might look like this:

#include <stdio.h>

#define MAXACCT 500

// Program to process journal and account files, printing
// start/end/avg balance. Parameters are # days in current
// month, filename of Accounts file, filename of Journal file
void acctprocess(int daysinmo, char *acctfn, char *jourfn)
{
    FILE *fid;
    int nacct, acctx;
    float acctno, openbal, xactnday, xactnamt;
    struct {
        float ano;       // account number
        float openbal;   // opening balance
        float prevday;   // day number of last activity
        float currbal;   // balance after last activity
        float weightbal; // weighted balance: sum of closing balances
    } acct[MAXACCT];

    // Read initial balances; set day to start-of-month, sum of balances to 0
    fid = fopen(acctfn, "r");
    for(nacct = 0; 2 == fscanf(fid, "%f%f", &acctno, &openbal); ) {
        acct[nacct].ano = acctno;
        acct[nacct].openbal = openbal;
        acct[nacct].prevday = 1;
        acct[nacct].currbal = openbal;
        acct[nacct].weightbal = 0;
        ++nacct;
    }
    fclose(fid);

    // Process the journal: for each record, look up the account
    // structure; add closing-balance values for any days that
    // ended before this journal record; update the balance
    fid = fopen(jourfn, "r");
    while(3 == fscanf(fid, "%f%f%f", &acctno, &xactnday, &xactnamt)) {
        for(acctx = 0; acct[acctx].ano != acctno; ++acctx);
        acct[acctx].weightbal +=
            acct[acctx].currbal * (xactnday - acct[acctx].prevday);
        acct[acctx].currbal += xactnamt;
        acct[acctx].prevday = xactnday;
    }

    // Go through the accounts. Close the month by adding
    // closing-balance values applicable to the final balance
    // (days prevday through daysinmo, inclusive);
    // produce output record
    for(acctx = 0; acctx < nacct; ++acctx) {
        acct[acctx].weightbal +=
            acct[acctx].currbal * (daysinmo + 1 - acct[acctx].prevday);
        printf("Account %.0f: Opening %.2f, closing %.2f, avg %.2f\n",
            acct[acctx].ano, acct[acctx].openbal, acct[acctx].currbal,
            acct[acctx].weightbal/daysinmo);
    }
    fclose(fid);
}

The corresponding J program would look like this:

NB. Verb to convert TAB-delimited file into numeric array
rdtabfile =: (0&".;._1@:(TAB&,)@:}:;._2) @ ReadFile @<

NB. Verb to process journal and account files
NB. y is (# days in current month);(Account filename);
NB. (Journal filename)
acctprocess =: monad define
'ndays acctfn jourfn' =: y
NB. Read files
'acctano openbal' =. |: rdtabfile acctfn
'jourano jourday jouramt' =. |: rdtabfile jourfn
NB. Verb: given list of days y, return # days that
NB. each balance is a day's closing balance
wt =. monad : '(-~ 1&(|.!.(>:ndays))) 0{"1 y'
NB. Verb: given an Account entry followed by the Journal
NB. entries for the account, produce (closing balance),
NB. (average daily balance)
ab =. monad : '(wt y)({:@] , (%&ndays)@(+/)@:*)+/\1{"1 y'
NB. Create (closing balance),(average daily balance) for
NB. each account. Assign the start-of-month day (1) to the
NB. opening balance
cavg =. (acctano,jourano) ab/.(1,.openbal),jourday,.jouramt
NB. Format and print all results
s =. 'Account %d: Opening %d, closing %d, avg %d\n'
s&printf"1 acctano ,. openbal ,. cavg
)

Let's compare the two versions. The first thing we notice is that the J code is mostly commentary (beginning with NB.). The actual processing is done in 3 lines that read the files, 3 lines to perform the computation of closing and average balance, and 2 lines to print the results. J expresses the algorithm much more briefly.

The next thing we notice is that there seems to be nothing in the J code that is looping over the journal records and the accounts. The commentary says 'create balances for each account' and 'produce average daily balance for an account', tasks that clearly require loops, and yet there is nothing resembling loop indexes. This is one of the miracles of J: loops are implied; in C terminology, they are expressions rather than statements, and so they can be assembled easily into single lines of code that replace many nested loops. We will be spending a lot of time learning how to do this.

We also note that there is nothing in the J code corresponding to the #define MAXACCT 500 in the C. This is one of the things that makes programming in J so pleasant: you don't have to worry about allocating storage, or freeing it, or wondering how long is long enough for a character-string variable, or how big to make an array. Here, even though we don't know how many accounts there are until we have read the entire Accounts file, we simply read the file, split it into lines and numbers, and let the interpreter allocate as much storage as it needs to hold the resulting array.
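For comparison, here is a rough sketch of what a C programmer typically writes to get rid of a fixed limit like MAXACCT: an array that grows on demand. The names `FloatVec`, `floatvec_push`, and `floatvec_free` are hypothetical, and the doubling policy is just one common choice.

```c
#include <stdlib.h>

/* Growable array of floats (hypothetical helper type). */
typedef struct {
    float *data;  /* heap storage, NULL while empty */
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
} FloatVec;

/* Append x, doubling the allocation when it is full.
   Returns 0 on success, -1 if memory runs out (contents are kept). */
int floatvec_push(FloatVec *v, float x)
{
    if (v->len == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 16;
        float *p = realloc(v->data, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = newcap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Release the storage and return the vector to its empty state. */
void floatvec_free(FloatVec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

A caller starts with `FloatVec v = {0};`, pushes values as they are read, and must remember to call `floatvec_free` at the end. None of this bookkeeping has a counterpart in the J program.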

The last thing to see, and perhaps the most important, is that the C version is just a toy program. It searches through the Accounts information for every record in the Journal file. We can test it with a small dataset and verify that it works, but if we scale it up to 10,000 accounts and 1,000,000 journal entries, we are going to be disappointed in the performance, because its execution time will be proportional to A*J where A is the number of accounts and J the number of journal entries. It is every programmer's dread: a function that will have to be rewritten when the going gets tough.
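One plausible shape for that rewrite, sketched under assumptions: sort the account records once, then find each journal entry's account by binary search, so the work is roughly (A+J)*log(A) with a typical library sort. The trimmed `Acct` record and the helper names below are invented for this illustration.

```c
#include <stdlib.h>

/* Trimmed, hypothetical version of the account record. */
typedef struct {
    float ano;      /* account number */
    float currbal;  /* balance after last activity */
} Acct;

/* Order account records by account number (used by qsort and bsearch). */
static int cmp_acct(const void *a, const void *b)
{
    float x = ((const Acct *)a)->ano;
    float y = ((const Acct *)b)->ano;
    return (x > y) - (x < y);
}

/* Sort the table once, after the Accounts file has been read. */
void sort_accounts(Acct *acct, size_t nacct)
{
    qsort(acct, nacct, sizeof *acct, cmp_acct);
}

/* Binary search for one account; returns NULL if it is not present. */
Acct *find_account(Acct *acct, size_t nacct, float acctno)
{
    Acct key;
    key.ano = acctno;
    key.currbal = 0;
    return bsearch(&key, acct, nacct, sizeof *acct, cmp_acct);
}
```

The journal loop would then call `find_account` in place of the linear scan. It is extra code that has to be written, tested, and maintained, which is the point of the comparison.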

The J version, in contrast, will have execution time proportional to (A+J)*log(A+J). We did nothing meritorious to achieve this better behavior; we simply expressed our desired result and let the interpreter pick an implementation. Because we 'think big'--we treat the entire Journal and Accounts files as units--we give the interpreter great latitude in picking a good algorithm. In many cases the interpreter makes better decisions than we could hope to, because it looks at the characteristics of the data before it decides on its algorithm. For example, when we sort an array, the interpreter will use a very fast method if the range of numbers to be sorted is fairly small, where 'fairly small' depends on the number of items to be sorted. The interpreter takes great care in its implementation of its primitives, greater care than we can normally afford in our own C coding. In our example, it will use a high-speed method for matching journal entries with accounts.
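To give a feel for why a small range of values allows a faster sort, here is a counting sort in C. It is only an illustration of the general idea, not the interpreter's actual code, which I am not reproducing here; the function name and its interface are made up for this sketch.

```c
#include <stdlib.h>

/* Sort n ints, all known to lie in [lo, hi], by counting occurrences.
   Time is proportional to n + (hi - lo + 1), with no comparisons
   between elements. Returns 0 on success, -1 if memory runs out. */
int counting_sort(int *a, size_t n, int lo, int hi)
{
    size_t range = (size_t)(hi - lo) + 1;
    size_t *count = calloc(range, sizeof *count);
    size_t i, v, c, out = 0;

    if (count == NULL)
        return -1;
    /* Tally how many times each value appears. */
    for (i = 0; i < n; ++i)
        ++count[a[i] - lo];
    /* Write the values back in increasing order. */
    for (v = 0; v < range; ++v)
        for (c = count[v]; c > 0; --c)
            a[out++] = lo + (int)v;
    free(count);
    return 0;
}
```

This beats a comparison sort only when the range is modest relative to the number of items, which is the kind of data-dependent decision described above.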

Calculating Chebyshev Coefficients

This algorithm for calculating coefficients of the Chebyshev approximation of a function is taken verbatim from Numerical Recipes in C. I have translated it into J just so you can see how compact the J representation of an algorithm can be. Again, the J code will be gobbledygook for now, but it's concentrated gobbledygook.

// Program to calculate Chebyshev coefficients
// Code taken from Numerical Recipes in C 1/e
#include <math.h>

#define PI 3.141592653589793

void chebft(float a, float b, float c[], int n, float (*func)())
{
    int k,j;
    float fac,bpa,bma,f[300];

    bma = 0.5 * (b-a);
    bpa = 0.5 * (b+a);
    for(k = 0;k<n;k++) {
        float y = cos(PI*(k+0.5)/n);
        f[k] = (*func)(y*bma+bpa);
    }
    fac = 2.0/n;
    for (j = 0;j<n;j++) {
        double sum = 0.0;
        for(k = 0;k<n;k++)
            sum += f[k] * cos(PI*j*(k+0.5)/n);
        c[j] = fac*sum;
    }
}

J version:

chebft =: adverb define
f =. u 0.5 * (+/y) - (-/y) * 2 o. o. (0.5 + i. x) % x
(2 % x) * +/ f * 2 o. o. (0.5 + i. x) *"0 1 (i. x) % x
)