Intro

This page is based on posts in a thread from the J forum discussing how to introduce mathematical formulae to students using J. It illustrates a basic approach to building mathematical formulae in J, using the mean and the standard deviation as examples. The page uses only explicit, rather than tacit, notation.

There is also a version of this available as a J lab.

The Mean

The arithmetic mean is often used to estimate the middle or center of a set of values. The mathematical formula for the mean is:

$$ \bar{y} = \frac{\sum y}{n} $$

In words this formula says: the mean is the sum of the values divided by the number of values.

Let y represent the numbers 4 5 6 2 3 4.

In J we do that with the symbol =: (the copula), which assigns the numbers to a name, in this case y.

   y=: 4 5 6 2 3 4

We can check the contents of a name by typing the name and pressing the Enter key.

   y
4 5 6 2 3 4

The first step in calculating the mean is to sum our set of values y ($$ \sum y $$):

   +/y
24

The number of values in y ($$ n $$) is:

   #y
6

So the sum divided by the number of values is:

   24 % 6
4

We can do this calculation in one step:

   (+/y) % #y
4
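
As a check on the J result, the same calculation can be written out by hand. This is just a worked substitution of our values, with $$ \bar{y} $$ standing for the mean and $$ n $$ for the number of values:

$$ \bar{y} = \frac{\sum y}{n} = \frac{4 + 5 + 6 + 2 + 3 + 4}{6} = \frac{24}{6} = 4 $$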

We can give this formula a name by defining a function, or verb in J terminology, so that it is easy to reuse. We do this with the symbol =: just as we did when assigning the set of numbers to y. This time, though, we need to tell J that the phrase is a verb:

   mean=: verb def '(+/y) % #y'

Let's test our new verb with our original set of numbers y:

   mean y
4

Yes, that gives the same answer. The real reason for creating a verb, however, is that we can now use it with any set of numbers:

   mean 3 7 4 5 8 2 3 4 5
4.55556
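
To see where 4.55556 comes from, here is the arithmetic written out by hand (a worked check, rounded to the six significant digits that J displays by default):

$$ \frac{3 + 7 + 4 + 5 + 8 + 2 + 3 + 4 + 5}{9} = \frac{41}{9} \approx 4.55556 $$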

Try it with your own numbers!

The Standard Deviation

Apart from knowing where the center of a set of values is, we often also want to know how spread out the values are. The range can be used, but it is very simplistic because it considers only two values in the set: the maximum and the minimum. The standard deviation is much better at discriminating between the spread of different sets of values. It attempts to give you an idea of the average distance of the values in a set from their mean.
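
To see why the range is limited, consider two made-up sets of values (they are illustrative only and are not used elsewhere on this page): 2 4 4 4 4 6 and 2 2 2 6 6 6. Both have the same maximum and minimum, so both have the same range, even though the values in the second set sit further from the center:

$$ \text{range} = \max(y) - \min(y) = 6 - 2 = 4 $$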

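The mathematical formula for the standard deviation is:

$$ s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} $$

In words this formula says: the standard deviation is the square root of the sum of the squared deviations of the values from their mean, divided by 1 less than the number of values.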

This is a bit of a mouthful so let's take it one step at a time using the set of values y that we defined earlier ...

The "deviations of the values from their mean" ($$ y - \bar{y} $$) are:

   y                    NB. remind ourselves what y is
4 5 6 2 3 4
   mean y
4
   y - mean y           NB. deviations from the mean
0 1 2 _2 _1 0

The symbol *: squares values (raises them to the power of 2), so the squared deviations and the "sum of the squared deviations" ($$ \sum (y - \bar{y})^2 $$) are:

   *: 0 1 2 _2 _1 0     NB. squared deviations
0 1 4 4 1 0
   +/ *: 0 1 2 _2 _1 0  NB. sum of squared deviations
10

... or combined:

   +/ *: y - mean y     NB. sum of squared deviations of values from their mean
10

The next step is to divide that sum (10) by 1 less than the number of values ($$ n - 1 $$). The symbol <: subtracts 1 from a value, so 1 less than the number of values is:

   <:#y                 NB. 1 less than the number of values
5

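So we want "10 divided by 5" or, put the other way round, "5 divided into 10". The symbol ~ swaps the arguments of %, so 5 %~ 10 means the same as 10 % 5: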

   5 %~ 10
2
   (<:#y) %~ 10
2
   (<:#y) %~ +/ *: y - mean y
2

Finally, we need to take the square root, which is written %: in J:

   %: 2
1.41421
   %: (<:#y) %~ +/ *: y - mean y
1.41421
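
Written out by hand, the steps above amount to the following substitution (a worked check, using $$ s $$ as an assumed symbol for the standard deviation and $$ n $$ for the number of values):

$$ s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\frac{10}{6 - 1}} = \sqrt{2} \approx 1.41421 $$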

It is again helpful to define the formula as a verb for easy reuse:

   stddev=: verb def '%: (<:#y) %~ +/ *: y - mean y'

... and test it with different sets of values:

   stddev y
1.41421
   stddev 3 7 4 5 8 2 3 4 5
1.94365
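
The second result can be checked by hand in the same way. This is a worked sketch rather than part of the original session. The mean of these nine values is 41/9, so it is easiest to write each deviation in ninths:

$$ y - \bar{y} = \tfrac{1}{9}(-14,\ 22,\ -5,\ 4,\ 31,\ -23,\ -14,\ -5,\ 4) $$

$$ \sum (y - \bar{y})^2 = \frac{2448}{81} = \frac{272}{9} \approx 30.2222 $$

$$ \sqrt{\frac{272/9}{9 - 1}} = \sqrt{\frac{34}{9}} \approx 1.94365 $$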

Authors

RicSherlock

Thanks go to Richard Hill for suggesting that this be put on the wiki, and to BillLam for suggesting LaTeX markup for the formulae. Thanks also to Don Watson for prompting the discussion in the first place.

RicSherlock/StatswithJ (last edited 2009-06-21 23:48:26 by RicSherlock)