This page is based on a thread from the J forum discussing how to introduce mathematical formulae to students using J. It illustrates a basic approach to constructing mathematical formulae in J, using the mean and standard deviation as examples. The page uses only explicit, rather than tacit, notation.
There is also a version of this available as a J lab.
The arithmetic mean is often used to provide an estimate of the middle or center of a set of values. The mathematical formula for the mean is:
$$\bar{y} = \frac{\sum y}{n}$$
$\sum$ is the summation operator sigma and means "add the set of values together".
$n$ represents the number of values in a set.
In words this formula says:
- "to calculate the mean of some values, we sum those values and divide that sum by the number of values".
Let y represent the numbers 4 5 6 2 3 4.
In J we do that by using the symbol =: (Copula) to assign the numbers to a name; in this case the name is y.
   y=: 4 5 6 2 3 4
We can check the contents of a name by typing the name and pressing the Enter key.
   y
4 5 6 2 3 4
So the first step in calculating the mean is to sum our set of values y ($\sum y$):
   +/y
24
The number of values in y ($n$) is:
   #y
6
So the sum divided by the number of values (% is J's symbol for division) is:
   24 % 6
4
We can do this calculation in one step:
   (+/y) % #y
4
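Written out by hand for the same six numbers, the calculation looks like this. The symbols $\bar{y}$ (the mean) and $n$ (the number of values) are assumed here for illustration:

```latex
\bar{y} = \frac{\sum y}{n} = \frac{4 + 5 + 6 + 2 + 3 + 4}{6} = \frac{24}{6} = 4
```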
We can assign or define a name for this formula by creating a function, or verb in J terminology, for easy reuse. We do this using the symbol =: in the same way as for assigning the set of numbers to y. This time though we need to tell J that the phrase is a verb:
   mean=: verb def '(+/y) % #y'
Let's test our new verb with our original set of numbers y:
   mean y
4
That gives the same answer. The real reason for creating a verb, however, is that we can now use it with any set of numbers:
   mean 3 7 4 5 8 2 3 4 5
4.55556
Try it with your own numbers!
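For instance, a short session along the following lines should work, assuming mean has been defined as above. The name temps and its values are made up purely for illustration:

```j
   temps=: 18.5 21 19.5 22   NB. hypothetical data, not from the original thread
   mean temps
20.25
   mean 10 20 30
20
```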
The Standard Deviation
Apart from knowing where the center of a set of values is, we often also want to know how spread out the values are. The range can be used, but it is very simplistic because it considers only two values in the set: the maximum and the minimum. The standard deviation is much better at discriminating between the spread of different sets of values. It attempts to give an idea of the average distance of the values in a set from their mean.
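To see why the range is a blunt measure, here is a small sketch using the J phrases >./ (largest value) and <./ (smallest value). The second list is invented for illustration: it has the same range as y even though its values all sit at the two extremes.

```j
   y=: 4 5 6 2 3 4
   (>./ y) - <./ y                       NB. range of y: maximum minus minimum
4
   (>./ 2 2 2 6 6 6) - <./ 2 2 2 6 6 6   NB. hypothetical set with the same range
4
```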
The mathematical formula for the standard deviation is:
$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$
$\bar{y}$ represents the mean of the set of values $y$.
In words this formula says:
- "To calculate the standard deviation of a set of values, we take the sum of the squared deviations of the values from their mean, and then divide that number by 1 less than the number of values and then take the square root."
This is a bit of a mouthful so let's take it one step at a time using the set of values y that we defined earlier ...
The "deviations of the values from their mean" ($y - \bar{y}$) are:
   y               NB. remind ourselves what y is
4 5 6 2 3 4
   mean y
4
   y - mean y      NB. deviations from the mean
0 1 2 _2 _1 0
The symbol *: raises values to the power of 2 (squares them) so the "sum of the squared deviations" ($\sum (y - \bar{y})^2$) is:
   *: 0 1 2 _2 _1 0       NB. squared deviations
0 1 4 4 1 0
   +/ *: 0 1 2 _2 _1 0    NB. sum of squared deviations
10
... or combined:
   +/ *: y - mean y   NB. sum of squared deviations of values from their mean
10
The next step is to divide that sum (10) by 1 less than the number of values.
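1 less than the number of values, using the symbol <: (Decrement), is:
   <:#y   NB. 1 less than the number of values
5
So we want "10 divided by 5" or "5 divided into 10". The adverb ~ swaps the arguments of %, so 5 %~ 10 means the same as 10 % 5:
   5 %~ 10
2
   (<:#y) %~ 10
2
   (<:#y) %~ +/ *: y - mean y
2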
Finally, we need to take the square root, for which J uses the symbol %: (Square Root):
   %: 2
1.41421
   %: (<:#y) %~ +/ *: y - mean y
1.41421
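As a cross-check, the same steps can be written out by hand for y. The symbols $s$ (the standard deviation), $\bar{y}$ (the mean) and $n$ (the number of values) are assumed here for illustration:

```latex
\begin{aligned}
s &= \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} \\
  &= \sqrt{\frac{0 + 1 + 4 + 4 + 1 + 0}{6 - 1}} \\
  &= \sqrt{\frac{10}{5}} = \sqrt{2} \approx 1.41421
\end{aligned}
```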
It is again helpful to define the formula as a verb for easy reuse:
   stddev=: verb def '%: (<:#y) %~ +/ *: y - mean y'
... and test it with different sets of values:
   stddev y
1.41421
   stddev 3 7 4 5 8 2 3 4 5
1.94365
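The following sketch illustrates the earlier point that the standard deviation discriminates between sets that the range cannot tell apart. Both lists are invented for illustration; each has a mean of 4, a minimum of 2 and a maximum of 6, yet their standard deviations differ. The two verb definitions are repeated so the session can be run on its own.

```j
   mean=: verb def '(+/y) % #y'
   stddev=: verb def '%: (<:#y) %~ +/ *: y - mean y'
   stddev 2 4 4 4 4 6   NB. hypothetical set: most values sit at the mean
1.26491
   stddev 2 2 2 6 6 6   NB. hypothetical set: all values sit at the extremes
2.19089
```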
Thanks go to Richard Hill for the suggestion of putting this on the wiki and to BillLam for his suggestion of adding LaTeX markup of the formulae. Thanks also to Don Watson for prompting the discussion in the first place.
- link to post that provoked this page