20. New Model Computer
By Eugene McDonnell. First published in Vector, 15, 4, (April 1999), 106-112.
Donald Knuth began writing his series called The Art of Computer Programming in 1962, and devised a mythical computer called MIX with which to illustrate computer language programs, and as a machine to be used for exercises. Half a lifetime has gone by since then – half a human lifetime, three or four computer lifetimes. His MIX design is now obsolete. A new machine, called MMIX, will replace it in subsequent revisions of his volumes. A preliminary writeup of this new machine is given in Knuth’s web site.
His home page address is http://sunburn.stanford.edu/~knuth
Briefly, MMIX operates on 64-bit words. It has 256 general-purpose 64-bit registers that can hold either fixed point or floating point numbers, or characters, or arbitrary binary data. Most instructions have the form OP X Y Z, where OP, X, Y, and Z are each 8-bit bytes. If OP is the ADD instruction, for example, the meaning is “X=Y+Z”, that is, “Set register X to the contents of register Y plus the contents of register Z.” There are 256 op codes that fall into a dozen or so categories. The virtual memory available to an application is 2^64 bytes. There are no input-output instructions; files are handled as memory-mapped data. There is no operating system at the moment for MMIX. Knuth writes, “Whenever I have been asked if I will be writing a book about operating systems, my reply has always been ‘Nix.’ Therefore the name of MMIX’s operating system, NNIX, should come as no surprise.”
The purpose of this column is to discuss a few J-related aspects of this machine. I am attempting just now to write a J model of it. Time will tell if and how well I do this.
On this machine numbers are 64 bits long, though the design tolerates shorter numbers. For floating point numbers it uses the IEEE/ANSI standard 64-bit form, and accommodates the 32-bit forms only to transform them to the 64-bit form in computations. More interestingly, integer numbers are also 64 bits long. Thus an unsigned integer can take on any value from zero through 18,446,744,073,709,551,615, or approximately 1.8e19. Signed integers range between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. One of the reasons I chose J as simulation language is its extended integers, which allow me to use these large numbers directly, with the least amount of fuss: an occasional terminal ‘x’ on a number, or ‘x:’ preceding a number.
Many years ago I heard Luther Woodrum, the one who gave APL a good grade (he designed and implemented the grade primitives in APL\360) ask, “When will computer designers put in a maximum instruction?” In almost all computers, finding the maximum of two numbers involves a branch instruction. Something like this:
condition is x is less than y
branch to Label if condition is true
maximum is x
branch to exit
Label: maximum is y
One of the great motivations of computer designers is to reduce the number of branches required by programs executing at the machine level. Branches really slow things down. I recently attended, along with Larry Breed (the key designer and implemetor of APL\360), a talk Knuth gave on MMIX. I had read the preliminary MMIX writeup and hadn’t found a maximum instruction, so after the lecture, in the question period, I asked how one found a maximum on MMIX. Knuth fumbled a bit, and gave an explanation which I didn’t hear very well. Larry, however seemed to understand and accept it, so I didn’t pursue the issue then, but the next day I phoned Larry and asked him for more. He explained that there was an identity that could be used to obtain maximum at the machine level without a branch, namely:
a max b is b + 0 max a - b
I said to Larry that that looked a bit circular, didn’t it, defining maximum in terms of itself? Larry assured me that the circularity is only apparent. He said also that, before retiring from IBM a few years ago, he had been instrumental in getting the architects of an IBM computer (I think the RS6000) to add a “difference or zero” instruction, primarily to handle maximum, and that now Knuth had devised a similar feature for MMIX. It’s called ‘zero or set if negative’, written ZSNI rX, rY, Z, which acts as follows:
If register Y is negative, register X is set to Z, otherwise nothing happens. To see how this works in finding maximum, look at the three cases below. The two values are in A and B and the result is to appear in B.
case 1 case 2 case 3 A=8 B=3 A=3 B=8 A=3 B=3 SUB A,A,B 5 3 -5 8 0 3 A=A-B ZSNI A,A,0 5 3 0 8 0 3 A=0 max A ADD B,A,B 5 8 0 8 0 3 B=A+B
Somewhere below the machine instruction level there is a branch, but the machine is happy nevertheless, because the three instructions above execute very rapidly, and in a pipelined machine the pipeline is undisturbed, since no instruction-level branch is needed.
The next thing I’d like to mention is exemplified by a frustrating aspect of IEEE floating point as implemented by two different systems: the little-endian and the big-endian school. In the little-endian implementations the bytes are numbered from left to right, 7 through 0, and the bytes in the big-endian school are numbered 0 through 7. This leads to strange things like, in J, looking at the value of a floating point number on a Macintosh and on an Intel machine gives two different pictures, one reversed from the other. Another bit of explanation I need to give you is that Knuth uses the prefix # in front of a number expressed in hexadecimal. One of the ‘bit fiddling’ instructions is ‘multiple or’, denoted by MOR. The explanation given for it is:
Suppose the 64 bits of registers Y and Z are indexed by pairs such that the bytes are numbered from 0 to 7 and the bits within the bytes also by 0 through 7; then the MOR operation replaces bit ij of register X by the bit
y0jzi0 or y1jzi1 or ... or y7jz7i
Thus, for example, if register Z contains #0102040810204080, MOR reverses the order of the bytes in register Y, converting between little-endian and big-endian addressing.
What Knuth has designed, if the above is opaque to you, is a special case of the good old-fashioned ‘or dot and’ inner product. When I came to describing this instruction in my simulation, in addition to standard housekeeping, all I had to write was
U =. B +./ . *. A
where B and A are 8-by-8 boolean matrices representing the 64 bits in the two operands.
If I get anywhere with this simulation, I’ll probably report on it in a future column.