Formalism in Programming Languages

Kenneth E. Iverson
International Business Machines Corporation
Yorktown Heights, New York

Received July, 1963. Presented at a Working Conference on Mechanical Language Structures, Princeton, N.J., August 1963, sponsored by the Association for Computing Machinery, the Institute for Defense Analyses, and the Business Equipment Manufacturers Association. This work was done at Harvard University while the author was a visiting lecturer, February through June, 1963.



Introduction

Although the question of equivalences between algorithms expressed in the same or different languages has received some attention in the literature, the more practical question of formal identities among statements in a single language has received virtually none. The importance of such identities in theoretical work is fairly obvious. The present paper will be addressed primarily to the practical implications for a compiler.

The formal identities can be incorporated directly into a compiler, or can alternatively be used by a programmer to derive a more efficient equivalent of a program specified by an analyst. The identities cited include (1) dualities which permit the inclusion of only one of a dual pair as a basic operator, (2) partitioning identities which permit the automatic allocation of limited fast-access storage in operations on arrays, (3) permutation identities which permit the adoption of a processing sequence suited to the particular representation used (e.g., row list or column list of a matrix), (4) general associativity and distributivity identities for double operators (determined as a function of the properties of the basic operators) which permit efficient reordering of operations, (5) transposition identities, and (6) the automatic extension of the appropriate identities to any ad hoc operations (i.e., subroutines or procedures) defined by any user of the compiler.

The discussion will be based upon a programming language which has been presented in full elsewhere [1]. However, the relevant aspects of the language will first be summarized for reference.

The problems of transliteration and syntax which commonly dominate discussions of language will here be subordinated as follows. The symbols employed will permit the immediate determination of the class to which each belongs; thus literals are denoted by roman type, variables are denoted by italics (lowercase, lowercase bold, and uppercase bold for scalar, vector and matrix, respectively), and operators are denoted by distinct (usually nonalphabetic) symbols. The problems of transliteration (i.e., mapping the set of symbols employed onto the smaller set provided in a computer) and of mapping positional information (such as subscripts and superscripts) onto a linear representation therefore can, and will, be subordinated to questions of the structure of an adequate language.
 

The Language[a]

1.  The left arrow “←” denotes “specification”, and each[b] statement in the language is of the form
  x ← f
where x is a variable and f is some function.
 

2.  The application of any unary operator Ο to a scalar argument x is denoted by Οx, and the application of a binary operator Ο to the arguments x, y is denoted by xΟy. The set of basic operators and symbols is shown in Table 1. The use of the same symbol for a binary and a unary operator (e.g., x⌊y for min(x, y) and ⌊x for the largest integer not exceeding x) produces no ambiguity and does conserve symbols.

As shown in Table 1, any relation is treated as an operator (denoted by the usual symbol for the relation) having the range zero and one (logical variables). Thus, for integers i and j, the operator “=” is equivalent to the Kronecker delta.

3.  The ith component of a vector x is denoted by xi, the ith row vector of a matrix M by M i, the jth column vector by Mj, and the (i, j)th element by Mji. A vector may be represented by a list of its components separated by commas. Thus, the statement
  x ← 1,2,3,4
specifies x as a vector of dimension 4 comprising the first four positive integers. In particular, catenation of two vectors x and y is denoted by x,y.
 
4.  Operators are extended component-by-component to arrays. Thus if Ο is any operator (unary or binary as appropriate),[c]
  r ← Οx   ↔   ri ← Οxi
  r ← xΟy   ↔   ri ← xiΟyi
  R ← ΟM   ↔   Rji ← ΟMji
  R ← MΟN   ↔   Rji ← MjiΟNji.
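
In modern terms this extension can be sketched as follows; the Python below is an illustration, not part of the paper, and its function names and 0-origin indexing are the sketch's own.

  import operator

  # Component-by-component extension of unary and binary operators
  # to vectors (lists) and matrices (lists of rows).
  def apply_unary(op, x):
      if x and isinstance(x[0], list):          # matrix: Rji = O(Mji)
          return [[op(e) for e in row] for row in x]
      return [op(e) for e in x]                 # vector: ri = O(xi)

  def apply_binary(op, x, y):
      if x and isinstance(x[0], list):          # matrix: Rji = Mji O Nji
          return [[op(a, b) for a, b in zip(r, s)] for r, s in zip(x, y)]
      return [op(a, b) for a, b in zip(x, y)]   # vector: ri = xi O yi

  print(apply_binary(operator.add, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
  print(apply_unary(operator.neg, [[1, 2], [3, 4]]))          # [[-1, -2], [-3, -4]]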

5.  The order of execution of operations is determined by parentheses in the usual way and, except for intervening parentheses, operations are executed in order from right to left, with no priorities accorded to multiplication or other operators. Thus 2 × 3 + 4 denotes 2 × (3 + 4), that is, 14.
 
6.  Certain unary operators are defined upon vectors and matrices rather than upon scalars. These appear in Table 2 and include the dimension operators ν and μ as well as the transposition operators ⊖, ⌽, ⍉, and ⊘, in which the symbol indicates the axis of transposition of a matrix.
 
7.  It is convenient to provide symbols for certain constant vectors and matrices as shown in Table 3. The parenthetic expression indicating the dimension of each may be elided when it is otherwise determined by conformability with some known vector.
 
8.   If α(i) denotes one of a family of variables (e.g., scalars xi or x i, vectors xi or X i or Xi, or matrices iX) for i belonging to some index set, and if Ο is a binary operator, then for any index vector s,
  Οsi/α(i) ↔ α(s1) Ο α(s2) Ο … Ο α(sνs).
If
  α(i) = xi and s = ι1(νx),
or if
  α(i) = Xi and s = ι1(νX),

then s and i may be elided. Thus,

  +/x =  x1 + x2 + … + xνx,
  ∧/x =  x1x2 ∧ … ∧ xνx,
  +/X =  X1 + X2 + … + XνX, etc.

If α(i) = X i and s = ι1(μX), then the s and i may be elided provided that a second slash be added to distinguish this case from the preceding one. Thus,
  Ο//X  =  X 1 Ο X 2 Ο … Ο X μX.
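
The three reductions admit an equally brief sketch (Python, illustrative names; for the associative operators used here the direction of folding is immaterial):

  import operator
  from functools import reduce

  def red(op, x):                   # O/x  = x1 O x2 O ... O xn
      return reduce(op, x)

  def red_cols(op, X):              # O/X  : fold the column vectors X1, X2, ...
      return [reduce(op, row) for row in X]

  def red_rows(op, X):              # O//X : fold the row vectors X 1, X 2, ...
      return [reduce(op, list(col)) for col in zip(*X)]

  x = [3, 1, 4, 1, 5]
  X = [[1, 2], [3, 4], [5, 6]]
  print(red(operator.add, x))       # 14
  print(red_cols(operator.add, X))  # [3, 7, 11] : one component per row
  print(red_rows(operator.add, X))  # [9, 12]    : one component per column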

9.   If α is any argument and Ο is any binary operator, then Οn/α denotes the nth power of α with respect to Ο. Formally,
  Οn/αα Ο α Ο … Ο α (to n terms).
Hence Ο1/α = α, Ο–1/α is the inverse of α with respect to Ο, and Ο0/α is the identity element of the operator Ο (if they exist).
 
10.  If Ο1 and Ο2 are binary operators, then the matrix product A Ο1Ο2 B is a matrix of dimension μA × νB defined by:

  (A Ο1Ο2 B)ji = Ο1/(A i Ο2 Bj).

In particular, A +× B denotes the ordinary matrix product. Moreover, the pair (+×) behaves as, and hence may be treated as, a binary operator on A and B. For example, applying the notation of part 9, (+×)–1/A denotes the ordinary inverse of A.

If the post-multiplier is a vector x (i.e., a matrix of one column), the usual conventions of matrix algebra are applied:
  (A +× x)i = A i +× x = +/(A i × x).
Similarly,
  (x +× B)j = x +× Bj, and x +× y = +/(x × y).
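
A Python sketch of the generalized product (illustrative; rho plays Ο1, the reducing operator, and sigma plays Ο2):

  import operator
  from functools import reduce

  def gen_matprod(rho, sigma, A, B):
      # (A rho.sigma B)ji = rho / (A i sigma Bj)
      return [[reduce(rho, [sigma(a, b) for a, b in zip(row, col)])
               for col in zip(*B)]
              for row in A]

  A = [[1, 2], [3, 4]]
  B = [[5, 6], [7, 8]]
  print(gen_matprod(operator.add, operator.mul, A, B))  # A +x B = [[19, 22], [43, 50]]
  x = [1, 0]
  print(gen_matprod(operator.add, operator.mul, A, [[c] for c in x]))  # A +x x = [[1], [3]]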

11.  The outer product of two vectors x and y with respect to the binary operator Ο is denoted by x ∘Ο y and defined as the matrix M of dimension νx × νy such that Mji = xi Ο yj.
 
12.  Deletion from a vector x of those components corresponding to the zeros of a logical vector u of like dimension is called compression and is denoted by u/x. Compression is extended to matrices both row-by-row and column-by-column as follows:
  Y ← u/X     ↔   Y i = u/X i
  Y ← u//X   ↔   Yj = u/Xj.

13.  If p is any vector containing only indices of x, then xp is defined as follows:
  y ← xp  ↔  yi = xpi, i ε ι1(νp).
If p is a permutation vector (containing each of its own indices exactly once) and if νp = νx, then xp is a permutation of x.

Permutation is extended to matrices by row and by column as follows:
  YXp   ↔   Y i = (X i)p
  YX p   ↔   Yj = (Xj)p.

14.  Left rotation is a special case of permutation denoted by k ↑ x and defined by
  y ← k ↑ x   ↔   yi = x(νx)|(k+i).
Right rotation is denoted by k ↓ x and is defined analogously.

A noncyclic left rotation (left shift), denoted by ⇑, is defined as follows:
  k ⇑ x   ↔   (~ωk) × (k ↑ x).
(The zero attached to the shaft of the arrow suggests that zeros are drawn into the “evacuated” positions.) Similarly,
  k ⇓ x   ↔   (~αk) × (k ↓ x).
 
Rotations are extended to matrices in the usual way, a doubled symbol (e.g., ↑↑) denoting rotation of columns. For example,
  (k ↑ X) i   =   ki ↑ X i,
and (k ⇑⇑ I) is a matrix with ones on the kth superdiagonal.[d]
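
The rotations can be sketched as follows (Python, 0-origin indexing; the noncyclic shift draws zeros into the evacuated positions):

  def rot_left(k, x):               # cyclic:    yi = x[(i + k) mod n]
      n = len(x)
      return [x[(i + k) % n] for i in range(n)]

  def shift_left(k, x):             # noncyclic: zeros fill the evacuated positions
      n = len(x)
      return [x[i + k] if i + k < n else 0 for i in range(n)]

  x = [1, 2, 3, 4, 5]
  print(rot_left(2, x))             # [3, 4, 5, 1, 2]
  print(shift_left(2, x))           # [3, 4, 5, 0, 0]

  # Noncyclically shifting the rows of the identity matrix puts
  # ones on a superdiagonal:
  n = 4
  I = [[int(i == j) for j in range(n)] for i in range(n)]
  print([I[i + 1] if i + 1 < n else [0] * n for i in range(n)])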
 

15.  Any new operator defined (e.g., by some algorithm, usually referred to as a subroutine) is to be denoted in accordance with Definition (2) and is extended to arrays exactly as any of the basic operators defined in the language. For example, if x gcd y is used to denote the greatest common divisor of integers x and y, then x gcd y on vectors, gcd/x, and X gcd y are automatically defined. Moreover, if n is a vector of integers and F i represents the prime factorization of ni with respect to the vector of primes p (that is, n = F ×⋆ p), then clearly gcd/n = (⌊//F) ×⋆ p. Similarly, if x lcm y denotes the l.c.m. of x and y, then lcm/n = (⌈//F) ×⋆ p.
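
The automatic extension of such an ad hoc operator can be illustrated by plugging gcd and lcm into generic folding machinery (Python; math.gcd stands in for the user's subroutine, the other names are illustrative):

  from functools import reduce
  from math import gcd

  def lcm(a, b):
      return a * b // gcd(a, b)

  n = [12, 18, 30]
  print(reduce(gcd, n))             # gcd/n = 6
  print(reduce(lcm, n))             # lcm/n = 180

  # With p the primes and F the matrix of exponents (row i factors n[i]):
  p = [2, 3, 5]
  F = [[2, 1, 0],                   # 12 = 2^2 * 3
       [1, 2, 0],                   # 18 = 2 * 3^2
       [1, 1, 1]]                   # 30 = 2 * 3 * 5
  mins = [min(col) for col in zip(*F)]   # column minima, the floor-over-rows of F
  maxs = [max(col) for col in zip(*F)]
  def recompose(e):
      return reduce(lambda s, t: s * t, (b ** k for b, k in zip(p, e)), 1)
  print(recompose(mins), recompose(maxs))  # 6 180, agreeing with gcd/n and lcm/n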

Array Operations in a Compiler

The systematic extension of the familiar vector and matrix operations to all operators, and the introduction of the generalized matrix product, greatly increase the utility and frequency of use of array operations in programs, and therefore encourage their inclusion in the source language of any compiler. Array operations can, of course, be added to the repertoire of any source language by providing library or ad hoc subroutines for their execution. However, the general array operations spawn a host of useful identities, and these identities cannot be mechanically employed by the compiler unless the array operations are denoted in such a way that they are easily recognizable.

The following example illustrates this point. Consider the vector operation
  xx + y
and the equivalent subroutine (expressed in ALGOL and using νx as a known integer):
  for i := 1 step 1 until νx do
    x[i] := x[i] + y[i]

It would be difficult to make a compiler recognize all legitimate variants of this program (including, for example, an arbitrary order of scanning the components), and to make it distinguish the quite different and essentially sequential program:
  for i := 1 step 1 until νx – 1 do
    x[i+1] := x[i] + y[i]

The foregoing programs could perhaps be analyzed by a compiler, but they are merely simple examples of much more complex scan procedures which would occur in, say, a matrix product subroutine. A somewhat more complex case is illustrated by the vector operation z ← k ↑ x, and the equivalent ALGOL program:

  for i := 1 step 1 until νx do
  begin if i + k ≤ νx then j := i + k
        else j := i + k – νx;
        z[j] := x[i]
  end

Finally, there is a distinct advantage in incorporating array operations by providing a single general scan for each type (e.g., vector, matrix, and matrix product) and treating the operator (or operators) as a parameter. It then matters not whether each operator is effected by a one-line subroutine (i.e., a machine instruction) or a multiline subroutine, or whether it is incorporated in the array operation as an open or a closed subroutine. If several types of representations are permitted for variables (e.g., double precision, floating point, chained vectors), then a scan routine may have to be provided for each type of representation.
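
The organization suggested here (one general scan per type of array operation, with the operator as a parameter) might be sketched as follows; a real compiler would emit such loops rather than interpret them, and the names are illustrative:

  import operator

  def scan_vector(op, x, y):        # one general scan for z <- x O y
      return [op(a, b) for a, b in zip(x, y)]

  def scan_matrix(op, X, Y):        # one general scan for Z <- X O Y
      return [scan_vector(op, r, s) for r, s in zip(X, Y)]

  # The same scan serves any operator, whether it stands for a single
  # machine instruction or a long subroutine:
  print(scan_vector(operator.add, [1, 2], [3, 4]))  # [4, 6]
  print(scan_vector(max, [1, 5], [3, 4]))           # [3, 5]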
 

Identities

The identities fall naturally into five main classes: duality, partitioning (selection), permutation, associativity and distributivity, and transposition. A few examples of each class will be presented together with a brief discussion of their uses.

In discussing identities it will be convenient to employ the symbols Ο, Ο1, Ο2, ρ, σ, and τ to denote operators, and to define certain functions and relations on operations as follows. The (unary) logical functions αΟ and γΟ are equal to unity iff Ο is associative and Ο is commutative, respectively. The relation Ο1δΟ2 holds iff Ο1 distributes over Ο2, and Ο1αΟ2 holds iff Ο1 associates with Ο2, that is,

  (xΟ1y2zxΟ1(yΟ2z).

This latter is clearly a generalization of associativity, that is, αΟ = ΟαΟ. Finally, the unary operator δ applied to the operator Ο1 (denoted by δΟ1) produces the operator Ο2 which is dual to Ο1 in the sense defined in Table 4 (which summarizes these functions) and in Subsection (a) below.

All of the identities are based upon the fundamental properties of the elementary operators summarized in Tables 5-8. Table 5 shows the vector a of binary arithmetic operators and below it two logical matrices describing its properties of distributivity and associativity. These matrices show, for example, that a3 (that is, ×) distributes over + and –, that ⌈ and ⌊ distribute over themselves and each other, and that × associates with itself and ÷. The first four rows of the table show the self-associativity of a (equal to the diagonal of the matrix aαa), the commutativity, and the dual operators, wrt ÷ and –, respectively.

Table 6 shows three alternative ways of denoting the 16 binary logical functions: as the vector of operators l, as the matrix T of characteristic vectors (Ti is the characteristic vector of operator li), and as the vector ⊥T obtained as the base-two values (expressed in decimal) of the columns of T. The symbols employed in l include the familiar symbols ∨ and ∧ for or and and, ⊽ and ⊼ for their complements (i.e., the Peirce function and the Sheffer stroke), 0 and 1 for the zero and identity functions, the six numerical relations ≤, <, =, ≥, >, ≠, and the symbols x, y, x̄, and ȳ for the four “unary” functions, that is, xΟy = x, xΟy = y, xΟy = ~x, and xΟy = ~y.

The remaining portion of Table 6 is arranged like Table 5. Since ((αl) ∧ γl)/l = (0, ∧, ≠, ∨, =, 1), it follows that the only nontrivial associative commutative logical operators are g = (∧, ∨, ≠, =). The properties of this particularly useful subset (abstracted from Table 6) are summarized in Table 7.

Certain functions of the matrices lδl and lαl are also of interest; for example, the matrix (lαl) > (lδl) shows that there are only six operator pairs which are associative and not distributive, namely, (≠, ≠), (≠, =), (=, ≠), (=, =), (ȳ, ≠), and (ȳ, =).

(a) Dualities

A unary operator τ is said to be self-inverse if ττx ↔ x. If ρ, σ, and τ are unary operators, if τ is self-inverse, and if ρx ↔ τστx, then σx ↔ τρτx, and ρ and σ are said to be dual[e] with respect to τ. The floor and ceiling operators ⌊ and ⌈ are obviously dual with respect to the minus operator. Duality clearly extends to arrays, e.g.,
  ⌊x ↔ –⌈–x
The duals of unary operators are shown in Table 8 as the vector δc.

If ρ and σ are binary operators, if τ is a self-inverse unary operator, and if

  xρy ↔ τ((τx)σ(τy))

then ρ and σ are said to be dual with respect to τ. The max and min operators (⌈ and ⌊) are dual with respect to minus, and or and and (∨ and ∧) are dual with respect to negation (~), as are the relations ≠ and =.

Dual operators are displayed in the vectors δa and δl of Tables 5 and 6. Each of the 16 logical operators has a dual; the characteristic vector of δli is the reversal of ~Ti.

The duality of binary operators ρ and σ also extends to vectors and matrices. Moreover, when they are used in reduction, the following identities hold:

  ρ/x  ↔   τσ/τx,
  ρ/X  ↔   τσ/τX,
  ρ//X  ↔   τσ//τX.

For example,

  ⌈/x  =   –⌊/–x,
and
  ∧/x  =   ~ ∨/ ~ x (De Morgan’s Law).
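
Such dualities are easy to check mechanically, as in this small Python sketch:

  from functools import reduce

  x = [3, -1, 4, 1]
  # max/x = -(min/(-x)) : the max-min duality with respect to minus
  assert reduce(max, x) == -reduce(min, [-e for e in x])

  u = [1, 0, 1, 1]                  # logical values as 0 and 1
  land = lambda a, b: a & b
  lor = lambda a, b: a | b
  # and/u = ~or/~u (De Morgan)
  assert reduce(land, u) == 1 - reduce(lor, [1 - e for e in u])
  print("dual reduction identities verified on these examples")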

The basic reduction identity (namely, ρ/xτσ/τx) leads immediately to the following family of identities for the matrix product:

  A ρσ B   ↔   τ((τA) (δρ)(δσ) (τB)).

For the logical operators, the family comprises 256 identities, of which 144 are nontrivial.

Duality relations can be specified for a compiler by a table incorporating l and δl, and can be employed to obviate the inclusion of a subroutine for one of the dual pair or to transform a source statement to an equivalent form more efficient in execution. For example, in a computer such as the IBM 7090 (which executes an or between registers (i.e., logical vectors) much faster than a corresponding and, and which quickly performs an or over a register (i.e., a test for non-zero)), the operation ~((~x)∧y) is more efficiently executed as the equivalent operation x∨~y, obtained by duality.

(b) Partitioning

Partitioning identities, which permit a segment of a vector result to be expressed in terms of segments of the argument vectors, are of obvious utility in the efficient allocation of limited capacity high-speed storage.

If zxΟy, then u/z ← (u/x) Ο (u/y), where u is an arbitrary (but conformable) logical vector. This simple identity applies for any binary operator Ο and permits any vector operation to be partitioned or segmented at will. A similar identity holds for unary operators.

From the definition of the matrix product it is clear that for any binary operators ρ and σ,
  u/(A ρσ B)   ↔   A ρσ (u/B),
and
  u//(A ρσ B)   ↔   (u//A) ρσ B.

If ρ is any associative commutative operator (i.e., αρ = γρ = 1), then
  ρ/x   ↔   (ρ/u/x) ρ (ρ/ū/x),
where ū is used as an alternative notation for (~u). Consequently,
  A ρσ B   ↔   ((ū/A) ρσ (ū//B)) ρ ((u/A) ρσ (u//B)).

Since the distributivity of σ and ρ is not involved, the foregoing identity (which is a simple generalization of the familiar identity for the product of partitioned matrices) applies to most of the common arithmetic and logical operators.

The identity for the two-way partitioning effected by u and ū can obviously be extended to a (μP)-way partitioning effected by a logical partition matrix P (defined by ε = +//P) as follows:
  A ρσ B   ↔   ρi/((P i/A) ρσ (P i//B)), i ε ι1(μP).
This is the form most useful in allocating storage; if fast-access storage for 2n components of A and B were available, P would normally be chosen such that P i = (n × i) ↓ αn.
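
A Python sketch of the two-way partitioned product (u splits the inner dimension, and the associative commutative operator, + here, recombines the block products):

  import operator
  from functools import reduce

  def gen_matprod(rho, sigma, A, B):
      return [[reduce(rho, [sigma(a, b) for a, b in zip(row, col)])
               for col in zip(*B)] for row in A]

  def sel_cols(u, A):               # u/A : keep the columns marked 1
      return [[a for a, k in zip(row, u) if k] for row in A]

  def sel_rows(u, B):               # u//B : keep the rows marked 1
      return [row for row, k in zip(B, u) if k]

  A = [[1, 2, 3], [4, 5, 6]]
  B = [[7, 8], [9, 10], [11, 12]]
  u = [1, 1, 0]
  ubar = [1 - e for e in u]
  whole = gen_matprod(operator.add, operator.mul, A, B)
  parts = [gen_matprod(operator.add, operator.mul, sel_cols(w, A), sel_rows(w, B))
           for w in (u, ubar)]
  recombined = [[a + b for a, b in zip(r, s)] for r, s in zip(*parts)]
  assert recombined == whole        # the partitioning identity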

(c) Permutation

In this section, p, q, and r will denote permutation vectors of appropriate dimensions.

If Ο is any binary operator, then
  (xΟy)p   ↔   xpΟyp,
i.e., permutation distributes over any binary operator. For any unary operator Ο,
  (Οx)p   ↔   Ο(xp),
and permutation therefore commutes with any unary operator. Consider, for example, a vector x whose components are arranged in increasing order on some function g(xi) [e.g., lexical order so as to permit binary search] but which is represented by (i.e., stored as) the vector y in arbitrary order, together with the permutation vector p such that x = yp. Then the operation z ← Οx may be executed as w ← Οy, where z ← wp.
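
A Python sketch of this scheme (operate on the stored order, permute afterward):

  import math

  def permute(x, p):                # x sub p, 0-origin
      return [x[i] for i in p]

  y = [9, 1, 25]                    # stored representation of x
  p = [1, 0, 2]                     # x = permute(y, p) = [1, 9, 25]
  w = [math.isqrt(e) for e in y]    # w <- O y, applied in storage order
  z = permute(w, p)                 # z = O x, since permutation commutes with O
  assert z == [math.isqrt(e) for e in permute(y, p)]
  print(z)                          # [1, 3, 5]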

For any binary operators ρ and σ,
  (A ρσ B)pq   ↔   (Ap) ρσ (Bq), (1)
where p permutes rows and q permutes columns. Moreover, if αρ = γρ = 1, then ρ/x ↔ ρ/xp, and consequently a common permutation r of the columns of A and the rows of B leaves the product unchanged:
  (Ar) ρσ (Br)   ↔   A ρσ B. (2)
Finally, then,
  (A ρσ B)pq   ↔   (Apr) ρσ (Brq).

This single identity permits considerable freedom in transforming a matrix product operation to a form best suited to the access limitations imposed by the representation (i.e., storage allocation) used for A and B (e.g., row-by-row and column-by-column lists).

For the special case q = ι1, A = I, ρ = +, and σ = ×, equation (1) reduces to the well-known method of permuting the rows of a matrix by ordinary premultiplication by a permutation matrix Ip, that is,
  Bp   ↔   (Ip) +× B.
The fact that Ip and I p are inverse permutation matrices (i.e., (Ip) +× (I p) = I) is obtainable directly from equation (2), applied with A = B = I and r = p.

The rotation operators ↑, ↓, ↑↑, ↓↓ are special cases of permutations; consequently,
  j ↑↑ k ↑ (A ρσ B)   ↔   (j ↑↑ A) ρσ (k ↑ B).

Moreover, this identity still holds when the cyclic rotation operators are replaced by the corresponding noncyclic operators ⇑, ⇓, ⇑⇑, and ⇓⇓. In particular,

  j ⇑⇑ B   =   j ⇑⇑ (I +× B)   =   (j ⇑⇑ I) +× B,
and if
  B   =   h ⇑⇑ I,
then
  (h+j) ⇑⇑ I   =   j ⇑⇑ h ⇑⇑ I   =   (j ⇑⇑ I) +× (h ⇑⇑ I),

a well-known identity for the superdiagonal matrices j ⇑⇑ I and h ⇑⇑ I.

(d) Associativity and Distributivity of Double Operators

If αρ = γρ = σδρ = 1, then α(ρσ) = 1; that is,

  A ρσ (B ρσ C)   ↔   (A ρσ B) ρσ C.

Moreover, (ρσ)δρ = 1; that is,

  A ρσ (B ρ C)   ↔   (A ρσ B) ρ (A ρσ C).

For example, if C is the connection matrix of a directed graph, then B = C ∨∧ C is the matrix of connections of length two; the operator (∨∧) is associative and distributes over ∨. Similarly, if D is a distance matrix (Dji is the distance from point i to point j), then E = D ⌊+ D is the matrix of minimum distance for trips of two legs; (⌊+) is associative and distributes over ⌊.
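
Both examples can be sketched with the generalized product of the earlier sketch (Python, illustrative):

  import operator
  from functools import reduce

  def gen_matprod(rho, sigma, A, B):
      return [[reduce(rho, [sigma(a, b) for a, b in zip(row, col)])
               for col in zip(*B)] for row in A]

  # Connections of length two: the (or, and) product of a connection matrix.
  C = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]                   # arcs 1->2 and 2->3
  B2 = gen_matprod(lambda a, b: a | b, lambda a, b: a & b, C, C)
  print(B2[0][2])                   # 1 : there is a two-leg connection 1->3

  # Minimum two-leg distances: the (min, +) product of a distance matrix.
  INF = float("inf")
  D = [[0, 2, INF],
       [INF, 0, 3],
       [INF, INF, 0]]
  E = gen_matprod(min, operator.add, D, D)
  print(E[0][2])                    # 5 = 2 + 3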

The associativity of matrix product operators can be very helpful in arranging an efficient sequence of calculations on matrices stored row-by-row or column-by-column. For the logical operators, the number of associative double operators is given by the expression
  +/+/(αl)/lαl
which (according to Table 6) has the value 66.

(e) Transpositions

Of the unary transposition operators, the reflections ⊖ (about the horizontal axis) and ⌽ (about the vertical axis) are special cases of permutation, but the diagonal transpositions ⍉ (about the main diagonal) and ⊘ (about the counterdiagonal) are not. Table 9 shows the multiplication table for the group generated by these four transpositions; the four operators added to complete the group are the identity, the half-turn, and the 90° axial left and right rotations. Since the half-turn is the product of the two axial reflections, it could as well have been denoted by ⌽⊖.

The following illustrate the many transposition identities:

  ⊖(A ρσ B)   ↔   (⊖A) ρσ B (3)
  ⌽(A ρσ B)   ↔   A ρσ (⌽B) (4)
  A ρσ B   ↔   (⌽A) ρσ (⊖B)   if   αρ = γρ = 1 (5)
  ⍉(A ρσ B)   ↔   (⍉B) ρσ (⍉A)   if   γσ = 1 (6)
  ⊘(A ρσ B)   ↔   (⊘B) ρσ (⊘A)   if   αρ = γρ = γσ = 1 (7)

Identities (3)-(5) are special cases of the permutation identities and permit freedom in the order of scan, which may be important if a backward-chained representation is employed for the vectors involved. Identity (6) is the generalization of the well-known transposition identity of matrix algebra. Identity (7) is obtained directly from (6) by the application of (3), (4), and (5).
 

Conclusion

The use of a programming language in which elementary operations are extended systematically to arrays provides a wealth of useful identities. If the array operations are incorporated directly in a compiler for the language, these identities can be automatically applied in compilation, using a small number of small tables describing the fundamental properties of the elementary operators. Moreover, the identities can be extended to any ad hoc operators specified by the source program, provided only that the fundamental characteristics (associativity, etc.) of the ad hoc operators are supplied.

Exploitation of the identities within the compiler will, of course, increase the complexity of the compiler, and one would perhaps incorporate only a selected subset of them. However, the possibility of later extensions to exploit further identities is of some value. Finally, the identities are extremely useful to the programmer (as opposed to the analyst who specifies the overall procedure and who may use the identities in theoretical work), since the tricks used by the programmer, as in allocating storage (partitioning) or modifying the sequence of a scan (permutation), are almost invariably special cases of the more general identities outlined here.
 

Reference

1. Iverson, Kenneth E., A Programming Language. Wiley, 1962.
 

Notes

a.  The language described here differs from [1] in minor details designed to further systematize and simplify its structure.
b.  Except for branching statements, which are not relevant to the present discussion.
c.  The symbol ↔ will be used to denote equivalence.
d.  The I may be elided.
e.  Abbreviated as “dual wrt”.

Tables

Table 1. Symbols for Basic Operations

UNARY
     Operation  Symbol
Absolute value  |
Minus  –
Floor (largest integer contained)  ⌊
Ceiling (smallest integer containing)  ⌈
Logical negation  ~
Reciprocation (÷x ↔ 1 ÷ x)  ÷
 
BINARY
     Operation  Symbol
Arithmetic operators  + – × ÷
Arithmetic relations  < ≤ = ≥ > ≠
Max, Min  ⌈ ⌊
Exponentiation (x ⋆ y is y to the power x)  ⋆
Residue mod m  m | n
Logical AND, OR  ∧ ∨

Table 2. Unary Operations Defined on Arrays

νx  Dimension of vector x
νA  Row dimension of matrix A (dimension of its row vectors)
μA  Column dimension of matrix A (dimension of its column vectors)
⊖, ⌽, ⍉, ⊘  Transposition of a matrix about the axis indicated
    (⍉A is the ordinary transposition of A)
⌽x  Transposition of vector x (reversal of the order of its components)
⊥x  Base-two value of vector x

Table 3. Constant Vectors and Square Matrices of Dimension n

Symbol     Designated Constant

Logical vectors:
ε(n)   Full vector (all 1’s)
εj(n)  jth unit vector (1 in position j)
αj(n)  Prefix vector of weight j (j leading 1’s)
ωj(n)  Suffix vector of weight j (j trailing 1’s)
ιj(n)  Interval vector (j, j+1, …, j+n–1)

Logical matrices:
0(n)   Zero matrix
I(n)   Identity matrix (1’s on diagonal)
(n)    Strict upper right triangle (1’s above diagonal)
(n)    Upper right triangle (1’s above and on diagonal)
(n)    Strict lower right triangle
(n)    Upper left triangle

Table 4. Operations and Relations Defined on Operators

Self-associativity   αΟ   = 1 iff xΟ(yΟz) ↔ (xΟyz
Commutativity   γΟ   = 1 iff xΟyyΟx
Distributivity   Ο1δΟ2   = 1 iff xΟ1(yΟ2z) ↔ (xΟ1y2(xΟ1z)
Associativity   Ο1αΟ2   = 1 iff xΟ1(yΟ2z) ↔ (xΟ1y2z
Dual wrt τ  δΟ is an operator such that
(δΟ)xτΟτx   if Ο is unary
x(δΟ)yτ((τx)Ο(τy))   if Ο is binary.

Table 5. Properties of the Binary Arithmetic Operators

  a            +  –  ×  ÷  ⌈  ⌊  |  ⋆
  αa           1  0  1  0  1  1  0  0
  γa           1  0  1  0  1  1  0  0
  δa (wrt ÷)         ×  ÷
  δa (wrt –)   +  –        ⌊  ⌈

  a δ a†       +  –  ×  ÷  ⌈  ⌊  |  ⋆
        +      0  0  0  0  1  1  0  0
        –      0  0  0  0  r  r  0  0
        ×      1  1  0  0  0  0  1  0
        ÷      r  r  0  0  0  0  r  0
        ⌈      0  0  0  0  1  1  0  0
        ⌊      0  0  0  0  1  1  0  0
        |      0  0  0  0  0  0  0  0
        ⋆      0  0  l  l  l  l  0  0

  a α a        +  –  ×  ÷  ⌈  ⌊  |  ⋆
        +      1  1  0  0  0  0  0  0
        –      0  0  0  0  0  0  0  0
        ×      0  0  1  1  0  0  0  0
        ÷      0  0  0  0  0  0  0  0
        ⌈      0  0  0  0  1  0  0  0
        ⌊      0  0  0  0  0  1  0  0
        |      0  0  0  0  0  0  0  0
        ⋆      0  0  0  0  0  0  0  0

  † l and r denote left and right distributivity.
 

Table 6. Properties of the Binary Logical Operators

  αl   1101 0111 0100 0001
  γl   1100 0011 1100 0011
  δl*  1 ∨ ≥ x ≤ y = ∧ ⊼ ≠ ȳ > x̄ < ⊽ 0

  T    0000 0000 1111 1111
       0000 1111 0000 1111
       0011 0011 0011 0011
       0101 0101 0101 0101

  l    0 ∧ > x < y ≠ ∨ ⊽ = ȳ ≥ x̄ ≤ ⊼ 1
  ⊥T   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

  l δ l (rows labeled by l; columns in the same order)
   0   1111 1111 0000 0000
   ∧   1111 1111 0000 0000
   >   0001 0100 0000 0000
   x   0101 0101 0000 0000
   <   1111 1111 0000 0000
   y   1111 1111 1111 1111
   ≠   0001 0100 0010 1000
   ∨   0101 0101 0101 0101
   ⊽   0001 0100 0000 0000
   =   0001 0100 0010 1000
   ȳ   0001 0100 0010 1000
   ≥   0001 0100 0000 0000
   x̄   0101 0101 0000 0000
   ≤   0101 0101 0101 0101
   ⊼   0001 0100 0000 0000
   1   0101 0101 0101 0101

  l α l (rows labeled by l; columns in the same order)
   0   1111 0000 0000 0000
   ∧   1111 0000 0000 0000
   >   0001 0000 0000 0000
   x   0001 0000 0000 0000
   <   1111 0000 0000 0000
   y   1111 1111 1111 1111
   ≠   0001 0010 0100 1000
   ∨   0001 0001 0001 0001
   ⊽   0001 0000 0000 0000
   =   0001 0010 0100 1000
   ȳ   0001 0010 0100 1000
   ≥   0001 0000 0000 0000
   x̄   0001 0000 0000 0000
   ≤   0001 0001 0001 0001
   ⊼   0001 0000 0000 0000
   1   0001 0001 0001 0001

    * Duality with respect to ~.
 

Table 7. Properties of the Nontrivial Associative Commutative Logical Operators

  g        ∧  ∨  ≠  =

  g δ g    ∧  ∨  ≠  =
     ∧     1  1  1  0
     ∨     1  1  0  1
     ≠     0  0  0  0
     =     0  0  0  0

  g α g    ∧  ∨  ≠  =
     ∧     1  0  0  0
     ∨     0  1  0  0
     ≠     0  0  1  1
     =     0  0  1  1

Table 8. Properties of the Unary Operators

  c            |  –  ⌊  ⌈  ÷  ~
  δc (wrt –)      –  ⌈  ⌊  ÷
  δc (wrt ÷)   |  –        ÷
  δc (wrt ~)                  ~

Table 9. Group of Transpositions (rotations of the square)

  (The table gives the products of the eight operators generated by the four transpositions ⊖, ⌽, ⍉, and ⊘: these four together with the identity, the half-turn, and the two 90° rotations, the symmetry group of the square.)

Discussion

Gorn: Some almost ancient sources of generalized operators are: Whitehead, Universal Algebra, and Grassmann, Die Ausdehnungslehre. Some more modern sources are: Bourbaki, Algebra; Forder, Calculus of Extension; and Bodewig, Matrix Calculus.

Backus: Why this comment?

Gorn: The paper presents generalized relationships among operators. The cited references are directly concerned with such questions.

Brooker: Why do you insist on using a notation which is a nightmare for typist and compositor and impossible to implement with punching and printing equipment currently available? What proposals have you got for overcoming this difficulty?

Iverson: Transliteration is, of course, essential, but I have avoided its treatment first, because a suitable scheme is highly dependent on the particular equipment available, and second, because it is extremely simple. If, for example, you have the stamina of ALGOL and MAD users (who tirelessly write PROCEDURE and WHENEVER), then you can use the distinct names that I have given (for conversational purposes) to each of the operators. Anyone who prefers briefer symbols can (as I have) easily design schemes which are brief, simple and mnemonic.

Gorn: This question of transliteration: I’m not talking about this paper in particular. In general it is a problem that is always with us. There is a danger that as the transliteration rules become more complicated replacement productions, we rapidly fall into a recognition problem, a translation problem and possibly an unsolvable word problem.

Iverson: Yes, one should distinguish the recognition of identifiers from the syntax, which is of more concern to the ultimate user.

Brooker: It is not obvious to me that these two symbols for FLOOR and CEILING have a great deal of mnemonic value.

Iverson: Yes, but once you have read it, you can remember it.

Gorn: But the more redundance you put in the symbolism of a language, the more equivalence problems you have.

Iverson: Not problems, I suggest that these are assets. In the extreme we could go back to the Assign and the Sheffer stroke, let’s say, and then we have no problems.

Ross: I don’t remember who asked the original question about notation, but I submit that they find themselves a sugardaddy or someone with a few thousand bucks and get themselves a display console such as we’re getting with programmable characters. You can even publish from it by taking pictures. I don’t see why we should let mechanics influence our progress at all.

Iverson: Someone who is interested in standardization would not like that comment—a 48-character set is the thing you know. The limitation on the available character set, I think, is more of a transient phenomenon than the algorithms we want to describe.

Ross: With our console the 48 characters are available, and there is another mode where you can program any bit patterns you want in a matrix; we are doing this specifically for this purpose because we feel that the notation that goes along with the set of ideas should be usable.

Bauer: I would say that compared with some other existing proposals for matrix extensions such as that of Ershov this is a much more closed consistent system. No one can say today how far we will go in using such a language in the near future.

Iverson: Let me comment that it is useful to distinguish two reasons for learning a language; one is for description and analysis and the other is for automatic execution. I submit that this kind of formalism is extremely helpful in analyzing difficult problems without worrying about whether one wants to execute the resulting program. As a matter of fact, I would use this as a preliminary before going into some language that is executable.

Gorn: As I have it, the descriptive language you have does have direct translation properties into command language.

Holt: I would like to translate that comment of Ken’s about description, analysis, and execution in the following way: Programming languages are machine-dependent—one is appropriate for the human processor and another for the computer.

Iverson: Well, I would disagree with that because I would use exactly the same notation for describing the computer. In fact, I’ve done it for the 7090 or most of the 7090, and other machines as well. In fact, you can say the instruction set of the machine is another form of language with a slave to execute it.

Holt: Then, what was the meaning of your comment?

Iverson: At this point, for example, this is not a source language in the sense that there is a mechanism available for translating it into some other language. There is no convenient way for automatic execution by translation or direct execution. Now I suggest that the notation is worthwhile just for analysis even though later we have to do a hand translation into some executable language.

Green: If I may interrupt for a moment, I think we should limit the discussion on notation to the next five minutes. And we should get to other questions.

Tompkins: There exist problems around here which were coded first in essentially this language and then were translated with great care into FORTRAN, for example.

Perlis: How should this language be used on computers? For what class of problems—on or off computers? Thus, it’s not quite clear to me that a mathematical proof of an algorithm written in FORTRAN (the same algorithm if you will) is any more difficult than a mathematical proof of one of your algorithms. Algorithms are written for two reasons: 1) execution by computer: which means that it is pointless to write it if you cannot execute it on a computer, and 2) for description and analysis. Now if the description is difficult to read, then it fails somewhat. If, in addition, analysis is as difficult, say, as in ALGOL, then the virtue of the language is questionable.

A last question: You haven’t discussed at all the way you describe data. It is not clear that you have a notation for describing data, though you have a great wealth of notation for manipulating data once it is described. Now ALGOL will obviously be extended to include matrix and vector operations in expressions. So my question is: for what classes of problems, remembering you have no data description, is your description better than “Algolic” descriptions?

Iverson: I’m not sure if I can really separate all these points. The question of representation (data description) is too lengthy to treat here. To save time, let me say that I discuss it in Chapter 3 of my book. This discussion is fairly limited, but adequate.

Concerning the virtues of the language for description and analysis, I can only say that I have found it very useful in many diverse areas, including machine description, search procedures, symbolic logic, sorting and linear programming. Now it is a separate problem as to whether you want to incorporate the complete generality of the language in any particular compiler—but I suggest that it is desirable to have a more general system that you retract from for any particular compiler rather than adding ad hoc provisions to more limited languages.

As to the question of proofs, you can, of course, translate a proof in any language to any other language, but I suggest that the proofs I give are the kind that are immediately obvious to any mathematician. There is, of course, the question of to whom you want your proofs to be obvious. Likewise, for difficulty of reading, the question is, “for whom”? And I suggest that anybody who has ever dealt with matrix operations finds this notation very easy to read.

Perlis: But is it fair to say then that if one is going to create or extend a language that the direction of extension really isn’t critical—that the accent should not be put on operations so much, but on data representation or sequence rules?

Iverson: No, I disagree.

Gorn: Since you are supporting an infix notation for binary operators, would it not be useful to have some control operators in the language which would correspond to the combinatory logician’s “Application” operation? Also operators for insertion and deletion of parentheses, and operators to adjust priorities in the scopes of other operators, e.g. to construct precedence matrices of the type discussed by Floyd?

Iverson: Let me give a sort of general answer to this sort of thing. You’re probably talking about some specialized application for which you want special operators. I submit that no one can design a language that is equally useful for everybody. Instead, what you would like to have is a single core which you can extend in a straightforward manner.

In so far as precedence and hierarchy are concerned, I have not found any great need for them in my work, but I can understand why you might want to use them in compilers. In fact, I think such hierarchy should be included in a tabular form so that it is easily changeable.

Holt: The presentation is a marvelous demonstration of the power of notation in the hands of a very clever man. Conclusions: (1) Let us teach this skill to clever people. (2) Let us create machine mechanisms to respond to notational inventions.

Iverson: On the contrary, the basic notions are very simple and should be introduced at high school level to provide a means for describing algorithms explicitly. For example, the vector can be introduced as a convenient means for naming a family of variables and can be used by the student (together with a few very simple operators) to work out explicit algorithms for well-known operations such as decimal addition, polynomial evaluation, etc. A little notation and much care in requiring explicit algorithms would, in fact, clarify and simplify the presentation of elementary mathematics and obviate the teaching of programming as such.

Gosden: Many of the equivalences only become useful and powerful when time dependency is included. For example, each operation on any array implies serial or parallel execution component by component. How can you cover this for serial or parallel statements? Obviously, there are many tricks that are time (or series) dependent in array operations. How do they relate to dualities and equivalences, etc.

Iverson: Parallel operation is implied by any vector operation; serial operation can be made explicit by a program showing the specified sequence of operations on components. Distinctions of this type (employing the present notation) are made clear in Falkoff’s “Algorithms for Parallel Search Memories” [J. ACM, Oct. 1962].

More complex simultaneity can be expressed by a collection of programs operating concurrently, all mutually independent but for interaction through certain (interlock) variables common to some two or more programs. Explicit dependence on real time can be introduced by incorporating, as one of this collection of programs, a program describing a clock (i.e., oscillator) driven counter.

Gorn: Does your generalized operator notation for matrices lead to a simpler proof of the generalized Laplace expansion of determinants?

Iverson: For a given logical vector u, the Laplace expansion of the determinant δA can be expressed as
  δA = (+i/((δ u/(S i//A)) × (δ ū/((~S i)//A)) × ρ(S i))) × ρu,
where S is a logical matrix whose rows represent all partitions of weight +/u, where ρv = ρ((v/ι1), (v̄/ι1)) is the “parity” of the logical vector v, and ρp is the parity of the permutation vector p, defined as +1 or –1 according as p is even or odd. Since
  δA = +i/((×/(IP i)/A) × ρ(P i))
(where P is the matrix whose (νA)! rows exhaust all permutations of dimension νA, and where compression by a logical matrix U is defined in the obvious way as the catenation of the vectors U i/A i), then the usual proof of the Laplace expansion (i.e. showing that a typical term of either expansion occurs in the other) can be carried through directly with the aid of the following fact: if u is any logical vector and p is a permutation of like dimension, then there exists a unique triple v, q, r, such that
   q = u/v// p, and  r = /// p.
[The vectors v and u are clearly related by the expressions v =  p u, and u = v  q, and moreover, ρp = (ρu) × (ρv) × (ρq) × (ρr).]

The special matrices occurring in the foregoing can all be specified formally in terms of the matrix T(b, n) defined as follows: Tji ε ι0(b), μT = bn, νT = n, and b⊥T = ι0(bn), where b⊥x denotes the base-b value of the vector x. Thus S = (+/u = +/M)//M, where M = T(2, νA), and P = (∧/σ/M)//M, where M = T(νA, νA), and σ/x is the set selection operation [1, p. 23].

Moreover, the parity function ρp may be defined formally as ρp = u ⋆ (–1), where u = 2 | +/+/V and V is the strict upper right triangle of the matrix p ∘> p.

Dijkstra: How would you represent a more complex operation, for example, the sum of all elements of a matrix M which are equal to the sum of the corresponding row and column indices?

Iverson: +/(M = (ι1(μM)) ∘+ (ι1(νM)))//M
 

List of Conferees, Working Conference on Language Structures, August 14-16, 1963

P. Abrahams International Electric Corp.
R.W. Allard Control Data Corp.
John W. Backus IBM
F. Bauer Math. Inst. der TH München
R. Bosak Inst. for Defense Analyses
R.A. Brooker IBM
L.L. Bumgarner Oak Ridge Nat. Lab.
W.H. Barge Univac Div., Sperry Rand
T.E. Cheatham, Jr. Computer Associates, Inc.
H.B. Curry Pennsylvania State U.
E.W. Dijkstra Technological U., Eindhoven
Arthur Evans Carnegie Inst. of Technology
R.J. Evey IBM
J. Fennell Logistics Research Project
Robert W. Floyd Computer Associates, Inc.
Donald B. Gillies U. of Illinois
Ruth Goodman Westinghouse Corp.
S. Gorn U. of Pennsylvania
John Gosden Auerbach Electronics Corp.
Robert M. Graham U. of Michigan
Julien Green IBM
Sheila Greibach Harvard U.
John W. Guy Nat. Security Agency
Leonard H. Haines MIT
A.W. Holt U. of Pennsylvania
P.Z. Ingerman Westinghouse Electric Corp.
R. Iturriaga Carnegie Inst. of Technology
E.T. Irons Inst. for Defense Analyses
K.E. Iverson IBM
Walter W. Jacobs Inst. for Defense Analyses
Charles Katz General Electric
R.A. Kirsch Nat. Bur. Standards
Rainer Kogon IBM
B.M. Leavenworth IBM
M. Henri Leroy Cie Bull
L. Lombardi MIT
William H. Marlow Logistics Research Project
E.J. McCluskey Princeton U.
M.A. Melkanoff U. of California
J.N. Merner Burroughs Corp.
G.J. Mitchell Inst. for Defense Analyses
A. Newell Carnegie Inst. of Technology
M. Paul Math. Inst. der TH München
A.J. Perlis Carnegie Inst. of Technology
George Radin IBM
Gene F. Rose System Development Corp.
D.T. Ross MIT
Bernard D. Rudin Lockheed Aircraft Corp.
R.A. Sibley, Jr. IBM
K.H. Speierman Burroughs Corp.
T.B. Steel System Development Corp.
C.B. Tompkins U. of California
Hale F. Trotter Princeton U.
R.E. Utman Business Equipment Mfrs. Assoc.
S. Warshall Computer Associates, Inc.
J.H. Wegstein Nat. Bur. Standards
J. Weizenbaum General Electric
M.V. Wilkes Univ. Math. Laboratory, Cambridge
Kenneth A. Wolf Control Data Corp.


Originally appeared in the Communications of the ACM, Volume 7, Number 2, 1964-02.
