Formalism in Programming Languages

Formalism in Programming Languages

Kenneth E. Iverson
International Business Machines Corporation
Yorktown Heights, New York

Received July, 1963. Presented at a Working Conference on Mechanical Language Structures, Princeton, N.J., August 1963, sponsored by the Association for Computing Machinery, the Institute for Defense Analyses, and the Business Equipment Manufacturers Association. This work was done at Harvard University while the author was a visiting lecturer, February through June, 1963.

Introduction

Although the question of equivalences between algorithms expressed in the same or different languages has received some attention in the literature, the more practical question of formal identities among statements in a single language has received virtually none. The importance of such identities in theoretical work is fairly obvious. The present paper will be addressed primarily to the practical implications for a compiler.

The formal identities can be incorporated directly into a compiler, or can alternatively be used by a programmer to derive a more efficient equivalent of a program specified by an analyst. The identities cited include (1) dualities which permit the inclusion of only one of a dual pair as a basic operator, (2) partitioning identities which permit the automatic allocation of limited fast-access storage in operations on arrays, (3) permutation identities which permit the adoption of a processing sequence suited to the particular representation used (e.g., row list or column list of a matrix), (4) general associativity and distributivity identities for double operators (determined as a function of the properties of the basic operators) which permit efficient reordering of operations, (5) transposition identities, and (6) the automatic extension of the appropriate identities to any ad hoc operations (i.e., subroutines or procedures) defined by any user of the compiler.

The discussion will be based upon a programming language which has been presented in full elsewhere [1]. However, the relevant aspects of the language will first be summarized for reference.

The problems of transliteration and syntax which commonly dominate discussions of language will here be subordinated as follows. The symbols employed will permit the immediate determination of the class to which each belongs; thus literals are denoted by roman type, variables are denoted by italics (lowercase, lowercase bold, and uppercase bold for scalar, vector and matrix, respectively), and operators are denoted by distinct (usually nonalphabetic) symbols. The problems of transliteration (i.e., mapping the set of symbols employed onto the smaller set provided in a computer) and of mapping positional information (such as subscripts and superscripts) onto a linear representation therefore can, and will, be subordinated to questions of the structure of an adequate language.

The Language^[a]

The left arrow “←” denotes “specification”, and each^[b] statement in the language is of the form
x ← ⍺
where x is a variable and ⍺ is some function.

The application of any unary operator Ο to a scalar argument x is denoted by Οx, and the application of a binary operator Ο to the arguments x, y is denoted by xΟy. The set of basic operators and symbols in shown in Table 1. The use of the same symbol for a binary and a unary operator (e.g., x⌊y for min(x, y) and ⌊x for largest integer not exceeding x) produces no ambiguity and does conserve symbols.

As shown in Table 1, any relation is treated as an operator (denoted by the usual symbol for the relation) having the range zero and one (logical variables). Thus, for integers i and j, the operator “=” is equivalent to the Kronecker delta.

The ith component of a vector x is denoted by x_i, the ith row vector of a matrix M by Mⁱ, the jth column vector by M_j, and the (i, j)th element by M_jⁱ. A vector may be represented by a list of its components separated by commas. Thus, the statement

x ← 1,2,3,4

specifies x as a vector of dimension 4 comprising the first four positive integers. In particular, catenation of two vectors x and y is denoted by x,y.

Operators are extended component-by-component to arrays. Thus if Ο is any operator (unary or binary as appropriate),^[c]

r ← Οx	↔	r_i ← Οx_i
r ← xΟy	↔	r_i ← x_iΟy_i
R ← ΟN	↔	R_jⁱ ← ΟM_jⁱ
R ← MΟN	↔	R_jⁱ ← M_jⁱΟN_jⁱ.

5. The order of execution of operations is determined by parentheses in the usual way and, except for intervening parentheses, operations are executed in order from right to left, with no priorities accorded to multiplication or other operators.

6. Certain unary operators are defined upon vectors and matrices rather than upon scalars. These appear in Table 2 and include the dimension operators ν and μ as well as the transposition operators , , , , in which the symbols indicate the axis of transposition of a matrix.

7. It is convenient to provide symbols for certain constant vectors and matrices as shown in Table 3. The parenthetic expression indicating the dimension of each may be elided when it is otherwise determined by conformability with some known vector.

If α(i) denotes one of a family of variables (e.g, scalars xⁱ or x_i, vectors xⁱ or Xⁱ or X_i, or matrices ⁱX) for i belonging to some index set i, and if Ο is a binary operator, then for any set s ⊆ i,

Ο^s_i/α(i) ↔ α(s₁) Ο α(s₂) Ο … Ο α(s_νs).

α(i) = x_i and s = ⍳¹(νx),

or if

α(i) = X_i and s = ⍳¹(νX),

then s and i may he elided. Thus,

+/x	=	x₁ + x₂ + … + x_νx,
∧/x	=	x₁ ∧ x₂ ∧ … ∧ x_νx,
+/X	=	X₁ + X₂ + … + X_νX, etc.

If α(i) = Xⁱ and s = ⍳¹(μX), then the s and i may be elided provided that a second slash be added to distinguish this case from the preceding one. Thus,
Ο//X = X¹ Ο X² Ο … Ο X^μX.

If α is any argument and Ο is any binary operator, then Οⁿ/α denotes the nth power of α with respect to Ο. Formally,

Οⁿ/α ↔ α Ο α Ο … Ο α (to n terms).

Hence Ο¹/α = α, Ο^–1/α is the inverse of α with respect to Ο, and Ο⁰/α is the identity element of the operator Ο (if they exist).

10.

If Ο₁ and Ο₂ are binary operators, then the matrix product A

B is a matrix of dimension μA × νB defined by:

B)_jⁱ = Ο₁ / Aⁱ Ο₂ B_j.

In particular, A B denotes the ordinary matrix product. Moreover, the pair () behaves as a binary operator on A and B and hence may be treated as a binary operator. For example, applying the notation of part 9, ()^–1/A denotes the ordinary inverse of A.

If the post-multiplier is a vector x (i.e., a matrix of one column), the usual conventions of matrix algebra are applied:
(A x)_i = Aⁱ x = +/ Aⁱ × x.
Similarly,
(x B)_j = x B_j , and x y = +/ x × y.

11. The outer product of two vectors x and y is denoted by x y and defined as the matrix M of dimension νx × νy such that M_jⁱ = x_iΟy_j.

12.

Deletion from a vector x of those components corresponding to the zeros of a logical vector u of like dimension is called compression and is denoted by u/x. Compression is extended to matrices both row-by-row and column-by-column as follows: .

	Y ← u/X	↔	Yⁱ = u/Xⁱ
	Y ← u//X	↔	Y_j = u/X_j.

11.

If p is any vector containing only indices of x, then x_p is defined as follows:

y ← x_p ↔ y_i = x_{p_i}, i ε ⍳¹(νp) .

If p is a permutation vector (containing each of its own indices once) and if νp = νx, then x_p is a permutation of x.

Permutation is extended to matrices by row and by column as follows:

	Y ← X_p	↔	Yⁱ = (Xⁱ)_p
	Y ← X^p	↔	Y_j = (X_j)_p.

12.

Left rotation is a special case of permutation denoted by k ↑ x and defined by

y ← k ↑ x ↔ y_i = x_(νx)|k+i.

Right rotation is denoted by k ↓ x and is defined analogously.

A noncyclic left rotation (left shift) denoted by

is defined as follows:

k x ↔ (~⍵^k) × k ↑ x.

(The zero attached to the shaft of the arrow suggests that zeros are drawn into the “evacuated” positions). Similarly,

k x ↔ (~⍺^k) × k ↓ x.

Rotations are extended to matrices in the usual way, a doubled symbol (e.g.,

) denoting rotation of columns. For example,

X)ⁱ = k_i

Xⁱ,

and (k∊)

is a matrix with ones on the kth superdiagonal.^[d]

13. Any new operator defined (e.g., by some algorithm, usually referred to as a subroutine) is to be denoted in accordance with Definition (2) and is extended to arrays exactly as any of the basic operators defined in the language. For example, if x gcd y (or, better, xy) is used to denote the greatest common divisor of integers x and y, then xy, /x, and Xy are automatically defined. Moreover, if n is a vector of integers and Fⁱ represents the prime factorization of n_i with respect to the vector of primes p (that is, n = F p), then clearly /n = (⌊//F) p. Similarly, if xy denotes the l.c.m. of x and y, then /n = (⌈//F) p.

Array Operations in a Compiler

The systematic extension of the familiar vector and matrix operations to all operators, and the introduction of the generalized matrix product, greatly increase the utility and frequency of use of array operations in programs, and therefore encourages their inclusion in the source language of any compiler. Array operations can, of course, be added to the repertoire of any source language by providing library or ad hoc subroutines for their execution. However, the general array operations spawn a host of useful identities, and these identities cannot be mechanically employed by the compiler unless the array operations are denoted in such a way that they are easily recognizable.

The following example illustrates this point. Consider the vector operation
x ← x + y
and the equivalent subroutine (expressed in ALGOL and using νx as a known integer):
for i = 1 step 1 until νx do
x(i) := x(i) + y(i)

It would be difficult to make a compiler recognize all legitimate variants of this program (including, for example, an arbitrary order of scanning the components), and to make it distinguish the quite different and essentially sequential program:
for i = 1 step 1 until νx – 1 do
x(i+1) := x(i) + y(i)

The foregoing programs could perhaps be analyzed by a compiler, but they are merely simple examples of much more complex scan procedures which would occur in, say, a matrix product subroutine. A somewhat more complex case is illustrated by the vector operation z ← k ↑ x, and the equivalent ALGOL program:

for i = 1 step 1 until νx do
if i + k ≤ νx then j := i + k;
else j := i + k – νx;
z(j) := x(i); end

Finally, there is a distinct advantage in incorporating array operations by providing a single general scan for each type (e.g., vector, matrix, and matrix product) and treating the operator (or operators) as a parameter. It then matters not whether each operator is effected by a one-line subroutine (i.e., a machine instruction) or a multiline subroutine, or whether it is incorporated in the array operation as an open or a closed subroutine. If several types of representations are permitted for variables (e.g., double precision, floating point, chained vectors), then a scan routine may have to be provided for each type of representation.

Identities

The identities fall naturally into five main classes: duality, partitioning (selection), permutation, associativity and distributivity, and transposition. A few examples of each class will be presented together with a brief discussion of their uses.

In discussing identities it will be convenient to employ the symbols Ο, Ο₁, Ο₂, ρ, σ, and τ to denote operators, and to define certain functions and relations on operations as follows. The (unary) logical functions αΟ and γΟ are equal to unity iff Ο is associative and Ο is commutative, respectively. The relation Ο₁δΟ₂ holds iff Ο₁ distributes over Ο₂, and Ο₁αΟ₂ holds iff Ο₁ associates with Ο₂, that is,

(xΟ₁y)Ο₂z ↔ xΟ₁(yΟ₂z).

This latter is clearly a generalization of associativity, that is, Ο₁αΟ₁ ↔ αΟ₁. Finally, the unary operator δ applied to the operator Ο₁ (denoted by δΟ₁) produces the operator Ο₂ which is dual to Ο₁ in the sense defined in Table 4 (which summarizes these functions) and in Subsection (a) below.

All of the identities are based upon the fundamental properties of the elementary operators summarized in Tables 5-8. Table 5 shows the vector a of binary arithmetic operators and below it two logical matrices describing its properties of distributivity and associativity. These matrices show, for example, that a₃ (that is, ×) distributes over + and –, that ⌈ and ⌊ distribute over themselves and each other, and that × associates with itself and ÷. The first four rows of the table show the self-associativity of a (equal to the diagonal of the outer product matrix a a), the commutativity, and the dual operators, wrt ÷ and –, respectively.

Table 6 shows three alternative ways of denoting the 16 binary logical functions: as the vector of operators l, as the matrix T of characteristic vectors (T_i is the characteristic vector of operator l_i), and as the vector T obtained as the base-two values (expressed in decimal) of the columns of T. The symbols employed in l include the familiar symbols ∨ and ∧ for or and and, ∆ and ∇ for their complements (i.e., the Pierce function and the Sheffer stroke), 0 and 1 for the zero and identity functions, the six numerical relations ≤, <, =, ≥, >, ≠, and the symbols ⍺, ⍵, , and for the four “unary” functions, that is, x⍺y = x, x⍵y = y, xy = , and xy = .

The remaining portion of Table 6 is arranged like Table 5. Since ((αl) ∧ γl)/l = (0, ∧, ≠, ∨, =, 1), it follows that the only nontrivial associative commutative logical operators are g = (∧, ∨, ≠, =). The properties of this particularly useful subset (abstracted from Table 6) are summarized in Table 7.

Certain functions of the matrices ll and ll are also of interest—for example, the matrix (ll) > (ll) shows that there are only six operator pairs which are associative and not distributive, namely, (≠, ≠), (≠, =), (=, ≠), (=, =), (, ≠), and (, =).

(a) Dualities

A unary operator τ is said to be self-inverse if ττx ↔ x. If ρ, σ, and τ are unary operators, if τ is self-inverse, and if ρx ↔ τστx, then σx ↔ τρτx, and ρ and σ are said to be dual^[e] with respect to τ. The floor and ceiling operators ⌊ and ⌈ are obviously dual with respect to the minus operator. Duality clearly extends to arrays, e.g.,
⌈ x ↔ – ⌊ – x
The duals of unary operators are shown in Table 8 as the vector δc.

If ρ and σ are binary operators, if τ is a self-inverse unary operator, and if

xρy ↔ τ(τx)σ(τy)

then ρ and σ are said to be dual with respect to τ. The max and min operators (⌈ and ⌊) are dual with respect to minus, and or and and (∨ and ∧) are dual with respect to negation (~), as are the relations ≠ and =.

Dual operators are displayed in the vectors δa and δl of Tables 5 and 6. Each of the 16 logical operators has a dual:

δl_i = l_{⊥~T_i}.

The duality of binary operators ρ and σ also extends to vectors and matrices. Moreover, when they are used in reduction, the following identities hold:

ρ/x	↔	τσ/τx,
ρ/X	↔	τσ/τX,
ρ//X	↔	τσ//τX.

For example,

	`⌊`/x	=	– `⌈`/ – x,
and
	∧/x	=	~ ∨/ ~ x (De Morgan’s Law).

The basic reduction identity (namely, ρ/x ↔ τσ/τx) leads immediately to the following family of identities for the matrix product:

B ↔ τ(τA)

τ(τB).

For the logical operators, the family comprises 256 identities, of which 144 are nontrivial.

Duality relations can be specified for a compiler by a table incorporating l and δl, and can be employed to obviate the inclusion of a subroutine for one of the dual pair or to transform a source statement to an equivalent form more efficient in execution. For example, in a computer such as the IBM 7090 (which executes an or between registers (i.e., logical vectors) much faster than a corresponding and, and which quickly performs an or over a register (i.e., a test for non-zero)), the operation ~(~x)y is more efficiently executed as the equivalent operation x~y, obtained by duality.

(b) Partitioning

Partitioning identities, which permit a segment of a vector result to be expressed in terms of segments of the argument vectors, are of obvious utility in the efficient allocation of limited capacity high-speed storage.

If z ← xΟy, then u/z ← (u/x) Ο (u/y), where u is an arbitrary (but conformable) logical vector. This simple identity applies for any binary operator Ο and permits any vector operation to be partitioned or segmented at will. A similar identity holds for unary operators.

From the definition of the matrix product it is clear that for any binary operators ρ and σ,

u/AB ↔ Au/B,

and
u//AB ↔ (u//A)B.

If ρ is any associative commutative operator (i.e., αρ = γρ = 1), then
ρ/x ↔ (ρ/x)ρ(ρu/x),
where is used as an alternative notation for (~u). Consequently,
A B ↔ ((/A)(//B) ρ ((u/A)(u//B)).

Since the distributivity of σ and ρ is not involved, the foregoing identity (which is a simple generalization of the familiar identity for the product of partitioned matrices) applies to most of the common arithmetic and logical operators.

The identity for the two-way partitioning effected by u and can obviously be extended to a (μP)-way partitioning effected by a logical partition matrix P (defined by ∊ = +//P) as follows:
A B ↔ ρ_i^⍳¹(μP) / (Pⁱ/A)(Pⁱ//B).
This is the form most useful in allocating storage; if fast-access storage for 2n components of A and B were available, P would normally be chosen such that Pⁱ = (n × i) ↓ ⍵ⁿ.

(c) Permutation

In this section, p, q, and r will denote permutation vectors of appropriate dimensions.

If Ο is any binary operator, then
(xΟy)_p ↔ x_pΟy_p,
i.e., permutation distributes over any binary operator. For any unary operator Ο,
(Οx)_p ↔ Ο(x_p),
and permutation therefore commutes with any unary operator. Consider, for example, a vector x whose components are arranged in increasing order on some function g(x_i) [e.g., lexical order so as to permit binary search] but is represented by (i.e., stored as) the vector y in arbitrary order and the permutation vector p such that x = y_p. Then the operation z ← Οx may be executed as w ← Οy, where z ← w_p.

For any binary operators ρ and σ,
(AB)_p^q ↔ A^pB_q. (1)
Moreover, if αρ = γρ = 1, then ρ/x ↔ ρ/x_p, and consequently
A_rB^r ↔ AB. (2)
Finally, then
(AB)_p^q ↔ A_r^pB_q^r.

This single identity permits considerable freedom in transforming a matrix product operation to a form best suited to the access limitations imposed by the representation (i.e., storage allocation) used for A and B (e.g., row-by-row and column-by-column lists).

For the special case q = ⍳¹, A = , ρ = +, and σ = ×, equation (1) reduces to the well-known method of permuting the columns of a matrix by ordinary premultiplication by a permutation matrix ^p, that is,
B^p ↔ ^pB.
The fact that ^p and ^p are inverse permutations (i.e., (^p) ^p = ) is obtainable directly from equation (2) and the fact that ^p = ()_p = _p.

The rotation operators ↑, ↓, , are special cases of permutations; consequently,
j k ↑ A B ↔ (j A) (k ↑ B).

Moreover, this identity still holds when the cyclic rotation operators are replaced by the corresponding noncyclic operators , , , and . In particular,

	j B = j ( B) = (j ) B,
and if
	B = h ,
then
	(h+j) = j h = (j ) (h ),

a well-known identity for the superdiagonal matrices h and k .

(d) Associativity and Distributivity of Double Operators

If αρ = γρ = σδρ = 1, then α() = 1; that is,

Aσ(B

C) ↔ (A

Moreover, α()δρ = 1; that is

(BρC) ↔ (A

B)ρ(A

C).

For example, if C is the connection matrix of a directed graph, then B = C C is the matrix of connections of length two; the operator () is associative and distributes over ∨. Similarly, if D is a distance matrix (D_jⁱ is the distance from point i to point j), then E = D D is the matrix of minimum distance for trips of two legs; () is associative and distributes over ⌊.

The associativity of matrix product operators can be very helpful in arranging an efficient sequence of calculations on matrices stored row-by-row or column-by-column. For the logical operators, the number of associative double operators is given by the expression
+/+/(αI)/II
which (according to Table 6) has the value 66.

(e) Transpositions

Of the unary transposition operators, and are special cases of permutation, but and are not. Table 9 shows the multiplication table for the group generated by these four transpositions. The notation chosen for the four added operators is clear: denotes the identity, ↔ ↔ , ↔ (90° axial left rotation), and ↔ (axial right rotation). Since ↔ , it could as well have been denoted by .

The following illustrate the many transposition identities:

AB	↔	AB	(3)
AB	↔	(A)B	(4)
AB	↔	(A)(B) if αρ = γρ = 1	(5)
(AB)	↔	(B)(A) if γσ = 1	(6)
(AB)	↔	(B)(A) if αρ = γρ = γσ = 1	(7)

Identities (3)-(5) are special cases of the permutation identities and permit freedom in the order of scan, which may be important if a backward-chained representation is employed for the vectors involved. Identity (6) is the generalization of the well-known transposition identity of matrix algebra. Identity (7) is obtained directly from (6) by the application of (3), (4), and (5).

Conclusion

The use of a programming language in which elementary operations are extended systematically to arrays provides a wealth of useful identities. If the array operations are incorporated directly in a compiler for the language, these identities can be automatically applied in compilation, using a small number of small tables describing the fundamental properties of the elementary operators. Moreover, the identities can be extended to any ad hoc operators specified by the source program, provided only that the fundamental characteristics (associativity, etc.) of the ad hoc operators are supplied.

Exploitation of the identities within the compiler will, of course, increase the complexity of the compiler, and one would perhaps incorporate only a selected subset of them. However, the possibility of later extensions to exploit further identities is of some value. Finally, the identities are extremely useful to the programmer (as opposed to the analyst who specifies the overall procedure and who may use the identities in theoretical work), since the tricks used by the programmer, as in allocating storage (partitioning) or modifying the sequence of a scan (permutation), are almost invariably special cases of the more general identities outlined here.

Reference

1. Iverson, Kenneth E., A Programming Language. Wiley, 1962.

Notes

a.		The language described here differs from [1] in minor details designed to further systematize and simplify its structure.
b.		Except for branching statements, which are not relevant to the present discussion.
c.		The symbol ↔ will be used to denote equivalence.
d.		The `∊` may be elided.
e.		Abbreviated as “dual wrt”.

Tables

Table 1. Symbols for Basic Operations

UNARY
Operation		Symbol
Absolute value		\|
Minus		–
Floor (largest integer contained)		`⌊`
Ceiling (smallest integer containing)		`⌈`
Logical negation		~
Reciprocation (`÷` x ↔ 1 `÷` x)		`÷`

BINARY
Operation		Symbol
Arithmetic operators		+ – × `÷`
Arithmetic relations		< ≤ = ≥ > ≠
Max, Min		`⌈ ⌊`
Exponentiation (y^x)		x y
Residue mod m		m \| n
Logical AND, OR		∧ ∨

Table 2. Unary Operations Defined on Arrays

νx		Dimension of vector x
νA		Row dimension of matrix A (dimension of row vectors)
μA		Column dimension of matrix A (dimension of column vectors)
μA		Column dimension of matrix A (dimension of column vectors)
		Transposition of matrix about axis indicated by the straight line (A is the ordinary transposition of A)
		x denotes transposition of vector x (reversal of order of components)
`⊥`		Base-two value of vector

Table 3. Constant Vectors and Square Matrices of Dimension n

Symbol	Designated Constant
`∊`(n)	Full vector (all 1’s)	Logical Vectors
`∊`^j(n)	jth unit vector (1 in position j)
`⍺`^j(n)	Prefix vector of weight j (j leading 1’s)
`⍵`^j(n)	Suffix vector of weight j (j trailing 1’s)
`⍳`^j(n)	Interval vector of weight j (j, j+1, …, j+n–1)
(n)	Zero matrix	Logical Matrices
(n)	Identity matrix (1’s on diagonal)
(n)	Strict upper right triangle (1’s above diagonal)
(n)	Upper right triangle (1’s above and on diagonal)
(n)	Strict lower right triangle
…	…
(n)	Upper left triangle

Table 4. Operations and Relations Defined on Operators

Self-associativity	αΟ	= 1 iff xΟ(yΟz) ↔ (xΟy)Οz
Commutativity	γΟ	= 1 iff xΟy ↔ yΟx
Distributivity	Ο₁δΟ₂	= 1 iff xΟ₁(yΟ₂z) ↔ (xΟ₁y)Ο₂(xΟ₁z)
Associativity	Ο₁αΟ₂	= 1 iff xΟ₁(yΟ₂z) ↔ (xΟ₁y)Ο₂z
Dual wrt τ	δΟ is an operator such that (δΟ)x ↔ τΟτx if Ο is unary x(δΟ)y ↔ τ((τx)Ο(τy)) if Ο is binary.

Table 5. Properties of the Binary Arithmetic Operators

1	0	1	0	1	1	0	0
1	0	1	0	1	1	0	0
		×	`÷`	`⌊`	`⌈`
+	–			`⌊`	`⌈`
+	–	×	`÷`	`⌈`	`⌊`	\|

	αa
	γa
	δa (wrt `÷`)
	δa (wrt –)
	a

	+
	–
	×
	`÷`
	`⌈`
	`⌊`
	\|


	+
	–
	×
	`÷`
	`⌈`
	`⌊`
	\|

0	0	0	0	1	1	0
0	0	0	0	r	r	0
1	1	0	0	0	0	1
r	r	0	0	0	0	r
0	0	0	0	1	1	0
0	0	0	0	1	1	0
0	0	0	0	0	0	0
0	0	l	l	l	l	0

1	1	0	0	0	0
0	0	0	0	0	0
0	0	1	1	0	0
0	0	0	0	0	0
0	0	0	0	1	0
0	0	0	0	0	1
0	0	0	0	0	0
0	0	0	0	0	0

	a a †

	a a

† l and r denote left and right distributivity.

Table 6. Properties of the Binary Logical Operators

		1	1	0	1	0	1	1	1	0	1	0	0	0	0	0	1	αl
		1	1	0	0	0	0	1	1	1	1	0	0	0	0	1	1	γl
		1	∨	≥	`⍺`	≤	`⍵`	=	∧	`∆`	≠		>		<	`∇`	0	δl *
		0	0	0	0	0	0	0	0	1	1	1	1	1	1	1	1		T
		0	0	0	0	1	1	1	1	0	0	0	0	1	1	1	1
		0	0	1	1	0	0	1	1	0	0	1	1	0	0	1	1
		0	1	0	1	0	1	0	1	0	1	0	1	0	1	0	1

		0	∧	>	`⍺`	<	`⍵`	≠	∨	`∇`	=		≥		≤	`∆`	1	l
		0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	T
0	0	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0		l l
∧	1	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0
>	2	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0
`⍺`	3	0	1	0	1	0	1	0	1	0	0	0	0	0	0	0	0
<	4	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0
`⍵`	5	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
≠	6	0	0	0	1	0	1	0	0	0	0	1	0	1	0	0	0
∨	7	0	1	0	1	0	1	0	1	0	1	0	1	0	1	0	1
`∇`	8	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0
=	9	0	0	0	1	0	1	0	0	0	0	1	0	1	0	0	0
	10	0	0	0	1	0	1	0	0	0	0	1	0	1	0	0	0
≥	11	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0
	12	0	1	0	1	0	1	0	1	0	0	0	0	0	0	0	0
≤	13	0	1	0	1	0	1	0	1	0	1	0	1	0	1	0	1
`∆`	14	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0
1	15	0	1	0	1	0	1	0	1	0	1	0	1	0	1	0	1

		0	∧	>	`⍺`	<	`⍵`	≠	∨	`∇`	=		≥		≤	`∆`	1	l
		0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	T
0	0	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0		l l
∧	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
>	2	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
`⍺`	3	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
<	4	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
`⍵`	5	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
≠	6	0	0	0	1	0	0	1	0	0	1	0	0	1	0	0	0
∨	7	0	0	0	1	0	0	0	1	0	0	0	1	0	0	0	1
`∇`	8	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
=	9	0	0	0	1	0	0	1	0	0	1	0	0	1	0	0	0
	10	0	0	0	1	0	0	1	0	0	1	0	0	1	0	0	0
≥	11	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
	12	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
≤	13	0	0	0	1	0	0	0	1	0	0	0	1	0	0	0	1
`∆`	14	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
1	15	0	0	0	1	0	0	0	1	0	0	0	1	0	0	0	1

* Duality with respect to ~.

Table 7. Properties of the Nontrivial Associative Commutative Logical Operators

∧

∨

≠

∧

∨

≠

1	1	1	0
1	1	0	1
0	0	0	0
0	0	0	0

∧

∨

≠

1	0	0	0
0	1	0	0
0	0	1	1
0	0	1	1

Table 8. Properties of the Unary Operators

\|	`⌊`	`⌈`	–	`÷`	~	c
	`⌈`	`⌊`	–	`÷`		δc (wrt –)
			–	`÷`		δc (wrt `÷`)
					~	δc (wrt ~)

Table 9. Group of Transpositions (rotations of the square)

Discussion

Gorn: Some almost ancient sources of generalized operators are: Whitehead, Universal Algebra, and Grassman, Die Ausdehnungslehre. Some more modern sources are: Bourbaki Algebra, Forder, Calculus of Extension, and Bodewig, Matrix Calculus.

Backus: Why this comment?

Gorn: The paper presents generalized relationships among operators. The cited references are directly concerned with such questions.

Brooker: Why do you insist on using a notation which is a nightmare for typist and compositor and impossible to implement with punching and printing equipment currently available? What proposals have you got for overcoming this difficulty?

Iverson: Transliteration is, of course, essential, but I have avoided its treatment first, because a suitable scheme is highly dependent on the particular equipment available, and second, because it is extremely simple. If, for example, you have the stamina of ALGOL and MAD users (who tirelessly write PROCEDURE and WHENEVER), then you can use the distinct names that I have given (for conversational purposes) to each of the operators. Anyone who prefers briefer symbols can (as I have) easily design schemes which are brief, simple and mnemonic.

Gorn: This question of transliteration: I’m not talking about this paper in particular. In general it is a problem that is always with us. There is a danger that as the transliteration rules become more complicated replacement productions, we rapidly fall into a recognition problem, a translation problem and possibly an unsolvable word problem.

Iverson: Yes, one should distinguish the recognition of identifiers from the syntax, which is of more concern to the ultimate user.

Brooker: It is not obvious to me that these two symbols for FLOOR and CEILING have a great deal of mnemonic value.

Iverson: Yes, but once you have read it, you can remember it.

Gorn: But the more redundance you put in the symbolism of a language, the more equivalence problems you have.

Iverson: Not problems, I suggest that these are assets. In the extreme we could go back to the Assign and the Sheffer stroke, let’s say, and then we have no problems.

Ross: I don’t remember who asked the original question about notation, but I submit that they find themselves a sugardaddy or someone with a few thousand bucks and get themselves a display console such as we’re getting with programmable characters. You can even publish from it by taking pictures. I don’t see why we should let mechanics influence our progress at all.

Iverson: Someone who is interested in standardization would not like that comment—a 48-character set is the thing you know. The limitation on the available character set, I think, is more of a transient phenomenon than the algorithms we want to describe.

Ross: With our console the 48 characters are available, and there is another mode where you can program any bit patterns you want in a matrix; we are doing this specifically for this purpose because we feel that the notation that goes along with the set of ideas should be usable.

Bauer: I would say that compared with some other existing proposals for matrix extensions such as that of Ershov this is a much more closed consistent system. No one can say today how far we will go in using such a language in the near future.

Iverson: Let me comment that it is useful to distinguish two reasons for learning a language; one is for description and analysis and the other is for automatic execution. I submit that this kind of formalism is extremely helpful in analyzing difficult problems without worrying about whether one wants to execute the resulting program. As a matter of fact, I would use this as a preliminary before going into some language that is executable.

Gorn: As I have it, the descriptive language you have does have direct translation properties into command language.

Holt: I would like to translate that comment of Ken’s about description, analysis, and execution in the following way: Programming languages are machine-dependent—one is appropriate for the human processor and another for the computer.

Iverson: Well, I would disagree with that because I would use exactly the same notation for describing the computer. In fact, I’ve done it for the 7090 or most of the 7090, and other machines as well. In fact, you can say the instruction set of the machine is another form of language with a slave to execute it.

Holt: Then, what was the meaning of your comment?

Iverson: At this point, for example, this is not a source language in the sense that there is a mechanism available for translating it into some other language. There is no convenient way for automatic execution by translation or direct execution. Now I suggest that the notation is worthwhile just for analysis even though later we have to do a hand translation into some executable language.

Green: If I may interrupt for a moment, I think we should limit the discussion on notation to the next five minutes. And we should get to other questions.

Tompkins: There exist problems around here which were coded first in essentially this language and then were translated with great care into FORTRAN, for example.

Perlis: How should this language be used on computers? For what class of problems—on or off computers? Thus, it’s not quite clear to me that a mathematical proof of an algorithm written in FORTRAN (the same algorithm if you will) is any more difficult than a mathematical proof of one of your algorithms. Algorithms are written for two reasons: 1) execution by computer: which means that it is pointless to write it if you cannot execute it on a computer, and 2) for description and analysis. Now if the description is difficult to read, then it fails somewhat. If, in addition, analysis is as difficult, say, as in ALGOL, then the virtue of the language is questionable.

A last question: You haven’t discussed at all the way you describe data. It is not clear that you have a notation for describing data, though you have a great wealth of notation for manipulating data once it is described. Now ALGOL will obviously be extended to include matrix and vector operations in expressions. So my question is: for what classes of problems, remembering you have no data description, is your description better than “Algolic” descriptions?

Iverson: I’m not sure if I can really separate all these points. The question of representation (data description) is too lengthy to treat here. To save time, let me say that I discuss it in Chapter 3 of my book. This discussion is fairly limited, but adequate.

Concerning the virtues of the language for description and analysis, I can only say that I have found it very useful in many diverse areas, including machine description, search procedures, symbolic logic, sorting and linear programming. Now it is a separate problem as to whether you want to incorporate the complete generality of the language in any particular compiler—but I suggest that it is desirable to have a more general system that you retract from for any particular compiler rather than adding ad hoc provisions to more limited languages.

As to the question of proofs, you can, of course, translate a proof in any language to any other language, but I suggest that the proofs I give are the kind that are immediately obvious to any mathematician. There is, of course, the question of to whom you want your proofs to be obvious. Likewise, for difficulty of reading, the question is, “for whom”? And I suggest that anybody who has ever dealt with matrix operations finds this notation very easy to read.

Perlis: But is it fair to say then that if one is going to create or extend a language that the direction of extension really isn’t critical—that the accent should not be put on operations so much, but on data representation or sequence rules?

Iverson: No, I disagree.

Gorn: Since you are supporting an infix notation for binary operators, would it not be useful to have some control operators in the language which would correspond to the combinatory logician’s “Application” operation? Also operators for insertion and deletion of parentheses, and operators to adjust priorities in the scopes of other operators, e.g. to construct precedence matrices of the type discussed by Floyd?

Iverson: Let me give a sort of general answer to this sort of thing. You’re probably talking about some specialized application for which you want special operators. I submit that no one can design a language that is equally useful for everybody. Instead, what you would like to have is a single core which you can extend in a straightforward manner.

In so far as precedence and hierarchy are concerned, I have not found any great need for them in my work, but I can understand why you might want to use them in compilers. In fact, I think such hierarchy should be included in a tabular form so that it is easily changeable.

Holt: The presentation is a marvelous demonstration of the power of notation in the hands of a very clever man. Conclusions: (1) Let us teach this skill to clever people. (2) Let us create machine mechanisms to respond to notational inventions.

Iverson: On the contrary, the basic notions are very simple and should be introduced at high school level to provide a means for describing algorithms explicitly. For example, the vector can be introduced as a convenient means for naming a family of variables and can be used by the student (together with a few very simple operators) to work out explicit algorithms for well-known operations such as decimal addition, polynomial evaluation, etc. A little notation and much care in requiring explicit algorithms would, in fact, clarify and simplify the presentation of elementary mathematics and obviate the teaching of programming as such.

Gosden: Many of the equivalences only become useful and powerful when time dependency is included. For example, each operation on any array implies serial or parallel execution component by component. How can you cover this for serial or parallel statements? Obviously, there are many tricks that are time (or series) dependent in array operations. How do they relate to dualities and equivalences, etc.

Iverson: Parallel operation is implied by any vector operation; serial operation can be made explicit by a program showing the specified sequence of operations on components. Distinctions of this type (employing the present notation) are made clear in Falkoff’s “Algorithms for Parallel Search Memories” [J. ACM, Oct. 1962].

More complex simultaneity can be expressed by a collection of programs operating concurrently, all mutually independent but for interaction through certain (interlock) variables common to some two or more programs. Explicit dependence on real time can be introduced by incorporating, as one of this collection of programs, a program describing a clock (i.e., oscillator) driven counter.

Gorn: Does your generalized operator notation for matrices lead to a simpler proof of the generalized Laplace expansion of determinants?

Iverson: For a given logical vector u, the Laplace expansion of the determinant δA can be expressed as
δA = (+_i/((δu/Sⁱ//A) × (δ/ⁱ//A) × ρ′Sⁱ)) × ρ′u,
where S is a logical matrix whose rows represent all partitions of weight +/u, where ρ′v = ρ(v/⍳¹, /⍳¹) is the “parity” of the logical vector v, and ρp is the parity of the permutation vector p, defined as +1 or –1 according as the parity of p is even or odd. Since
δA = +_i/((×/ ^Pⁱ/A) × ρPⁱ)
(where P is the matrix whose (νA)! rows exhaust all permutations of dimension νA, and where compression by a logical matrix U is defined in the obvious way as the catenation of the vectors Uⁱ/Aⁱ), then the usual proof of the Laplace expansion (i.e. showing that a typical term of either expansion occurs in the other) can be carried through directly with the aid of the following fact: if u is any logical vector and p is a permutation of like dimension, then there exists a unique triple v, q, r, such that
^q = u/v//^p, and ^r = ///^p.
[The vectors v and u are clearly related by the expressions v = ^p u, and u = v ^q, and moreover, ρp = (ρ′u) × (ρ′v) × (ρq) × (ρr).]

The special matrices occurring in the foregoing can all be specified formally in terms of the matrix T(b, n) defined as follows: T_jⁱ ε ⍳⁰(b), μT = bⁿ, νT = n, and b⊥T = ⍳⁰, where b⊥x denotes the base-b value of the vector x. Thus S = (+/u = +/M)//M, where M = T(2, νA) and P = (∧/σ/M)//M, where M = T(νA, νA), and σ/x is the set selection operation [1, p. 23].

Moreover, the parity function ρp may be defined formally as ρp = – u, where u = 2 | +/ /(p p).

Dijkstra: How would you represent a more complex operation, for example, the sum of all elements of a matrix M which are equal to the sum of the corresponding row and column indices?

Iverson: ++/(M = ⍳¹ ⍳¹)//M

List of Conferees, Working Conference on Language Structures, August 14-16, 1963

P. Abrahams		International Electric Corp.
R.W. Allard		Control Data Corp.
John W. Backus		IBM
F. Bauer		Math. Inst. der TH München
R. Bosak		Inst. for Defense Analyses
R.A. Brooker		IBM
L.L. Bumgarner		Oak Ridge Nat. Lab.
W.H. Barge		Univac Div., Sperry Rand
T.E. Cheatham, Jr.		Computer Associates, Inc.
H.B. Curry		Pennsylvania State U.
E.W. Dijkstra		Technological U., Eindhoven
Arthur Evans		Carnegie Inst. of Technology
R.J. Evey		IBM
J. Fennell		Logistics Research Project
Robert W. Floyd		Computer Associates, Inc.
Donald B. Gillies		U. of Illinois
Ruth Goodman		Westinghouse Corp.
S. Gorn		U. of Pennsylvania
John Gosden		Auerbach Electronics Corp.
Robert M. Graham		U. of Michigan
Julien Green		IBM
Sheila Greibach		Harvard U.
John W. Guy		Nat. Security Agency
Leonard H. Haines		MIT
A.W. Holt		U. of Pennsylvania
P.Z. Ingerman		Westinghouse Electric Corp.
R. Itturiaga		Carnegie Inst. of Technology
E.T. Irons		Inst. for Defense Analyses
K.E. Iverson		IBM
Walter W. Jacobs		Inst. for Defense Analyses
Charles Katz		General Electric
R.A. Kirseh		Nat. Bur. Standards
Rainer Kogon		IBM
B.M. Leavenworth		IBM
M. Henri Leroy		Cie Bull
L. Lombardi		MIT
William H. Marlow		Logistics Research Project
E.J. McCluskey		Princeton U.
M.A. Melkanoff		U. of California
J.N. Merner		Burroughs Corp.
G.J. Mitchell		Inst. for Defense Analyses
A. Newell		Carnegie Inst. of Technology
M. Paul		Math. Inst. der TH Munchen
A.J. Perlis		Carnegie Inst. of Technology
George Radin		IBM
Gene F. Rose		System Development Corp.
D.T. Ross		MIT
Bernard D. Rudin		Lockheed Aircraft Corp.
R.A. Sibley, Jr.		IBM
K.H. Speierman		Burroughs Corp.
T.B. Steel		System Development Corp.
C.B. Tompkins		U. of California
Hale F. Trotter		Princeton U.
R.E. Utman		Business Equipment Mfrs. Assoc.
S. Warshall		Computer Associates, Inc.
J.H. Wegstein		Nat. Bur. Standards
J. Weizenbaum		General Electric
M.V. Wilkes		Univ. Math. Laboratory, Cambridge
Kenneth A. Wolf		Control Data Corp.

Originally appeared in the Communications of the ACM, Volume 7, Number 2, 1964-02.

created	2009-09-08 10:00
updated:	2013-07-23 21:50