Formalism in Programming Languages
Received July, 1963. Presented at a Working Conference on Mechanical Language Structures, Princeton, N.J., August 1963, sponsored by the Association for Computing Machinery, the Institute for Defense Analyses, and the Business Equipment Manufacturers Association. This work was done at Harvard University while the author was a visiting lecturer, February through June, 1963.
Although the question of equivalences between algorithms expressed in the same or different languages has received some attention in the literature, the more practical question of formal identities among statements in a single language has received virtually none. The importance of such identities in theoretical work is fairly obvious. The present paper will be addressed primarily to the practical implications for a compiler.
The formal identities can be incorporated directly into a compiler, or can alternatively be used by a programmer to derive a more efficient equivalent of a program specified by an analyst. The identities cited include (1) dualities which permit the inclusion of only one of a dual pair as a basic operator, (2) partitioning identities which permit the automatic allocation of limited fast-access storage in operations on arrays, (3) permutation identities which permit the adoption of a processing sequence suited to the particular representation used (e.g., row list or column list of a matrix), (4) general associativity and distributivity identities for double operators (determined as a function of the properties of the basic operators) which permit efficient reordering of operations, (5) transposition identities, and (6) the automatic extension of the appropriate identities to any ad hoc operations (i.e., subroutines or procedures) defined by any user of the compiler.
The discussion will be based upon a programming language which has been presented in full elsewhere . However, the relevant aspects of the language will first be summarized for reference.
The problems of transliteration and syntax
which commonly dominate discussions of language
will here be subordinated as follows.
The symbols employed will permit
the immediate determination
of the class to which each belongs;
thus literals are denoted by roman type,
variables are denoted by italics
(lowercase, lowercase bold, and uppercase bold
for scalar, vector and matrix, respectively),
and operators are denoted by distinct
(usually nonalphabetic) symbols.
The problems of transliteration
(i.e., mapping the set of symbols
employed onto the smaller set provided in a computer)
and of mapping positional information
(such as subscripts and superscripts)
onto a linear representation therefore can,
and will, be subordinated to questions
of the structure of an adequate language.
Array Operations in a Compiler
The systematic extension of the familiar vector and matrix operations to all operators, and the introduction of the generalized matrix product, greatly increase the utility and frequency of use of array operations in programs, and therefore encourages their inclusion in the source language of any compiler. Array operations can, of course, be added to the repertoire of any source language by providing library or ad hoc subroutines for their execution. However, the general array operations spawn a host of useful identities, and these identities cannot be mechanically employed by the compiler unless the array operations are denoted in such a way that they are easily recognizable.
The following example illustrates this point. Consider the vector operation
It would be difficult to make a compiler recognize all legitimate variants of this program (including, for example, an arbitrary order of scanning the components), and to make it distinguish the quite different and essentially sequential program:
The foregoing programs could perhaps be analyzed
by a compiler, but they are merely simple examples
of much more complex scan procedures
which would occur in, say,
a matrix product subroutine.
A somewhat more complex case is illustrated by the vector operation
Finally, there is a distinct advantage
in incorporating array operations
by providing a single general scan for each type
(e.g., vector, matrix, and matrix product)
and treating the operator (or operators) as a parameter.
It then matters not whether each operator is effected
by a one-line subroutine (i.e., a machine instruction)
or a multiline subroutine,
or whether it is incorporated in the array
operation as an open or a closed subroutine.
If several types of representations are permitted
for variables (e.g., double precision,
floating point, chained vectors),
then a scan routine may have to be provided
for each type of representation.
The identities fall naturally into five main classes: duality, partitioning (selection), permutation, associativity and distributivity, and transposition. A few examples of each class will be presented together with a brief discussion of their uses.
In discussing identities it will be convenient to employ the symbols Ο, Ο1, Ο2, ρ, σ, and τ to denote operators, and to define certain functions and relations on operations as follows. The (unary) logical functions αΟ and γΟ are equal to unity iff Ο is associative and Ο is commutative, respectively. The relation Ο1δΟ2 holds iff Ο1 distributes over Ο2, and Ο1αΟ2 holds iff Ο1 associates with Ο2, that is,
This latter is clearly a generalization of associativity,
All of the identities are based upon the fundamental
properties of the elementary operators summarized in Tables 5-8.
Table 5 shows the vector a
of binary arithmetic operators
and below it two logical matrices describing
its properties of distributivity and associativity.
These matrices show, for example,
that a3 (that is, ×)
distributes over + and –,
that ⌈ and ⌊ distribute over
themselves and each other,
and that × associates with itself and ÷.
The first four rows of the table show the
self-associativity of a
(equal to the diagonal of the outer product matrix
Table 6 shows three alternative ways of denoting the
16 binary logical functions:
as the vector of operators l,
as the matrix T of characteristic vectors
(Ti is the
characteristic vector of operator li),
and as the vector
The remaining portion of Table 6 is arranged like Table 5.
Certain functions of the matrices
A unary operator τ
is said to be self-inverse
if ττx ↔ x.
If ρ, σ, and τ
are unary operators, if τ is self-inverse,
If ρ and σ are binary operators, if τ is a self-inverse unary operator, and if
then ρ and σ are said to be dual with respect to τ. The max and min operators (⌈ and ⌊) are dual with respect to minus, and or and and (∨ and ∧) are dual with respect to negation (~), as are the relations ≠ and =.
Dual operators are displayed in the vectors δa and δl of Tables 5 and 6. Each of the 16 logical operators has a dual:
The duality of binary operators ρ and σ also extends to vectors and matrices. Moreover, when they are used in reduction, the following identities hold:
The basic reduction identity (namely, ρ/x ↔ τσ/τx) leads immediately to the following family of identities for the matrix product:
For the logical operators, the family comprises 256 identities, of which 144 are nontrivial.
Duality relations can be specified
for a compiler by a table incorporating l
and can be employed to obviate the inclusion
of a subroutine for one of the dual pair
or to transform a source statement to an equivalent form
more efficient in execution.
For example, in a computer such as the IBM 7090
(which executes an or between registers (i.e., logical vectors)
much faster than a corresponding and,
and which quickly performs an or
over a register (i.e., a test for non-zero)),
Partitioning identities, which permit a segment of a vector result to be expressed in terms of segments of the argument vectors, are of obvious utility in the efficient allocation of limited capacity high-speed storage.
From the definition of the matrix product it is clear that for any binary operators ρ and σ,
If ρ is any associative commutative operator (i.e., αρ = γρ = 1), then
Since the distributivity of σ and ρ is not involved, the foregoing identity (which is a simple generalization of the familiar identity for the product of partitioned matrices) applies to most of the common arithmetic and logical operators.
The identity for the two-way partitioning effected by u and can obviously be extended to a (μP)-way partitioning effected by a logical partition matrix P (defined by ∊ = +//P) as follows:
In this section, p, q, and r will denote permutation vectors of appropriate dimensions.
If Ο is any binary operator, then
For any binary operators ρ and σ,
This single identity permits considerable freedom in transforming a matrix product operation to a form best suited to the access limitations imposed by the representation (i.e., storage allocation) used for A and B (e.g., row-by-row and column-by-column lists).
For the special case
The rotation operators ↑, ↓, , are special cases of permutations; consequently,
Moreover, this identity still holds when the cyclic rotation operators are replaced by the corresponding noncyclic operators , , , and . In particular,
a well-known identity for the superdiagonal matrices
(d) Associativity and Distributivity of Double Operators
If αρ = γρ =
σδρ = 1,
For example, if C
is the connection matrix of a directed graph, then
The associativity of matrix product operators can be very helpful in arranging an efficient sequence of calculations on matrices stored row-by-row or column-by-column. For the logical operators, the number of associative double operators is given by the expression
Of the unary transposition operators,
are special cases of permutation,
but and are not.
Table 9 shows the multiplication table for the group
generated by these four transpositions. The notation
chosen for the four added operators is clear:
The following illustrate the many transposition identities:
Identities (3)-(5) are special cases of the permutation
identities and permit freedom in the order of scan,
which may be important if a backward-chained representation
is employed for the vectors involved.
Identity (6) is the generalization
of the well-known transposition identity
of matrix algebra.
Identity (7) is obtained directly from (6)
by the application of (3), (4), and (5).
The use of a programming language in which elementary operations are extended systematically to arrays provides a wealth of useful identities. If the array operations are incorporated directly in a compiler for the language, these identities can be automatically applied in compilation, using a small number of small tables describing the fundamental properties of the elementary operators. Moreover, the identities can be extended to any ad hoc operators specified by the source program, provided only that the fundamental characteristics (associativity, etc.) of the ad hoc operators are supplied.
Exploitation of the identities within the compiler will,
of course, increase the complexity of the compiler,
and one would perhaps incorporate
only a selected subset of them.
However, the possibility of later extensions
to exploit further identities is of some value.
Finally, the identities are extremely useful
to the programmer
(as opposed to the analyst
who specifies the overall procedure
and who may use the identities in theoretical work),
since the tricks used by the programmer,
as in allocating storage (partitioning)
or modifying the sequence of a scan (permutation),
are almost invariably special cases of the more
general identities outlined here.
1. Iverson, Kenneth E.,
A Programming Language.
Table 1. Symbols for Basic Operations
Table 2. Unary Operations Defined on Arrays
Table 3. Constant Vectors and Square Matrices of Dimension n
Table 4. Operations and Relations Defined on Operators
Table 5. Properties of the Binary Arithmetic Operators
† l and r denote left and right distributivity.
Table 6. Properties of the Binary Logical Operators
* Duality with respect to ~.
Table 7. Properties of the Nontrivial Associative Commutative Logical Operators
Table 8. Properties of the Unary Operators
Table 9. Group of Transpositions (rotations of the square)
Gorn: Some almost ancient sources of generalized operators are: Whitehead, Universal Algebra, and Grassman, Die Ausdehnungslehre. Some more modern sources are: Bourbaki Algebra, Forder, Calculus of Extension, and Bodewig, Matrix Calculus.
Backus: Why this comment?
Gorn: The paper presents generalized relationships among operators. The cited references are directly concerned with such questions.
Brooker: Why do you insist on using a notation which is a nightmare for typist and compositor and impossible to implement with punching and printing equipment currently available? What proposals have you got for overcoming this difficulty?
Iverson: Transliteration is, of course, essential, but I have avoided its treatment first, because a suitable scheme is highly dependent on the particular equipment available, and second, because it is extremely simple. If, for example, you have the stamina of ALGOL and MAD users (who tirelessly write PROCEDURE and WHENEVER), then you can use the distinct names that I have given (for conversational purposes) to each of the operators. Anyone who prefers briefer symbols can (as I have) easily design schemes which are brief, simple and mnemonic.
Gorn: This question of transliteration: I’m not talking about this paper in particular. In general it is a problem that is always with us. There is a danger that as the transliteration rules become more complicated replacement productions, we rapidly fall into a recognition problem, a translation problem and possibly an unsolvable word problem.Iverson: Yes, one should distinguish the recognition of identifiers from the syntax, which is of more concern to the ultimate user.
Brooker: It is not obvious to me that these two symbols for FLOOR and CEILING have a great deal of mnemonic value.
Iverson: Yes, but once you have read it, you can remember it.
Gorn: But the more redundance you put in the symbolism of a language, the more equivalence problems you have.
Iverson: Not problems, I suggest that these are assets. In the extreme we could go back to the Assign and the Sheffer stroke, let’s say, and then we have no problems.
Ross: I don’t remember who asked the original question about notation, but I submit that they find themselves a sugardaddy or someone with a few thousand bucks and get themselves a display console such as we’re getting with programmable characters. You can even publish from it by taking pictures. I don’t see why we should let mechanics influence our progress at all.
Iverson: Someone who is interested in standardization would not like that comment—a 48-character set is the thing you know. The limitation on the available character set, I think, is more of a transient phenomenon than the algorithms we want to describe.
Ross: With our console the 48 characters are available, and there is another mode where you can program any bit patterns you want in a matrix; we are doing this specifically for this purpose because we feel that the notation that goes along with the set of ideas should be usable.
Bauer: I would say that compared with some other existing proposals for matrix extensions such as that of Ershov this is a much more closed consistent system. No one can say today how far we will go in using such a language in the near future.
Iverson: Let me comment that it is useful to distinguish two reasons for learning a language; one is for description and analysis and the other is for automatic execution. I submit that this kind of formalism is extremely helpful in analyzing difficult problems without worrying about whether one wants to execute the resulting program. As a matter of fact, I would use this as a preliminary before going into some language that is executable.
Gorn: As I have it, the descriptive language you have does have direct translation properties into command language.
Holt: I would like to translate that comment of Ken’s about description, analysis, and execution in the following way: Programming languages are machine-dependent—one is appropriate for the human processor and another for the computer.
Iverson: Well, I would disagree with that because I would use exactly the same notation for describing the computer. In fact, I’ve done it for the 7090 or most of the 7090, and other machines as well. In fact, you can say the instruction set of the machine is another form of language with a slave to execute it.
Holt: Then, what was the meaning of your comment?
Iverson: At this point, for example, this is not a source language in the sense that there is a mechanism available for translating it into some other language. There is no convenient way for automatic execution by translation or direct execution. Now I suggest that the notation is worthwhile just for analysis even though later we have to do a hand translation into some executable language.
Green: If I may interrupt for a moment, I think we should limit the discussion on notation to the next five minutes. And we should get to other questions.
Tompkins: There exist problems around here which were coded first in essentially this language and then were translated with great care into FORTRAN, for example.
Perlis: How should this language be used on computers? For what class of problems—on or off computers? Thus, it’s not quite clear to me that a mathematical proof of an algorithm written in FORTRAN (the same algorithm if you will) is any more difficult than a mathematical proof of one of your algorithms. Algorithms are written for two reasons: 1) execution by computer: which means that it is pointless to write it if you cannot execute it on a computer, and 2) for description and analysis. Now if the description is difficult to read, then it fails somewhat. If, in addition, analysis is as difficult, say, as in ALGOL, then the virtue of the language is questionable.
A last question: You haven’t discussed at all the way you describe data. It is not clear that you have a notation for describing data, though you have a great wealth of notation for manipulating data once it is described. Now ALGOL will obviously be extended to include matrix and vector operations in expressions. So my question is: for what classes of problems, remembering you have no data description, is your description better than “Algolic” descriptions?
Iverson: I’m not sure if I can really separate all these points. The question of representation (data description) is too lengthy to treat here. To save time, let me say that I discuss it in Chapter 3 of my book. This discussion is fairly limited, but adequate.
Concerning the virtues of the language for description and analysis, I can only say that I have found it very useful in many diverse areas, including machine description, search procedures, symbolic logic, sorting and linear programming. Now it is a separate problem as to whether you want to incorporate the complete generality of the language in any particular compiler—but I suggest that it is desirable to have a more general system that you retract from for any particular compiler rather than adding ad hoc provisions to more limited languages.
As to the question of proofs, you can, of course, translate a proof in any language to any other language, but I suggest that the proofs I give are the kind that are immediately obvious to any mathematician. There is, of course, the question of to whom you want your proofs to be obvious. Likewise, for difficulty of reading, the question is, “for whom”? And I suggest that anybody who has ever dealt with matrix operations finds this notation very easy to read.
Perlis: But is it fair to say then that if one is going to create or extend a language that the direction of extension really isn’t critical—that the accent should not be put on operations so much, but on data representation or sequence rules?
Iverson: No, I disagree.
Gorn: Since you are supporting an infix notation for binary operators, would it not be useful to have some control operators in the language which would correspond to the combinatory logician’s “Application” operation? Also operators for insertion and deletion of parentheses, and operators to adjust priorities in the scopes of other operators, e.g. to construct precedence matrices of the type discussed by Floyd?
Iverson: Let me give a sort of general answer to this sort of thing. You’re probably talking about some specialized application for which you want special operators. I submit that no one can design a language that is equally useful for everybody. Instead, what you would like to have is a single core which you can extend in a straightforward manner.
In so far as precedence and hierarchy are concerned, I have not found any great need for them in my work, but I can understand why you might want to use them in compilers. In fact, I think such hierarchy should be included in a tabular form so that it is easily changeable.
Holt: The presentation is a marvelous demonstration of the power of notation in the hands of a very clever man. Conclusions: (1) Let us teach this skill to clever people. (2) Let us create machine mechanisms to respond to notational inventions.
Iverson: On the contrary, the basic notions are very simple and should be introduced at high school level to provide a means for describing algorithms explicitly. For example, the vector can be introduced as a convenient means for naming a family of variables and can be used by the student (together with a few very simple operators) to work out explicit algorithms for well-known operations such as decimal addition, polynomial evaluation, etc. A little notation and much care in requiring explicit algorithms would, in fact, clarify and simplify the presentation of elementary mathematics and obviate the teaching of programming as such.
Gosden: Many of the equivalences only become useful and powerful when time dependency is included. For example, each operation on any array implies serial or parallel execution component by component. How can you cover this for serial or parallel statements? Obviously, there are many tricks that are time (or series) dependent in array operations. How do they relate to dualities and equivalences, etc.
Iverson: Parallel operation is implied by any vector operation; serial operation can be made explicit by a program showing the specified sequence of operations on components. Distinctions of this type (employing the present notation) are made clear in Falkoff’s “Algorithms for Parallel Search Memories” [J. ACM, Oct. 1962].
More complex simultaneity can be expressed by a collection of programs operating concurrently, all mutually independent but for interaction through certain (interlock) variables common to some two or more programs. Explicit dependence on real time can be introduced by incorporating, as one of this collection of programs, a program describing a clock (i.e., oscillator) driven counter.
Gorn: Does your generalized operator notation for matrices lead to a simpler proof of the generalized Laplace expansion of determinants?
Iverson: For a given logical vector u, the Laplace expansion of the determinant δA can be expressed as
The special matrices occurring in the foregoing can all be
specified formally in terms of the matrix
T(b, n) defined as follows:
Moreover, the parity function ρp
may be defined formally as
ρp = – u,
Dijkstra: How would you represent a more complex operation, for example, the sum of all elements of a matrix M which are equal to the sum of the corresponding row and column indices?
Iverson: ++/(M = ⍳1
List of Conferees, Working Conference on Language Structures, August 14-16, 1963
Originally appeared in the Communications of the ACM, Volume 7, Number 2, 1964-02.