A Commentary on APL Development

Kenneth E. Iverson

Because A Dictionary of APL [1] was designed for concise and convenient reference, it is important that it be supplemented by other treatments of the language, concerning matters such as style and literacy [2], discussion of the means used to eradicate anomalies that have crept into the language [3], illustrations of analytic uses of the language [4], and explorations of the uses of particular verbs, adverbs, and conjunctions [5].

The present paper treats the rationale for many of the less familiar facilities defined in the dictionary, rectifies the definition of rank, provides simpler definitions for the conjunctions denoted by cup and cap, and discusses the matter of treating sets and set-like operations. The order of treatment parallels that of the dictionary.

Although some of the facilities in APL have been introduced in response to demands from programmers, most have been motivated by knowledge of their utility in some branch of mathematics. Moreover, they have invariably been generalized from the special cases on which they were based.

For example, reduction, inner product, and outer product were not demanded by programmers and are still not demanded by programmers in other languages. Moreover, although they were suggested by the sigma notation, by the matrix product of linear algebra, and by the outer product of tensor analysis, it is the generalization of these ideas that has made them fruitful in general programming.

The exploitation of these facilities has been a long process of discovery by programmers, discovery often impeded by the fact that the implementation of a given construct might well remain too inefficient for serious use until the discovery of its utility motivated more efficient implementation. Reductions and inner products on boolean functions provide a case in point.

The motivation for the newer verbs and adverbs defined in the dictionary remains largely mathematical, and the process of discovery and exploitation continues. For example, programmers have taken to the rank adverb, and its efficiency in the Sharp APL implementation has already been shaiply improved by embodying its treatment within individual primitive verbs.

If the development of APL is to benefit fully from experience in use, it is essential that the process be slow and conservative. Early examples of this conservative approach (such as the adoption of notation for the circular functions) occur in Section 4 of The Evolution of APL [6], and the “The Story of[7]. A more recent example will now be discussed.

Although Section IIC of [1] states that adverbs and conjunctions may produce results of any class, none of those defined in the dictionary produce anything other than verbs. The potential of adverbs producing conjunctions was recognized as early as 1978 [8], but was avoided by introducing a collection of conjunctions, as in fg ⍵ ←→ (f⍵)+(g⍵) , and pq⍵ ←→ (p⍵)^(q⍵) . Somewhat later, Arthur Whitney suggested the conjunction til [9], which provided a more general facility for this purpose, but which still avoided the introduction of constructs that produce adverbs or conjunctions.

The present paper proposes an adverb (called yoke) that produces a conjunction; it supersedes several facilities proposed in the dictionary, the withe conjunction (), the union and intersection (∪ ∩), and the concomitant notion of combining rank.

Box (<). The importance of box rests on the fact that it provides a scalar encoding of any noun to which it is applied. Because the result is a scalar, arrays of boxed elements can be handled conveniently by existing selection and structural functions, and because the inverse open (>) has rank 0, any function f can be handled with the convenience of a scalar function by using the simply-related function f¨> . For example, if f:+⌿(-⍳⍴⍺)⌽⍺∘.×⍵,0×1↓⍺ , then c1 f c2 is the coefficient vector of the polynomial equal to the product of polynomials with coefficients c1 and c2 , and f¨>/bc produces the boxed coefficient of the product of a list bc of boxed coefficients.

The convenience of scalar encodings can be illuslrated by the implicit scalar encodings used for numeric values: the increasingly complex lists 12 and 12.34 and 12.34j45.6 represent integers, rationals, and complex numbers, but in APL only the scalar encodings are relevant and the details of the representations can, for most purposes, be ignored.

The analogue of box used in other dialects (where it is called enclose, and denoted by), differs in two significant respects:

1.  a differs from <a for any argument a ; whereas a differs from ⊂a only if a is not a simple scalar. For example, if a←'I' , then <a is a one-letter word, <<a is a one-word sentence, and <<<a is a one-sentence paragraph; each distinct (i.e. distinguishable), whereas a and ⊂a and ⊂⊂a and ⊂⊂⊂a are not distinct.
2.  In other dialects, a certain list of functions (usually characterized as “mathematical”) apply to enclosed numeric arguments as if they were not enclosed; thus (⊂3 4)+(⊂5 6) is equivalent to (<3 4)+¨>(<5 6) . The use of such an explicit list raises questions concerning the characterization of any new functions, user defined, or derived functions. For example, would the defined function f:⍺+÷⍵ qualify? It is mathematical in the sense that 3 f 4 produces the result 3.25 , and in that f/i evaluates the continued fraction represented by the integers i .

Cycle, Mix, and Mix Index (≤ ≥ ]). These functions all concern permutations, and their importance rests on the importance of permutations in mathematics, and on the importance of sorting (a particular case of permutation) in data processing.

Any reordering or permutation of the vector ⍳n is called a permutation vector; it can be used with indexing to produce a corresponding ordering of the major cells of an array. For example, if p←0 3 1 2 , then p{'rstu' yields 'rust' .

Since { permits negative indexing (selecting from the tail end), an argument such as q←0 ¯1 1 2 can produce the same permutation as p .

In the dictionary, the term mix is introduced to characterize such a vector, but since any mix q can be reduced to the equivalent permutation vector by the expression (⍴q)|q , we will here simplify the discussion by speaking only of permutation vectors.

The permutation s←4 5 2 1 0 3 used for illustration in the dictionary is said to have cycles 5 3 1 and 4 0 and 2 because when used in indexing it “cycles” the elements in each group of positions, but never intermixes different groups. For example:

b←s{a ←→ efcbad
c←s{b ←→ adcfeb
d←s{c ←→ ebcdaf
e←s{d ←→ afcbed
f←s{e ←→ edcfab
g←s{f ←→ abcdef
         0   4
          1 3 5

Any permutation can be defined by specifying its cycles. Since any rotation of a cycle (such as 1⌽5 3 1 or 2⌽5 3 1) represents the same reordering, we can adopt a “standard” form in which the largest element is in the leading position (as was done in the example, with 5 3 1 and 4 0 and 2). Finally, if these cycles are put in increasing order according to their leading elements and then catenated to form a single vector, c←2 4 0 5 3 1 , the breaks between the successive cycles can be determined by the expression c=⌈\c , and c is called the cycle representation of the permutation s .

The cycle representation of a permutation s is given by the cycle function (c←≤s), and the corresponding permutation is obtained from the cycle representation by the mix function (s←≥c).

The cycle representation is useful in identifying distinct groups that are not intermixed by a permutation, and in finding the “power” of a permutation (the number of times it must be applied to restore the original order); this is given by the least common muItiple of the lengths of thc cycles, that is, by ^/,(c=⌈\c)1⍤⍴c .

Moreover, since the functiontreats any “incomplete” argument c as if the missing elements of ⍳⌈/c are single cycles (that do not move), then ≥5 3 1 produces a permutation that cycles the elements 5 , 3 , and 1 ; and ≥4 1 interchanges elements 4 and 1 . Such commonly useful permutations are otherwise relatively awkward to specify.

All the permutations of order n can be arranged in a !n by n table T in “lexical order”, that is, the rows evaluated as numbers in a base n system (n⊥⍤1 T) are in increasing order. For example, if n←3 , then T is the table:

0 1 2
0 2 1
1 0 2
1 2 0
2 0 1
2 1 0

A given permutation p can therefore be identified by its index in such a table; the expression ]p yields the index of p . Consequently, r←(n-⍳n←⍴p)⊤]p yields the reduced representation of p and 2|+/r yields its parity, both as defined in [4].

The dyadic expression i]a yields that permutation of the major cells of a produced by the permutation of order 1↑⍴a whose mix index is i . For example, if a←'abcd' , then 1]a reverses the last two elements of a , and 5]a reverses the last three, and 4]a rotates the last three.

Finally, i]⍳n produces row i of the table of permutations of order n , and (⍳!n)]⍳n produces the entire table.

Nub in, Nubsieve, and Nub (= ≠ ↑). The expression ≠a produces a boolean result that defines a classification or set, the 1’s in the result identifying the set of distinct major cells of a . The expression (≠a)⌿a produces a selection based on the classification ≠a , yielding the nub of a . The expression ↑a is equivalent to (≠a)⌿a , and therefore produces a combined classification and selection. For example:

   a           ≠a            ↑a
ABC         1 0 1          ABC
ABC                        DEF

Chapter 2 of [10] may be consulted for a discussion of classifications and sets, where they are introduced by the following statement of their role in mathematics and data processing:

It is often necessary to separate a collection of objects into several classes, and then perform some operation upon each of the classes. The operation performed is often trivial compared to the complexity of the classification procedure itself, and classification is therefore an important matter. Indeed, most computation involves some classification, even though the classification process may be implicit rather that explicit.

Further examples of the use of classification may be found in the treatment of graphs and barcharts in Chapter 4 of [11], and in the treatment of directed graphs and trees in Chapter 6 of [10].

Two further examples of the use of nub, and of classifications based upon it, will now be discussed:

   an←3 1 4 1 3 2
3 1 4 2
1 0 0 0 1 0
0 1 0 1 0 0
0 0 1 0 0 0
0 0 0 0 0 1

The last result is an “auto-classification” of an produced by the Nub-in function defined by (↑an)∘.=an . If an is a list of account numbers, and if

   de1←35 40 21 52 25 17

then the expression (=an)+.×dep may be seen to produce a summary of the deposits to individual accounts.

The second example concerns the use of the matrix =an as a post-multiplier rather than as a pre-multiplier. If an is a list of arguments to a function F that is expensive to compute, then it is desirable to apply F only to the distinct elements (as in F↑an), and then distribute the results for all arguments, as in (F↑an)+.×=an . An expression of this form is used in [4], to derive Newton’s symmetric functions for determining the coefficients of a polynomial from its roots.

Less (~). Because less applies to the major cells of its two array arguments as a “set-like” function (differences), its further discussion will be deferred to the final section.

Nand, Nor, Right, and Left (⍲ ⍱ ⊢ ⊣). Boolean functions (that are defined on the boolean arguments 0 and 1 , and produce boolean results) are very generally useful. Because there are four possible pairs of two arguments (shown in the columns of the table 2 2⊤⍳4) for each of which two different results are possible, there are 16 (that is, 2*4) distinct dyadic boolean functions. Of these, two are constants, one is the negation of the left argument, one is the negation of the right argument, and the remaining twelve are provided in APL. The relations < , etc. provide six, the or and and provide two, and the set is completed by nand and nor, and by right and left (or, more precisely, by ⊢⍤0 and ⊣⍤0).

Nand and nor also have the interesting property that either can be used alone to produce all of the other boolean functions. The four missing dyadic booleans can be expressed as 0⍤0 and 1⍤0 and ⊢⍤~ and ⊣⍤~ .

Programmers largely rely on the boolean functions or, and, and not, but it should be noted that others of the set can provide briefer, and perhaps clearer, statements:

For   Substitute        and read as
~a∨b  a⍱b  (neither) a nor b
a^~b  a>b a exceeds b
(~a)∨b  a≤b  a implies b
(a^~b)∨(~a)^b a≠b  a differs from b

Some uses of left and right are discussed in Section IIIE of the dictionary. A few others are worth noting:

  The expressions ⊣⍀m and ⊣\m replicate the leading row and the leading column of a matrix m .
  Expressions such as s*.5⊣s←+/x*2 and s÷⍴x⊣s←+/x may be used instead of (+/x*2)*.5 and (+/x)÷⍴x ; they tend to give prominence to the overall functions (square root and division of s) and to subordinate the detail involved in computing their arguments.
  The foregoing use of may be read as “where”, as in “square root of s , where s is the sum of squares of x”. The usage may be extended to reduce the use of parentheses to any desired degree.

Words (). The definition of the monad formalizes an essential process that has long been ill-defined and divergent in dialects. It introduces new classes of names that might be used in various ways, as in 2r3 as a constant name for the rational number 2÷3 .

A function that parses a sentence s and shows its execution piece-by-piece can be very useful as a teaching or de-bugging tool; use of the expression ⊥s could significantly simplify its definition.

The functionmight well be extended to a new dyadic case in which a boxed left argument would specify the parameter p used in the dictionary definition of the monadic case. This would provide considerable flexibility in the rules for word formation.

Circle (). The definition of the dyad k○⍵ has been extended to include two existing primitives (13○⍵ for the identify function, and 14○⍵ for the exponential), together with appropriate inverses for the cases ¯13○⍵ and ¯14○⍵ . The motivation rests on the frequent use, in applied mathematics, of phrases such as ⍵×(*⍵)×(1○⍵) , which can now be written in the form ×/k○⍵⊣k←13 14 1 .

From, All, Merge ({ }). The indexing provided by from can serve as a convenient replacement for all uses of the anomalous bracket-semicolon indexing. Moreover, it provides abbreviated indexing (allowing the selection of cells by specifying only the “outer” indices), negative indexing (allowing convenient selection from the tail end of an axis), and complementary indexing (allowing specification of “all but” an indicated set of indices along any axis). Hui [5] provides examples of uses of from as well as discussion of the related merge adverb } .

It should be noted that a definition such as f←¯1 2 ¯2¨{ cannot be made without negative indexing, because the shape of the argument to which f might eventually be applied is not known.

In the dictionary it is remarked that the monad all can be used to produce cartesian product, as in {⍺⊃<⍵ . It should be noted that the cartesian product cannot be used conveniently to produce the effect of all when applied to lists of more than two elements, since, for example, a cp b cp c would produce items consisting of a two-element list made up of an item from a , together with a boxed two-element list of items from b and c , rather than three-element lists of items from a , b , and c .

Swap (). The adverb“swaps” or “commutes” the arguments of its verb argument; for example, n÷⊂+/x is n into the sum over x and .5*⊂x is “the square root of x”. The symbolwas chosen for its similarity to the initial letter of “commute”.

Swap can greatly simplify expressions, particularly when used in conjunction with other adverbs, as in ⍺+.×⊂⍵ , or ⍺-⊂.×⍵ .

The monadic case (v⊂⍵←→⍵v⍵) can also be used to advantage. For example, ∘.+⊂⍳n and ∘.×⊂⍳n produce addition and multiplication tables.

Under (¨). This adverb was designed to embody much of the notion of duality that recurs in many applications of mathematics. Simple examples occur in the duality of and and or in logic, that is, ⍺∨⍵←→⍺^¨~⍵ and ⍺^⍵←→⍺∨¨~⍵ . Better examples are provided by cases where the right argument of ¨ is not self-inverse, as in ⍺×⍵←→⍺+¨⍟⍵ to characterize the use of “addition under logarithm” to perform multiplication. The use of base-10 logs would be characterized as ⍺×⍵←→⍺+¨(10¨⍟)⍵ , using the definition of ¨ with a noun as argument.

Rank (). The rank adverb applies to a verb left argument and a noun right argument. According to the dictionary definition, the “effective” rank of the derived function u⍤r (that is, the ranks of the cells to which it applies) is the lesser of the ranks of the actual arguments to which it applies, and the appropriate elements of the three-element list r . Moreover, each primitive verb has an “intrinsic” rank as specified in the header of its definition.

Because of complications arising from the use of negative ranks (to indicate a ¯1-cell or ¯2-cell, etc., as defined at the end of Section IIA in the dictionary), the definitions of the rank adverb and the rank of a function need to be refined as follows:

1.  The rank of any verb u is characterized by an n-by-3 matrix, where n is 1 for a primitive function, and increases by 1 for each applications of the rank adverb, the newly-specified rank being appended as a new last row.

When a verb is applied to an argument or arguments, the effective rank is determined by applying the rows of the rank matrix in turn, beginning with the last. For example, if the rank matrix is

10 1 1
¯1 1 1
 3 1 1

and the function is applied (monadically) to a right argument of shape 2 3 4 5 6 , then the successive effective (monadic) ranks computed are 5 and 3 and 2 and 2; the final cell size is 5 6 .


The definitions of composition (u⍤v) and under (u¨v) show that the rank of the derived function is the rank of v . More precisely, it is the last row of the rank matrix of v only. Moreover, the function v is then applied with its rank determined by all except the last row of its rank matrix.

This definition provides for what has been called “close composition”, that is, the function u is applied to the result of applying v , to each individual cell rather than to the overall result. Observe, for example, the difference between the results of ⍉⍤⌹a←?6 5 4⍴100 , and ⍉⌹a .

Finally, the application of v with the last row of its rank matrix removed prevents an effective double reduction in rank when the leading element of the last row is negative.

4.  An enclosed vector right argument to the rank adverb determines the rank of the result without reference to the rank of the left argument. For example, the rank matrix of the function u⍤(<3 4 5) is the matrix with the single row 3 4 5 .

Tie (.). The expression m.v provides a generalization of the outer product, in which a restricted number of axes of the arguments behave as in the outer product, the first m axes of the argument being required to “run together” (and therefore agree).

The outer product often produces a result of high rank which must then be pared down by the use of , as in the following identity D.13 from [4]:

M+.×N ←→ +/1 3 3 2⍉M∘.×N

Tie can be used to express this identity as M+.×N ←→ +/(⍉M)1 .×N .

Prefer, Defer (). Because functions apply only to cells (along the final axes), it is sometimes necessary to apply a monad f after moving a specified list of axes a to the tail end; this is done by using the defer adverb in the expression f⍥a . Similarly, in applying a reduction of the form f⌿ , it may be desirable to first move a specified axis s to the leading position, using s⍥(f⌿) .

Prefer and defer may also be used with the identity function () to perform transpositions that are otherwise awkward to express. Such uses occur in the discussion of sets in a later section. The application of the same transposition to each of the arguments of a dyad f may be expressed using composition, as in ⍺f⍤(⊢⍥a)⍵ .

Ply (.). In a still unpublished paper, E.E. McDonnell has suggested the adoption of a dyadic case to be defined in terms of the monadic case as follows:

⍺ u.n ⍵ ←→ >(n≠0){1¨↓,:(u¨>/).n ⍺,⍤<⍵

For example, 0 +.n 1 gives the n-th Fibonacci number, and 1 ∘∇'⍵×1+⍵÷⍺'.n 1 gives !n . The definition applies only for non-negative values of n .

The definition for the monad u.n⍵ and the dyad ⍺u.n⍵ should both be extended to non-scalar arguments n , the outer shape of the result being ⍴n . For example, 0+.(⍳7)1 yields 0 1 1 2 3 5 8 , and 2¨×.(⍳5) 1 is 1 2 4 8 15 .

Union, Intersection (∪ ∩). These conjunctions, and the concomitant notion of combining rank, are superseded by the adverb yoke defined below.

Yoke (:). The yoke defined here is an adverb that produces a conjunction. The symbol suggested for it (:) is already in use for the custom adverb, and the semicolon is now suggested for that role.

If f , g , and h are verbs, then:

⍺ f g: h ⍵ ←→ (f⍺) g (h⍵)
  f g: h ⍵ ←→ (f⍵) g (h⍵)
The ranks of f g: h are the minimum of the ranks of f and h .

For functions with scalar results, the expression f⍪:g provides what was provided by f∪g ; more generally, one may use f,¨<:g⍪:h⍪:i .

The equivalents of f∪g∩+ and f∪g∩× in the dictionary (or of f+g and f×g in mathematics) may now be provided by f+:g and f×:g . Moreover, p^:q and p∨:q and p>:q provide propositions that define the intersection, union, and difference of the sets defined by the propositions p and q . Finally, the effect of withe (a variant of til) is given by ⊢g:h ; in particular, ⊢+:÷/c gives the continued fraction represented by the list c .

Sets. In mathematics, sets are commonly defined by notation of the form:

S={x|Px}      T={x|Qx}

where P and Q are propositions, that is, functions whose range comprises two elements, sometimes called true and false, but in APL more conveniently treated as 0 and 1 . For example, if P:(⍵=⌊⍵)^(⍵>0)^(⍵<100) , then S is the set of integers lying strictly between 0 and 100 .

Since a set such as S can only be used to answer the question of whether an argument y belongs to S (expressed as y∊S), the set definition appears to add nothing to the proposition used in the definition; that is, the expression Py provides the same result.

However, expressions for the union and intersection of sets are more convenient than the corresponding expressions in terms of the the defining propositions, that is, S∪T instead of (P⍵)∨(Q⍵) , and S∩T instead of (P⍵)^(Q⍵) . What is required to make the use of propositions more palatable (and to exploit the use of booleans other than the or and and used in union and intersection) are suitable conjunctions to link the propositions.

This need is met by conjunctions of the form b: , where b is a boolean function. For example, P∨:Q produces the proposition analogous to the union of sets S and T , and P^:Q produces intersection. Moreover, P>:Q provides ordinary set difference, and P≠:Q provides symmetric set difference. Finally, a “family of propositions” such as H←P,:Q,:R can be used in a variety of ways such as direct evaluation, as in H⍵ , or in an intersection of sets and their complements, as in (b¨(^.=))⍥H , where the propositions corresponding to zeros in the parameter b are negated.

Set-like operations. A proposition defining a set is sometimes itself defined in terms of an exhasustive list of the elements belonging to the set, as in:


The union, intersection, and difference of two sets can then be represented by analogous operations on the corresponding lists, and these operations on lists are often referred to by the names appropriate to the operations on sets.

Thus, if a and b are the lists representing the sets (propositions) A and B , then a⍪b (or ↑a⍪b to remove duplicate elements) corresponds to union (A∨:B) ; a~b to difference (A>:B) ; and a~a~b to intersection (A^:B) .

It should be noted that a⍪b and a~b concern the major cells of the arguments. For example:

   a           b            a⍪b
mno         stu          mno
pqr         vwx          pqr
stu                      stu

   a~b         a~a~b
mno         stu

Finally, major cells relative to any axis k can be treated by expressions of the form a~⍤(k⍥⊢)b .


I am indebted to John Sansom and other careful readers of the dictionary for raising many of the questions addressed herein, and to Roland Pesch, Henri Schueler and E.E. McDonnell for helping provide some of the answers.


[1]  Iverson, K.E., “A Dictionary of APL”, APL Quote Quad, Vol 18, No.1, September 1987.
[2]  Berry, M., and R. Pesch, “Style and Literacy in APL”, Proceedings APL86, July 1986, APL Quote Quad, Vol 16, No.4.
[3]  Iverson, K.E., “APL87”, APL Quote Quad, Vol 17, No.4.
[4]  Iverson, K.E., “Notation as a Tool of Thought”, CACM 23, August 1980.
[5]  Hui, Roger, “Some Uses of { and }, Proceedings APL87, APL Quote Quad, Vol 17, No.4, May 1987.
[6]  Falkoff, A.D., and K.E. Iverson, The Evolution of APL, ACM SIGPLAN Notices 13, August 1978.
[7]  McDonnell, E.E., The Story of , APL Quote Quad, Vol 8, No.2, December 1977.
[8]  Iverson, K.E., “Operators and Functions”, IBM Research Report RC 7091, 1978.
[9]  Iverson, K.E., and A.T. Whitney, “Practical Uses of a Model of APL”, APL Quote Quad, Vol 13, No.1, September 1982.
[10]  Iverson, K.E., Applied Mathematics for Programmers, I.P. Sharp Associates, 1984.
[11]  Iverson, K.E., Mathematics and Programming, I.P. Sharp Associates, July 1986.

First appeared in the APL Quote Quad, Volume 19, Number 1, 1988-09.

created:  2009-10-03 19:00
updated:2013-05-08 09:45