Machine Solutions of Linear Differential Equations
Applications to a Dynamic Economic Model


 

A thesis presented by

Kenneth E. Iverson

to

The Division of Applied Science
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in the subject of Applied Mathematics
 

Harvard University
Cambridge, Massachusetts, January, 1954
 



 

Preface

The present use of the Mark IV Calculator in the solution of a new dynamic economic model continues the previous work of Dr. H.F. Mitchell in this field. The numerical results obtained should be of interest to economists, and it is believed that a number of the methods employed will prove useful to computers.

The writer wishes to express his appreciation to Professor Howard H. Aiken for his helpful guidance and encouragement, and to Professor Wassily W. Leontief who proposed the problem and offered many valuable suggestions for its solution. The advice and assistance of numerous members of the Staff of the Computation Laboratory is gratefully acknowledged, in particular that of Dr. Robert C. Minnick and Mr. Anthony G. Oettinger. To Mr. James R. Terrell the writer is indebted for profitable discussions arising out of his use of the matrix subroutines described herein.

The writer also wishes to express his thanks to Miss Jacquelin Sanborn who typed the plates for printing and to Miss Carmela Ciampa, Miss Joan Boyle, Mr. Robert Burns and Mr. Paul Donaldson, all of whom assisted in the preparation of the manuscript.
 
 

Synopsis

The present study was undertaken to obtain solutions to a dynamic economic model proposed by Leontief, and to develop methods of computation suitable to the Mark IV Computer.

The economic model as presented in Chapter 1 is described by a system of linear differential equations with constant coefficients. The data for six systems of order 5, 6, 10, 11, 20 and 21 were supplied by the Bureau of Labor Statistics and the Harvard Economic Research Project.

The general solution for each system was sought. In Chapter 3 the complementary function is shown to be expressible in terms of the latent roots and principal vectors of a certain matrix related to the coefficient matrices of the given system. An expression is also given for the particular integrals appropriate to a set of demand functions of interest in economic theory.

The so-called Frame method, described in Chapter 4, was chosen for obtaining the latent roots and principal vectors. The Frame process generates the characteristic equation of the matrix and the latent roots of the matrix are then obtained as the roots of the characteristic equation. Methods of solution of polynomial equations of higher degree are therefore discussed in Chapter 5.

In Chapter 5 a modification of the quadratic factor method is described which has the advantage of greater uniformity for machine computation. Extensions to a higher order process and to the extraction of quartic factors are considered and a new method for the solution of the quartic is described. A machine method based on a combination of the Graeffe process and the modified quadratic factor method is also proposed. The method is very general in its application and offers several advantages, the most important among them being:

1.   No preliminary scaling of coefficients or bounds on the roots are required;
2.   The number of multiplications required for the evaluation of all the roots of a polynomial of degree n is of the order n².

The above method has been used for the solution of several equations of degrees up to twenty.

Machine programs for the matrix operations required were organized around a set of matrix subroutines described in Chapter 6. These subroutines are programmed for systems of a general order and are believed to be sufficiently flexible and easy to apply to be of value in general use. They have already been profitably used by another member of the Staff of the Computation Laboratory in an unrelated program.

In addition to the standard matrix operations programmed, a new machine method of inversion is described. For machine computation, it has the advantages of uniformity, low storage requirements, and low round-off error. Several possible extensions to complex matrices are considered. A number of special machine techniques for vector algebra are suggested; among them a process called “floating vector operation” which offers most of the advantages of floating decimal point operations at a fraction of the computing time required by the latter. This process was applied in the calculation of the principal vectors. The requisite details of Mark IV programming are considered in Chapter 2.

Particular solutions were obtained for the six systems described. The results appear in the appendices and are described in Chapter 7. The complete complementary function was obtained for the systems of order 5, 6, 10 and 11. Because of round-off error accumulation only a partial solution was obtained for the system of order 20 and a complete solution would necessitate rerunning with greater accuracy. In view of the solutions of the similar lower order systems already obtained, the computation required is not warranted by the economic significance of the missing terms.
 
 

Chapter 1   A Dynamic Economic Model

Mathematical models have long been used with striking success in the physical sciences and hence attempts have been made to apply them to such diverse fields as economics, biology, and psychology. Early attempts at the application of mathematical models to the field of economics were hampered by (1) the lack of empirical data and (2) the lack of mechanical devices to carry out the extensive calculations required. As a result, economic models were either so over-simplified that they no longer adequately represented real phenomena, or else so complex as to render impossible the calculation of numerical results based upon them.

This situation has been radically changed by two factors: (1) the vast amount of statistical data now collected and made available by government and other agencies; and (2) the advent of the large scale automatic calculator which makes feasible the construction of models of much greater complexity than heretofore.

The present study was undertaken to obtain general solutions of a dynamic model proposed by Leontief. This is an extension of his earlier static input-output model. Both models are fully discussed in references 1 and 2 and only a brief outline will be attempted here.

The variables of the static input-output model are the outputs of the various “sectors” or “industries” which make up the economy. It is assumed that each sector of the economy will require inputs from some or all of the other sectors, and that a state of equilibrium exists. That is, if x1,x2,...,xn are the outputs of the several sectors, Xij is the input from the ith sector to the jth sector, and zi is the “final demand” or amount of goods to be made available for use outside the system considered, then

  xi − Σj Xij = zi .

The inputs to each sector are assumed[a] to be linear functions of the output of that sector, i.e. Xij = aijxj. The “flow” coefficients aij are assumed to be constants so that the equilibrium condition may now be stated as a set of n linear equations in n unknowns,

  xi − Σj aijxj = zi ,     i = 1,2,…,n.                             (1.1)

For a given final demand zi, the required outputs xi of each sector may now be obtained as a solution of (1.1).
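As a concrete illustration, the static system (1.1) is a single linear solve in matrix form. The following minimal sketch (in Python with NumPy; the 3-sector flow matrix and demands are hypothetical, not the thesis data) shows the mechanics:

    import numpy as np

    # Hypothetical 3-sector flow matrix [aij]; aij is the input from
    # sector i required per unit output of sector j.
    A = np.array([[0.10, 0.30, 0.05],
                  [0.20, 0.05, 0.15],
                  [0.10, 0.20, 0.10]])
    z = np.array([100.0, 50.0, 75.0])      # final demands zi

    # Eq. (1.1) in matrix form: (I - A) x = z
    x = np.linalg.solve(np.eye(3) - A, z)
    print(x)                               # required outputs xi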

The division of the economy into sectors is somewhat arbitrary. In general, the greater the number of sectors considered, the more detailed is the information available in the solution. Since, however, the amount of calculation required varies approximately as the cube of the number of sectors considered, a compromise must be made. Solutions3  have been obtained for systems of various orders, the largest being a system of 192 sectors.

Equations (1.1) describe a static model since no dependence on time is implied. Dependence on time may be introduced by making either the coefficient aij or the demand function zi (or both) functions of time. The simple linear model may also be extended in other ways; e.g. by including terms of higher degree in the variables or by adding terms in the derivatives dxi/dt. For reasons discussed in reference 2 the following dynamic model is chosen[b]:

  xi − Σj aijxj − Σj bij dxj/dt = zi ;     i = 1,2,…,n.                   (1.2)

The flow coefficients aij and the capital coefficients bij are assumed to be constants; the outputs xi and the final demands zi are explicit functions of time.

The system of first-order linear differential equations represented by (1.2) may be solved in either of two ways:

(1)   Given a set of initial conditions xi(0) = xi0, a particular solution may be tabulated for successive values of t by methods of numerical integration.
(2) A general solution may be obtained containing in general n arbitrary parameters which may be chosen so as to satisfy any[c] desired initial conditions.

The general solution is more difficult to obtain; however, it is of much greater value in studying the behavior of the system than is the particular solution. In the first place, the general solution is expressible as a sum of exponential functions. As in the analogous case of a coupled mechanical system [also described by (1.2)] a knowledge of the so-called “normal modes” or terms of the general solution provides greater insight into the behavior of the system than any tabulated solution, however extensive. Secondly, the particular solution corresponding to any given initial conditions may be obtained fairly readily from the general solution by evaluating the arbitrary parameters.
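For a modern reader, the normal modes described here can be reproduced in a few lines of NumPy. The sketch below assumes the homogeneous form of (1.2), (I − A)x − Bẋ = 0, with B non-singular, so that the latent roots are the eigenvalues of B-1(I − A); the matrices are placeholders, not the thesis data.

    import numpy as np

    # Placeholder coefficient matrices, not the BLS data used in the thesis
    A = np.array([[0.2, 0.1, 0.0],
                  [0.1, 0.3, 0.1],
                  [0.0, 0.1, 0.2]])        # flow coefficients aij
    B = 0.5 * np.eye(3)                    # capital coefficients bij (non-singular)

    # Homogeneous form of (1.2): (I - A)x - B(dx/dt) = 0, so dx/dt = B^(-1)(I - A)x
    D = np.linalg.solve(B, np.eye(3) - A)
    roots, S = np.linalg.eig(D)            # latent roots and latent vectors

    # Each column of S with its root gives a normal mode: S[:, k] * exp(roots[k] * t)
    for lam, v in zip(roots, S.T):
        print(lam, v)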

A number of difficulties may arise in connection with the general solution. They may be classified either as mathematical difficulties or as difficulties of interpretation of the model. The mathematical difficulties are problems arising in the actual solution of the system of equations represented by (1.2). They are given consideration in the following chapters. The difficulties of interpretation arise if in certain cases the model does not adequately represent the economic phenomena it purports to describe. These matters will not be discussed here, and again the reader is referred to reference 2.

Presumably a final demand can be considered as the input to an additional sector of the economy and by including this sector in the system the equations (1.2) become homogeneous. Such a system is referred to as “closed”. In order to separate the less stable[d] elements of the economy from the more stable, it is convenient to remove the sector classified as “households” from the system and treat it as a final demand on the remaining system. Such a system will be termed an “open” system.

The demand function may in general be any function of time, but the functions of most interest are of the form of exponentials, i.e. zi = gieμt where the gi are constants. Solutions for demand functions of this kind are discussed in section 3F.
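A hedged sketch of the corresponding particular integral: substituting x = h·eμt into (1.2) gives (I − A − μB)h = g, so h follows from a single linear solve whenever that matrix is non-singular. Continuing the placeholder matrices above:

    import numpy as np

    A = np.array([[0.2, 0.1, 0.0],
                  [0.1, 0.3, 0.1],
                  [0.0, 0.1, 0.2]])        # same placeholder flow matrix as above
    B = 0.5 * np.eye(3)                    # placeholder capital matrix
    g = np.array([1.0, 2.0, 1.5])          # demand amplitudes gi
    mu = 0.025

    # (I - A - mu*B) h = g  =>  particular integral x_p(t) = h * exp(mu*t)
    h = np.linalg.solve(np.eye(3) - A - mu * B, g)
    print(h)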

Flow coefficients for a system of 192 sectors were provided by the Bureau of Labor Statistics and formed the basis for the present calculations. Capital coefficients were supplied by the Harvard Economic Research Project. By a process of “aggregation” described in Chapter IX of reference 2, the coefficients for several systems of different orders were computed from the original data. The orders of the systems chosen were 6, 11 and 21. Comparison of the solutions for the various systems may be expected to give some indication of the distortion introduced by aggregation. Three “open” systems of order 5, 10 and 20 were derived from the systems of order 6, 11 and 21 by suppressing the “households” sector.

General solutions for the six systems were obtained using the Harvard Mark IV Calculator. Particular integrals (see Section 3F) were also computed for exponential functions eμt for

  μ = 0, 0.015, 0.020, 0.025, 0.030, 0.035.

At the time this study was undertaken, the Mark IV Calculator had just been completed. The need for solving the extensive economic systems mentioned above therefore provided an excellent opportunity for an investigation of general matrix methods especially suitable for application to Mark IV.

Notes

a.  The various assumptions made are justifiable only on the basis of economic theory and will not be discussed here. They are given careful consideration in the references cited.
b.  The primary motive for including derivative terms is that the capital requirements of a sector may be expected to be determined by the rate at which the output is changing, rather than directly by the output. Capital requirements are taken to include stocks on hand as well as the means of production.
c.  Certain restrictions must be placed on the initial conditions if the matrix of capital coefficients [bij] is singular. See Section 3D.
d.  A sector may be considered unstable if it is not linearly related to the remaining sectors.

 

Chapter 2   Problem Preparation for an Automatic Computer

Section A of this chapter will be devoted to a consideration of the general aspects of problem preparation for an automatic computer as a preliminary to the discussion in Chapters 4 and 5 where possible methods for the solution of our economic model are analyzed and compared on the basis of their suitability for machine computation. Section B will be devoted to programming details of the Harvard Mark IV Calculator as a preliminary to the discussion of Chapter 6 where the programming of the selected methods for the Mark IV Calculator is discussed. Reading of Section B may well be deferred until after Chapter 5.

A. General Aspects of Problem Preparation[a]

(1) Introduction

The solution of virtually any problem of applied mathematics may be reduced to a sequence of elementary operations involving only addition, subtraction, multiplication, division, and reference to previously computed quantities. For this reason it is clear that a truly general purpose calculator may be based on the following components:

1.  An arithmetic unit capable of carrying out the operations of addition, subtraction, multiplication, or division on any pair of numbers.
2.  An input device for introducing numbers into the calculator. Among the various devices available are punched cards, paper tape, magnetic wire, and magnetic tape.
3.  A storage unit for storing intermediate results until they are needed in further computations. The storage unit is made up of a number of individual “registers”, each capable of storing a single number. Each register in the storage unit is specified by a number or “address” assigned to it.
4.  An output device for recording in suitable form the results obtained by the computer. A typewriter or similar printing device is usually provided. Numbers may also be recorded on the same media used for input. This allows for unlimited “external storage” of intermediate results which may be read back into the machine at any time via the input device.
5.  A number transfer bus for transferring numbers among the various units mentioned above.

The calculator may be made automatic by the addition of a “sequence unit” which determines the sequence of operations to be carried out. The sequence unit operates by reading sequentially a set of instructions presented to it, and carrying out the prescribed operations by determining which numbers are to be transferred, read, or recorded at each step. The sequence instructions or “program” must be coded and recorded in a form suitable to the particular machine.

Since each step must be programmed it is clear that little or no labor is saved by the automatic computer unless the operations are made repetitive. Such repetition is made possible by a special instruction referred to as a “call” which interrupts the sequence of instructions being read and causes the sequence unit to begin reading instructions from some point in the program specified by the call. This provides the possibility of a program consisting of a single closed loop which could, for example, be used in the tabulation of a given function as follows: The initial value of the argument is stored in a certain “argument” register and the prescribed function of the number in this register is computed and recorded. The number in the argument register is then incremented and a call is made to the point in the program immediately following the introduction of the initial argument.

The computer is made much more versatile by providing another instruction termed “conditional call” which effects a call if and only if the number in a special “condition” register is negative. This allows a choice of programs to be made depending on results obtained in the course of the calculation. For example, a series may be automatically truncated at the first non-significant term as follows: Suppose that a certain section of the program is terminated with a call so as to form a closed loop which repeats indefinitely, and on each iteration adds a term of the series. Let the call be replaced by a conditional call and immediately preceding it let the number R - |T| be read into the condition register, where T is the last term added and R is a certain tolerance. The conditional call will then fail when |T| becomes less than the tolerance and the computer will proceed to the next section of the program.
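In modern terms the truncation loop reads as follows; a minimal Python sketch (function and names mine), with the break condition playing the role of the failing conditional call:

    import math

    def sum_series(terms, tol):
        """Add terms until the conditional call fails, i.e. until tol - |T| turns positive."""
        total = 0.0
        for T in terms:
            total += T
            if tol - abs(T) > 0:   # condition register positive: the call fails
                break
        return total

    # Example: e from its power series, truncated at the first negligible term
    print(sum_series((1.0 / math.factorial(k) for k in range(100)), 1e-12))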

The conditional call is used frequently in programming and, in effect, provides the possibility of making any logical decision necessary in practice. The pattern of the calls and conditional calls occurring in the program divides it into groups of operations called “routines”.

(2) Choice of Numerical Method

In general, several alternative numerical methods will be available for any given problem; and the first step in the preparation of the problem will be the choice of the most suitable method. This choice will require consideration of the routines necessary to carry out each possible method. The factors which enter into this consideration will now be discussed.

The possible error present in computed results must be known if the results are to be of value. Loss of accuracy will be due to two types of errors: those due to certain approximations, and those due to “round-off”.

The first type of error occurs when a certain function is approximated by some more easily computed function as, for example, in the truncation of an infinite series. When such an approximation is used, an estimate of the “truncation error” so committed must be available.

Because a computer is limited in the number of digits carried, each number is correct only to a certain number of decimal places. Errors due to this effect are called round-off errors. The maximum round-off error introduced at each step is known, but the way in which errors from any source propagate depends on the subsequent operations. The total error accumulated is therefore difficult to estimate and depends on the particular method employed. Upper bounds to the possible error can usually be obtained but these bounds are often grossly pessimistic and probable error bounds may have to be considered.

Errors of a completely different nature are those introduced by a failure of some part of the computer or by a flaw in the program. Such errors are usually called blunders. They must either be prevented or corrected before the computation is allowed to proceed. For this purpose checks must be included in the program and provision made to rerun the program from some previous point when a blunder occurs. Checks are usually arranged to stop the computer if two quantities which are supposed to agree (because of some mathematical identity) to within a certain tolerance do not agree. Rather than make the machine stop due to an intermittent failure, a rerun may be programmed to occur automatically for a limited number of times before the machine stops.

Although a clear-cut distinction is not always possible, it is useful to classify checks as current or final. The purpose of a final check is to insure that the final results recorded are correct. Current checks are provided at frequent intervals, usually at the end of each routine, with the double purpose of reducing the amount of time lost in further computation after an error occurs and of aiding in the detection of a source of error by localizing the trouble to a certain section of the program. Checks should be designed to detect not only machine errors but also any errors in programming. Program errors are by far the most difficult to detect and considerable ingenuity may be required to devise suitable checks.

Of two possible methods of solution, the one more general in application is also likely to be the more complex and the more time-consuming. Other things being equal, the use of a more general method is to be preferred because of the possibility of using the same program in later problems where the greater generality may be utilized. In some cases, the generality required cannot be completely determined in advance. For example, in Chapter 4 the so-called power method is seen to fail when certain types of multiplicities occur in the roots of the matrix.

The amount of computation required is clearly an important factor although it becomes relatively less important as the speed and availability of automatic computers increase. The time required for a given process may be evaluated roughly by counting the number of multiplications required. This gives a useful measure for comparison. The amount of number storage required by a given method must also be given consideration.

The complexity of the program determines the amount of preparation time required of the mathematician, and to a certain extent, the difficulty of running the problem on the machine. When a program is first placed in operation it must be carefully “checked out” to remove any possible programming errors and the more complex the program, the more time this process is likely to take. The probable check-out time should therefore be taken into account in estimating the over-all machine time required. The complexity of the program also determines the amount of storage required for sequencing instructions; an important factor in the use of machines with limited storage capacity.

The complexity of the program required for a given process depends to a surprising degree on the lack of uniformity of the process. That is to say, an otherwise simple program may become very complex if exceptions must be made for certain special cases. The desirability of uniformity in machine methods is stressed therefore in the discussion of programs in Chapter 6.

In summary the major considerations entering into the assessment of a given method are the following:

1.  total error accumulation;
2.  checks available and ease of reruns;
3.  amount of computation required;
4.  generality of the method;
5.  complexity of the program; and
6.  number storage requirements.

(3) High Accuracy Operation

The digits of a number are stored in successive “columns” of a storage register. In most automatic calculators the decimal point is fixed between a given pair of columns, and the numbers introduced must be suitably scaled so that none of the numbers encountered in the course of the calculation will exceed the capacity of the storage registers. In some calculators such as the Mark IV, the decimal point is fixed during the running of a problem but its location may be chosen at will by the operator. In this case the location of the decimal point must be chosen to prevent the occurrence of over-capacity numbers. In general, the decimal point should be located as far to the left as is consistent with this requirement since the round-off error is thereby reduced.

If very high accuracy is required or if the accumulation of round-off error is very severe, the storage capacity of a single register may not be adequate. In this case each number may be stored in two registers; the high order part of the number in one register, and the low order part in the second. This procedure is known as “double accuracy” or “high accuracy” operation. Provisions are made for the addition and multiplication of such “high accuracy” numbers by suitable programming.

It may happen that the number of significant digits required in each number does not exceed the capacity of the registers but that the magnitude of different numbers varies so widely that they cannot be stored at the same decimal point. In this case it is usual to resort to the so-called “floating point” operation in which each number z is represented by two numbers y and p in the semi-logarithmic form z = y × 10p. The number y is “normalized”, i.e. it is stored with the first non-zero digit in the highest order column of the register. Some calculators[b]  operate directly with numbers of this type and reserve certain columns of each register for representation of the exponent p. In the use of a fixed point machine the corresponding operations must be programmed.
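A minimal decimal sketch of this semi-logarithmic representation (the helper names are mine, and the normalization convention y in [0.1, 1) is one choice among several):

    import math

    def pack(z):
        """Represent z as (y, p) with z = y * 10**p and y normalized to [0.1, 1)."""
        if z == 0.0:
            return 0.0, 0
        p = math.floor(math.log10(abs(z))) + 1
        return z / 10**p, p

    def add(a, b):
        """Floating addition: align exponents, add, renormalize."""
        (ya, pa), (yb, pb) = a, b
        return pack(ya * 10**pa + yb * 10**pb)

    print(add(pack(123.0), pack(0.0456)))   # (0.1230456, 3), i.e. 123.0456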

(4) Simplification of Programming

The speed of modern computers has reached a point where, for many applications, the time required to program a problem may be a more important factor than the actual computing time. Attempts are made, therefore, to simplify the work of programming in various ways.

The first step may be made by the programmer in seeking out repetitive operations within the program. Such a repetitive operation need be programmed but once together with instructions to repeat the operation the desired number of times. Similarly an operation which recurs at several different points in a program may be programmed once as a “subroutine” to be called whenever required. Such a subroutine may be arranged to call back automatically to the point in the main program from which it has itself been called.

It frequently happens that an operation is repetitive except for a simple permutation of the registers involved. For example, the summation of the numbers in the successive registers 1-50 is performed by adding successively the registers 1,2,…50 to the previous sum. The only difference in each of the fifty operations is the address of the particular register to be added. Modern machines allow the programmer to make such operations repetitive by making provision for modifying instructions. In the previous example it is only necessary to add one to that part of the instruction specifying the address of the new register to be added each time.
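The register-summation example can be mirrored by treating the address part of the order as a mutable datum; a toy sketch of the idea (the storage contents are illustrative):

    # Fifty numbers standing in "registers" 1-50
    storage = {addr: float(addr) for addr in range(1, 51)}

    total = 0.0
    instruction = {"op": "add", "address": 1}
    while instruction["address"] <= 50:
        total += storage[instruction["address"]]   # perform the add order
        instruction["address"] += 1                # add one to the address part
    print(total)                                   # 1275.0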

In most machines the sequence instructions and numbers are stored in the same set of registers and instructions may therefore be modified in the arithmetic unit. In the Harvard machines the number storage and sequence storage are completely separate and the direct modification of instructions is not possible. Equivalent operation is obtained by the use of certain control registers as follows: Normally each instruction specifies directly the address of the storage register concerned. In Mark IV, a special order is provided which specifies storage address “ij”, where ij is the number stored in a certain control register called the IJ register.[c] Thus, without changing the actual order stored in the sequence storage, the operation effected by it may be modified by changing the number resident in the IJ register.

The separation of number storage and sequence storage offers important advantages in making reruns. To make a rerun from any given point in a program all the relevant storage must be restored to the conditions which obtained at that point. For the number storage this may usually be done by so planning the program that the numbers entered into the calculation are still available in the same registers at the time the check is made. Although the number of orders modified in a certain routine may be rather large and the pattern of the modifications rather complex (due perhaps to repetitive loops within the routine), yet the modifications may usually be expressed as simple functions of a very limited number of parameters or “control numbers”. In Mark IV the desired functions of these control numbers are computed and introduced into the IJ register to produce the required modifications. Since the sequence instructions are themselves unchanged, the program may be restored to any point by restoring the new control numbers to their correct values. The rerun procedure is therefore relatively simple and may easily be made to occur automatically.

On the other hand, the restoration of the sequence instructions in a machine which modifies them directly is in general so difficult that the simplest procedure is to read the original instructions again from the input device. Automatic reruns become virtually impossible. It should also be noted that easy reruns greatly simplify the process of checking out a program.

A disadvantage of the separation of number and sequence storage shows up in the case of problems requiring a large amount of number storage and little sequence storage (or vice-versa). In this case the total storage capacity of the machine cannot be utilized. If, however, the storage device used is relatively cheap, e.g. a magnetic drum, the slight increase in total storage capacity required by the separation of numbers and instructions is amply justified by the advantages described above.

The algebraic coding machine of Mark IV5  is one approach to the further simplification of programming. Sequencing instructions in essentially algebraic form may be introduced on the keyboard. These instructions are automatically translated into the more complicated program instructions required to sequence the computer and are recorded on tape for later use by the computer. Only the most elementary of the algebraic operations are available, but high accuracy and complex operations may also be coded directly. In addition, the elementary functions √x, tan⁻¹x, cos x, log₁₀x, and 10ˣ may be coded directly; e.g. the operation cos x0 ⇒ c0 computes the cosine of the number in register x0 and stores it in register c0.

In the simpler operations described above (namely, those requiring less than twelve instructions for their completion) the coding machine records the required sequence instructions directly. In the more complicated operations, e.g. cos x, the coding machine provides only the instructions which are necessary to call a “permanent” subroutine6  which is normally recorded on the sequence drum of the computer at all times.

The use of the algebraic coding machine can materially reduce problem preparation time. It also enables a mathematician to use the computer without learning its arbitrary sequence codes.

Two minor disadvantages are (1) the coding machine, generally speaking, does not program as efficiently as is otherwise possible; and (2) in checking out a program and locating errors effectively, the detailed coding must be understood. Even so, a mathematician may, with the help of a competent operator, check out a program satisfactorily without knowing the detailed coding.

The use of permanent subroutines for elementary functions might well be extended to the use of an extensive library of subroutines. Conceivably such a library could be made to include all the operations likely to be encountered in the majority of problems and programming would then be reduced to the preparation of a relatively simple main program to piece together the required subroutines.7 The value of a particular subroutine depends on both the ease and the frequency of its use. On this basis the subroutines for the elementary functions are in general the most valuable and subroutines for linear operations probably take second place. A set of subroutines for matrix operations on Mark IV is discussed in Chapter 6.

B. Mark IV Programming[d]

This section will be devoted to those details of Mark IV programming[e] which are necessary to an understanding of the routines discussed in Chapter 6. Programs for Mark IV may be prepared either on the algebraic coding machine mentioned in section A or on a “decimal” coding machine, with which the elementary sequence codes are recorded. Since the use of the decimal coding machine leads to a slightly more efficient program, it was chosen for the preparation of the subroutines of Chapter 6. For this reason only the decimal coding will be considered in this section.

Orders

The units of Mark IV are divided into two groups called β and γ. Under control of a single order a number from any β unit may be transferred to any γ unit or vice-versa. Direct transfers between two units of the same group are not possible. An order is made up of three parts called the α, β, and γ codes, which determine the operation to be performed as follows:

α-code (2 digits):  Determines the direction of transfer (β→γ or γ→β) and the sign with which the transfer takes place (+, –, +| | or –| |).
β-code (4 digits):  Determines the β unit involved in the transfer.
γ-code (2 digits):  Determines the γ unit involved in the transfer.

A complete list of the codes for Mark IV appears at the end of this section. The calculator has a storage capacity of 10,000 orders, each of which is identified by a “line-number” in the range 0000-9999.

The function of certain β codes differs depending on whether a read-in or read-out of the register is involved (i.e., depending on the α-code with which it is combined). Hence it is sometimes necessary to specify the function of a β-code both for read-in and for read-out. In the coding list this is done by writing “in” or “out” after the appropriate function. Similar remarks apply to the γ-codes.

Example. The coding list gives for γ code 21: Multiplier in and multiply; low order product out. From this one may deduce the meanings of the following orders:

00 0019 21   Read the number stored in β register 0019 into the multiply unit and multiply.
10 0019 21 Read low order product out of the multiply unit into β register 0019.

If neither “in” nor “out” is specified, the function of the code is the same for either, except for direction of transfer.

Registers

Most of the registers of Mark IV have a capacity of sixteen decimal digits and sign. Such registers are referred to as “regular” in contrast with other “short” registers which do not have a full sixteen columns. All short registers are so arranged that their first (low order) column corresponds to the first column of the regular registers. Read-out from any short register to a regular register places zeros in the unoccupied columns.

The γ units consist of the γ transfer register and the arithmetic unit (multiplier, divider, and two accumulators). The β units include two hundred fast storage registers and other special registers. The content of any register is not destroyed on read-out.

Fast Storage Registers

The two hundred fast storage registers are referred to by the numbers of the β-codes controlling them and are numbered from 0000 to 0199. It is often convenient to designate them by a letter and subscript, the letter determining the second and third digits as follows:

Letter:   a   b   c   d   u   v   w   x   y   z   a'  b'  c'  d'  u'  v'  w'  x'  y'  z'
Digits:   00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15  16  17  18  19

For example, b3 ≡ 0013 and d'8 ≡ 0138.

Function Registers

The function registers are ten in number (β0220-0229) and are designated as f0-9. In operation they are identical with the fast storage registers but are normally reserved for use by the permanent subroutines.6 

Slow Storage

Provision is made for the storage of four thousand numbers in “slow storage” positions numbered 0000-3999. These numbers are not immediately available but may be transferred to or from fast storage in blocks of ten numbers at a time. Such transfers are made to or from two groups of ten special β registers called S and S' and numbered from S0 ≡ 0200 to S'9 ≡ 0219. The two groups S and S' are identical in function, but the provision of two groups makes it possible to use one for computation while the other is engaged in a transfer. Transfers may be made in either direction between any block of ten consecutive slow storage positions and either the S or S' group of registers.

The slow storage control register (SSCR) is a four-column β register capable of storing a number in the range 0000-3999. The number stored in the SSCR determines the slow storage position of the first number of the block of ten to be transferred. For example, if the number 0328 stands in the γ transfer register (γ00) the effects of the following orders are:

10 1201 00   The number 0328 is read into the SSCR and transfer is made from slow storage positions 0328-0337 into registers S0-9.
10 1211 00   The number 0328 is read into the SSCR and transfer is made from registers S0-9 into slow storage positions 0328-0337.

β codes 1221 and 1231 produce similar results but with the S' registers replacing the S registers.

The slow storage control register operates modulo 4,000. Thus the four thousand slow storage positions form a closed ring and may be run through in sequence as many times as desired by simply augmenting the number in the SSCR.

The SSCR has ordinary read-in and read-out (without initiating a slow-fast transfer) under β code 3200. The S and S' registers have regular read-in and read-out and may be used as ordinary fast storage registers.

IJ Register

The IJ register is a three-column β register which can store numbers from 0 to 399. The first (low order) column is referred to separately as the J register, since read-in or read-out may be made either to the whole IJ register (β3003) or to the J register alone (β3002). The number standing in the IJ register is referred to as ij. The IJ register controls or modifies the effects of certain β codes as follows: (for this example assume ij = 124)

β-code        β-register selected        β-code        β-register selected
3000 (ij)           124                  1200 (Sj)           204
1000 (aj)           004                  1210 (S'j)          214
1010 (bj)           014                  1220 (fj)           224
  …                                      2000 (i0)           120
1190 (z'j)          194                  2001 (i1)           121
                                           …
                                         2009 (i9)           129

Shift and Normalize

The shift register may be used to shift a number either to the right or to the left by any number of places determined by the number in the shift control register (SCR). The SCR will store any number in the range ±19, a positive number producing a shift to the left, and a negative number producing a shift to the right. The SCR has ordinary read-out and read-in under β code 0302. The shift register has two types of read-out and read-in as follows:

Name of Order   β-code   Operation  
Shift in   0300 Read into the shift register.
Non-shift out   0300 Read out of the shift register without shift.
Normalize in   0301 Read into the shift register and place in the shift control register the amount of shift required to normalize the number.
Shift out   0301 Read out of the shift register with a shift determined by the number in the shift control register.

Under the first two orders the shift register behaves as an ordinary fast storage register and is often used in this way as a “β-transfer register”; i.e. as intermediary in a transfer between two γ registers.

A number is normalized by reading it into the shift register under β code 0301 and out under β code 0301. The amount of shift required for normalization appears automatically in the shift control register. After placing the desired amount of shift in the shift control register a number is shifted by reading it into the shift register under β code 0300 and out under β code 0301. Note that a shift occurs on read-out only so that the original number stands in the shift register regardless of which read-in or read-out is used, and may be read out unshifted under β code 0300 in either case.
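In modern terms, normalization of a sixteen-column decimal register computes the left shift that brings the first non-zero digit into the highest-order column; a hedged sketch (names mine):

    def normalize(n, width=16):
        """Return (shift, shifted) for a non-negative integer held in a width-digit register."""
        shift = 0
        while n != 0 and n < 10**(width - 1):
            n *= 10          # shift one column to the left
            shift += 1
        return shift, n

    print(normalize(1234, 16))   # (12, 1234000000000000)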

Decimal Point Switch

The decimal point switch is a β register storing a number in the range 0-16. This number determines the operating decimal point of the machine and represents the number of columns to the right of the decimal point. The decimal point switch is set manually and read-out only is possible (β3333).

Sign Register

The sign register is manually set to store a positive or negative zero and is used to determine whether the decimal point is in the high or the low order part of a number in double accuracy operation. It is also a convenient source of zeros. Read-out only is possible (β3332).

Read-in Switch

The read-in switch is a manually-operated regular register from which it is possible to read out (α07) to either a β or a γ register, or to both simultaneously.

Call and Conditional Call

The line number register (LNR) is a four-column β register whose content determines the line number of the order to be performed. The line number register is automatically augmented by one as each order is performed so that orders are normally carried out in the sequence determined by their line numbers. The LNR has regular read-in (β3010) which is used for calling. If, for example, the number 3255 stands in the γ transfer register then line 3255 may be called by the following order: 10 3010 00.

Because the sequence orders are stored on a rotating drum and may be read out only once per drum revolution (25 cycles), a call will introduce a wait until the called order passes under the reading head. The wait is such that any repetitive program loop consisting of t lines will take r drum revolutions if

  25(r–1) ≤ t+1 < 25r.

Note that the minimum time for a short repetitive loop is one drum revolution and that a loop of twenty-four lines will take twice as long as a loop of twenty-three lines. Waits due to calls may be greatly reduced by careful programming.
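The timing rule reduces to a one-line computation (my formulation of the inequality above):

    def revolutions(t):
        """Drum revolutions taken by a repetitive loop of t lines: 25(r-1) <= t+1 < 25r."""
        return (t + 1) // 25 + 1

    assert revolutions(23) == 1
    assert revolutions(24) == 2   # one extra line doubles the loop time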

The condition register stores only the sign of a number read into it and is controlled by β code 3012. A conditional call (β3011) performs the same operation as a call if the condition register is negative, and does nothing if it is positive. Note that in the machine x - x = -0.

An order to read out of the line number register (β3010) reads out the line number of that same order.

γ Transfer Register and External Transfer

The γ transfer register is a fast storage register provided in the γ group and may be used as intermediary in any transfer between two β registers. This register also has a special function in the introduction of numbers in the course of a problem. This operation is called external transfer and may be performed in either of two ways:

1.   External transfer with reset (XT1). Under α code 04 any desired positive four-digit number may be introduced into the low order columns of the γ transfer register, the remaining columns becoming zero.
2. External transfer with shift (XT2). Under α code 05 any desired positive four-digit number may be introduced into the low order columns of the γ transfer register, the number previously contained being shifted up four places.

In this way a complete positive sixteen-digit number may be introduced by four orders, the first with α code 04 followed by three with α code 05.

Addition

Addition is performed in either of two accumulators which operate independently except that they may be used together for double accuracy operations. Accumulator 1 is controlled by γ codes 10 and 11, and two different read-ins are possible. Code 10 is a read-in which destroys the number previously stored in the accumulator. Read-out in either case is under code 11. The similar operation of accumulator 2 is described in the coding list.

Example. The following orders add the contents of β registers 0019 and 0134 and place the result in β register 0022.

 
α    β   γ  
00     0019     10
00 0134   11
10 0022   11

Note that any number subtracted from itself yields a negative zero.

High Accuracy Addition

Using the symbols HO and LO to represent the registers containing respectively the high order and low order parts of a given number, high accuracy addition may be described as follows. The first number to be added is read in with reset, thus:

 
00   LO   10
00   HO   15 .

As many additional numbers as desired may now be added by providing for each of them the two instructions

 
00   LO   11
00   HO   17 .

Finally, the sum may be read out by the following orders:

 
10   LO   11
10   HO   17 .
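The double-register arithmetic can be modeled directly; a minimal sketch with sixteen decimal digits per register (names mine):

    BASE = 10**16   # capacity of one sixteen-digit register

    def ha_add(a, b):
        """Add two numbers each held as a (high, low) register pair."""
        hi_a, lo_a = a
        hi_b, lo_b = b
        carry, lo = divmod(lo_a + lo_b, BASE)   # carry from the low order parts
        return (hi_a + hi_b + carry, lo)

    x = (1, 9999999999999999)    # 1*BASE + (BASE - 1)
    y = (0, 1)
    print(ha_add(x, y))          # (2, 0)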

Multiplication

The multiplicand (MC) must be read in first under γ code 20 and remains in the MC register. The multiplier (MP) is read in on the next (or any later) line under γ code 21. The product may be read out on the eighth (or any later) line following the read-in of the MP and in the intervening seven lines any orders not involving the multiply unit may be performed.

Four different read-outs of the product are possible. γ codes 21 and 22 read the product out at the operating decimal point whereas γ codes 23 and 24 read out at decimal point 15. The high order (HO) product is the remainder of the 32-digit product formed by the multiply unit and is used in high accuracy operations.

The MP is destroyed in the course of multiplication but the MC remains and may be read out at any time except when a multiplication is in progress. Multiplication by a constant multiplicand may be effected by reading the MC in once followed by the successive multipliers.

Division

The divisor must be read in first, followed by the dividend within fifteen cycles. The quotient may be read out on the twentieth line following read-in of the dividend and in the intervening nineteen lines any orders not involving the divide unit may be performed. The divisor and dividend are both destroyed in the process of division and may not be read out. Read-out under γ code 34 is not possible following a read-out under γ code 32.

Vacuous Codes

The vacuous order 17 3355 77 performs no operation and is used to provide any desired delay, as for example before read-out of a product or quotient when it is impossible to interpose enough useful orders. The vacuous codes 3355 and 77 may also be used separately whenever a β (or γ) register is not involved in the particular order.

Coded Stops

Two stop codes are provided: β1312 stops the machine on a positive transfer and β1313 on a negative transfer. The two stops are provided to give more reliable operation on a coded check.

Example. Suppose that two computed results stored in a0 and a1 are required to agree to within a certain tolerance stored in a2. The following routine is used to stop the machine if the agreement is not as required:

 
Line   α     β   γ  
  1 00  0000  10
  2 01  0001  11
  3 10  0300  11
  4 03  0300  10
  5 10  1312  11
  6 00  0002  11
  7 10  1313  11

Lines 1-4 place -|a0 - a1| in accumulator 1. Line 6 adds the tolerance, and so makes the accumulator 1 positive if the agreement is as required. Line 7 then stops the machine if the accumulator is negative. The reason for including line 5 is that the accumulator may be defective and standing in a permanently positive condition, thus passing the check regardless of the value of |a0 - a1|. Lines 5 and 7 together show that the accumulator is negative at one point and positive at another thus indicating it is operative.
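The logic of the routine, including the guard against a stuck accumulator, may be paraphrased as follows (a sketch only; the machine stops are modeled as exceptions and the function name is mine):

    def coded_check(a0, a1, tol):
        acc = -abs(a0 - a1)        # lines 1-4 leave -|a0 - a1| in accumulator 1
        if acc > 0:                # line 5: a working accumulator is never positive here
            raise RuntimeError("stop: accumulator defective")
        acc += tol                 # line 6: add the tolerance
        if acc < 0:                # line 7: results disagree by more than the tolerance
            raise RuntimeError("stop: check failed")

    coded_check(3.14159, 3.14160, 1e-4)   # passes silently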

Input-Output

All data are read to and from the machine by means of magnetic tape units which perform the following functions:

1.   Read in orders to be recorded in the machine before starting a problem.
2. Read numbers from tape under control of the sequence orders.
3. Record numbers on tape from the machine.

β codes 2301-11 select tape units 1-11 and with a β-in code will record on the appropriate tape the contents of any γ register desired. Read from tape is accomplished by the same β codes combined with a β-out code, but the number is always read into the read registers, numbers 1 and 2. The read registers are β registers with normal read-out and read-in under β codes 2321 and 2322.

It is also possible to select “tape unit k”, where k is the number (1-11) stored in the tape control (K) register. The functions are the same as described above. The K register stores numbers from 0 to 19 and has normal read-in and read-out under β code 2320. Read to and from the record register without recording is possible under β code 2323.

Example   (1) To record on tape 5 the number in accumulator 1:   10 2305 11;
  (2) To read from tape 10 into accumulator 1:   00 2310 77
    00 2321 10 .

Printing

If final output is to be in printed form, the recorded tapes must be used to control a printer. To accomplish this a “serial” number must be recorded before each number to be printed. This serial number serves to identify the number and to control the printer. The various columns (numbered right to left) of the serial number are used as follows:

Column      Function
10 Digit 0-9 determines the number of carriage returns.
9 Digit 0-9 determines the number of tabs.
8 Digit 0-9 determines the number of spaces.
7 Selects the style of printing. (Any one of ten styles which are preset on a plugboard.)
1-5 These columns form a five-digit number used to identify the number to be printed.
Sign The sign column contains a special “serial” code which distinguishes a serial number from a functional number. This is provided by α code 14; i.e., every serial number is read to the tape unit under α code 14.

Before printing, the printer compares the first column of each serial number with that of the preceding one. If the digit has increased by one, printing proceeds; if not, the printer passes on to the next number without printing. Thus if any number has been rerecorded (due perhaps to a rerun) only the last recorded value is printed. All numbers are recorded in duplicate on the tape; the printer compares the two recorded values and stops if they do not agree.

Waits

Certain units which require several cycles to operate are provided with automatic waits to delay any order involving them until the previous operation is completed. This is not true of the multiplier and divider, where the requisite number of orders must be interposed between read-in and read-out.

Even where automatic waits are provided, machine time may be saved by interposing other orders between successive uses of any one of the above-mentioned units. The average time required for certain operations is given below in terms of machine cycles. A machine cycle is 1.2 milliseconds, the time required for a single order.

  Operation   Time (machine cycles)
Transfer ten numbers between slow storage and the S or S' registers.   37 (average)
Read one number from tape.   22 (average)
Record one number on tape.   22 (average)
Multiply (including read-in and read-out).   10 (7 lines interposed)
Divide (including read-in and read-out).   22 (19 lines interposed)
Call.   No. of cycles skipped — maximum 26, minimum 2
α-Codes
Code Function or Register
00 β → γ +
01 β → γ –
02 β → γ + | |
03 β → γ – | |
04 XT1 (with reset)
05 XT2 (with shift)
07 Read-in switch
10 β ← γ +
11 β ← γ –
12 β ← γ + | |
13 β ← γ – | |
14 β ← γ with serial identification
17 Vacuous code
 
β-Codes
Code Function or Register
0000-0199     a0-z'9
0200-0209 S0-9
0210-0219 S'0-9
0220-0229 f0-9
0300 Shift in; non-shift out
0301 Normalize in; shift out
0302 Shift control register
1000 aj
1010 bj
1190 z'j
1200 Sj
1201 SSCR in and S in from slow storage
1210 S'j
1211 SSCR in and S out to slow storage
1220 fj
1221 SSCR in and S' in from slow storage
1231 SSCR in and S' out to slow storage
1312 Stop on positive trfr.
1313 Stop on negative trfr.
2000-2009 i0-9
2300 Unit k        ] To record reg. and record;
2301-11 Units 1-11 ] read from tape
2320 Tape control (K) register
2321 Read register #1
2322 Read register #2
2323 Record register
3000 ij
3002 J
3003 IJ
3010 Call and call reg. in; LN out
3011 Cond. call and call reg. in conditionally
3012 Condition register
3200 Slow storage control (SSCR)
3332 Sign register out
3333 Decimal point out
3355 Vacuous code
 
γ-Codes
Code Function or Register
00 γ transfer register
10 Acc. 1 in
11 Acc. 1 in and add; out
15 Acc. 2 in
16 Acc. 2 in and add; out
17 Acc. 2 in and HA add; HA out
20 MC
21 MP in and multiply; LO product out
22 HO product out
23 LO product (DP 15) out
24 HO product (DP 15) out
30 Divisor (DR) in
31 Dividend (DD) in and divide
32 Quotient (Q) out
34 Quotient (Q) out (DP 15)
77 Vacuous code

Notes

a.  For an excellent and more detailed account of problem preparation the reader is referred to Chapter X of reference 4.
b.  The Harvard Mark II Calculator4  is an example of such a machine.
c.  The IJ register is discussed further in section 2B, and examples of its use will be found in Chapter 6.
d.  Parts of this section have appeared in Progress Report No. 23 of the Harvard Computation Laboratory.
e.  For full details, see reference 5.

 

Chapter 3   Linear Differential Equations

A. Introduction

The general solution of a system of ordinary linear differential equations is well known, as are certain computational methods for evaluating the parameters involved. In this chapter the theory is developed in a manner designed to provide the necessary background for the discussion of computational methods in Chapter 4. In order to make the structure of the solution clear for the case of multiple roots, the whole discussion is based on the Jordan canonical form of the matrix of coefficients. The complementary function is expressed in terms of the system of principal vectors, and expressions for the principal vectors are derived in terms of the adjoint of the characteristic matrix. A knowledge of elementary matrix theory such as is contained in the first three chapters of reference 8 will be assumed.

In matrix notation the most general form of a system of first order[a]  linear differential equations in n variables is

  Ax – Bẋ = z                                             (3.1)

where A and B are square matrices of order n and

  x = (x1,x2,…,xn),
  ẋ = (dx1/dt, dx2/dt, …, dxn/dt),
and
  z = (z1,z2,…,zn)

are vector functions of the independent variable t. The vector z is a given function and x is the vector to be determined. In the case of constant coefficients, to which this discussion is restricted, the elements of A and B are constants.

The general solution of Eq. (3.1) consists of the sum of (1) a particular integral of (3.1), and (2) the complementary function, which is any solution, involving n arbitrary constants, of the associated homogeneous equation

  Ax – Bẋ = 0.                                             (3.2)

If B is non-singular Eq. (3.2) may be reduced to

  Dx – ẋ = 0                                               (3.3)

where D = B-1A. The singular case is considered in Section D. The problem of the complementary function will be considered first and to this end certain results in the theory of matrices are first developed.

In this chapter the following conventions are adopted:

1.   The letter n refers exclusively to the order of the matrix under consideration. The order of a submatrix will be denoted by the letter n with a subscript.
 
2. Matrices are denoted by capital letters A, B, or A(λ), B(λ), etc., where the elements are functions of a parameter λ. A subscript n may be added to specify the order of the matrix.
 
3. Vectors are denoted by lower case Roman letters x, y, etc.
 
4. All equations are to be understood as matrix or vector relations unless otherwise specified.
 
5. The Kronecker delta is defined as:
  δij = 0 if i ≠ j,
 = 1 if i = j.
 
6. The symbols e1,e2,…en will be used to denote the so-called unit vectors defined by
  ei = (ei1,ei2,…ein)
where eij = δij.
 
7.

The unit matrix is denoted by I and the matrices Ik are defined by

  Ik = [δi,j-k],     k ≥ 0.

Thus Ik is a matrix whose elements are zero except for a diagonal of unit elements located k places above the main diagonal. For example if n = 3

        ⎡ 1 0 0 ⎤          ⎡ 0 1 0 ⎤          ⎡ 0 0 1 ⎤
  I0 =  ⎢ 0 1 0 ⎥ ;  I1 =  ⎢ 0 0 1 ⎥ ;  I2 =  ⎢ 0 0 0 ⎥ .
        ⎣ 0 0 1 ⎦          ⎣ 0 0 0 ⎦          ⎣ 0 0 0 ⎦

If
  X = [Xpq] = IkIm,
then
  Xpq = Σj δp,j-k δj,q-m = δp,q-(k+m) .
Therefore
  X = IkIm = Ik+m .
Similarly,
  ImIk = Ik+m .
Thus,
  I0 = I
  IkIm = ImIk = Ik+m                                 (3.4)
  Im = 0 if m ≥ n .

(These identities are checked numerically in the sketch following this list of conventions.)
 
8.

Since differentiation is a linear operation it is easily extended to matrices and vectors. Thus if

  A(λ) = [aij(λ)]

is a matrix whose elements are functions of a scalar λ, the derivative dA/dλ is defined as the matrix

  dA/dλ = [daij/dλ].
 
9.

Two matrices A and B are said to be “similar” if there exists a non-singular matrix S such that

  B = S-1AS .
 
10.  

A matrix Jn(λ) of the form

  Jn(λ) = λI + I1                                         (3.5)

will be called a Jordan box of order n, e.g.,

            ⎡ λ 1 0 ⎤
  J3(λ)  =  ⎢ 0 λ 1 ⎥ .
            ⎣ 0 0 λ ⎦
 
11.

A matrix J(λi,ni) made up of Jordan boxes arranged along the main diagonal with zeros elsewhere is said to be in Jordan canonical form and will be referred to briefly as a canonical matrix. Thus:

                ⎡ Jn1(λ1)                      ⎤
  J(λi,ni)  =   ⎢          Jn2(λ2)             ⎥                  (3.6)
                ⎢                   ⋱          ⎥
                ⎣                     Jnm(λm)  ⎦

where

  Σi ni = n.

For example, if n1 = 3, n2 = 2, n3 = 1

  J6(λ1,3;λ2,2;λ3,1) = λ1 1  0  0  0  0
                       0  λ1 1  0  0  0
                       0  0  λ1 0  0  0
                       0  0  0  λ2 1  0
                       0  0  0  0  λ2 0
                       0  0  0  0  0  λ3 .
 
12.

The exponential function of a matrix is defined formally by the following series, convergent for any finite matrix A:

  eA = I + A + A2/2! + A3/3! + … .

This function may be shown to have the usual properties of the exponential function (reference 9, p. 41). In particular

 
  d(eAt)/dt = AeAt .                                       (3.7)

If A and B commute the following relation also holds:

  eAeB = eBeA = eA+B .

Consequently

  eAe-A = I .                                           (3.8)

For example:

  eJ4(λ)t = e(λI+I1)t = eλteI1t
          = eλt (I0 + tI1 + (t2/2!)I2 + (t3/3!)I3).

Therefore

  eJ4(λ)t = eλt  1  t  t2/2!  t3/3!
                 0  1  t      t2/2!
                 0  0  1      t
                 0  0  0      1     .
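In modern terms the expansion above is easily checked numerically. The following is a minimal sketch only, assuming Python with the NumPy and SciPy libraries; it verifies that eJ4(λ)t equals eλt times the terminating series in I1.

  import math
  import numpy as np
  from scipy.linalg import expm

  # For n = 4 the powers of I1 vanish from I1^4 on, so the series terminates.
  lam, t = 0.7, 1.3
  I1 = np.eye(4, k=1)                       # unit diagonal one place above
  J = lam * np.eye(4) + I1                  # Jordan box J4(lam), Eq. (3.5)
  series = sum(np.linalg.matrix_power(t * I1, p) / math.factorial(p)
               for p in range(4))
  assert np.allclose(expm(J * t), np.exp(lam * t) * series)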
 

The proof of the following theorem, omitted here, may be found on page 64 of reference 10.

 

Theorem: Every matrix D is similar to a matrix J in canonical form. Except for the order of occurrence of Jordan boxes, the canonical form is unique.

Referring to definition (9) this means that

  D = SJS-1                                           (3.9)

for some non-singular matrix S.

B. The Complementary Function

Equation (3.3) is repeated here for reference

  Dx – ẋ = 0                                               (3.3)

Substitution of Eq. (3.9) gives

  SJS-1x – ẋ = 0.

Using the fact that SIS-1 = I and then premultiplying by S-1 we have

  JS-1x – S-1ẋ = 0.

Let y = S-1x. Then ẏ = S-1ẋ and finally Eq. (3.3) becomes

  J(λi,ni)y – ẏ = 0.                                          (3.10)

Substitution of the expression

  y = eJ(λi,ni)tc                                             (3.11)

in the left-hand side of Eq. (3.10) gives

  [ J(λi,ni)eJ(λi,ni)t – (d/dt)eJ(λi,ni)t ] c.

The bracketed quantity is identically zero by virtue of (3.7) and therefore (3.11) is a solution of (3.10) for an arbitrary vector c. This is the general solution sought since it contains n arbitrary constants c1,c2,…,cn.

The expanded form of (3.11) is exhibited below for the particular matrix[b] J = J6(λ1,3;λ2,2;λ3,1) appearing in section 3A11:

  y =  eλ1t  teλ1t  (t2/2!)eλ1t  0     0      0        c1
       0     eλ1t   teλ1t       0     0      0        c2
       0     0      eλ1t        0     0      0        c3
       0     0      0           eλ2t  teλ2t  0        c4
       0     0      0           0     eλ2t   0        c5
       0     0      0           0     0      eλ3t     c6

In terms of components:

  y1 = (c1 + c2t + c3t2/2!)eλ1t
  y2 = (c2 + c3t) eλ1t
  y3 = c3eλ1t
  y4 = (c4 + c5t) eλ2t
  y5 = c5eλ2t
  y6 = c6eλ3t .

The general solution of Eq. (3.3) is now given by

  x = Sy = SeJ(λi,ni)tc.                                       (3.12)

Since S is non-singular x also contains n arbitrary constants. Let x0 be the vector of initial conditions; i.e. at

  t = 0,   x = x0.
 
Then
  x0 = Sc.
Hence
  c = S-1x0.                                               (3.13)
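In modern terms the computation of the complementary function may be sketched as follows. This is a minimal sketch, assuming Python with NumPy and, for simplicity, distinct latent roots so that J is diagonal; the function name is illustrative only.

  import numpy as np

  # Complementary function of Dx - x' = 0: x(t) = S e^{Jt} S^{-1} x0,
  # Eqs. (3.12) and (3.13), assuming distinct latent roots (J diagonal).
  def complementary_function(D, x0, t):
      lam, S = np.linalg.eig(D)           # latent roots and eigenvectors of D
      c = np.linalg.solve(S, x0)          # c = S^{-1} x0, Eq. (3.13)
      return (S * np.exp(lam * t)) @ c    # S e^{Jt} c, Eq. (3.12)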

C. Reduction to Canonical Form

The theorem cited at the end of section A assures the existence of a non-singular matrix S such that the similarity transformation S-1DS reduces D to canonical form. A process for obtaining a suitable matrix S for any given matrix D will now be derived.

(1) Principal Vectors and Latent Roots

A principal vector of order q is defined as a non-trivial vector x satisfying the conditions

  (A – λI)q-1x ≠ 0,                                         (3.14)
  (A – λI)qx = 0.

The corresponding value of λ is called the latent root, or more briefly the root, associated with the principal vector x.

  Theorem: The unit vectors e1,e2,…,en form a set of principal vectors for any canonical matrix J.

Proof: In order to relate the unit vectors to the structure of Jn, the following notation is adopted:

  xkj = er,
where
 r = j + Σp<k np ;
 k = 1,2,…,m;
 j = 1,2,…,nk.

Thus xkj is a column vector with zero components except for a unit element in the jth position of the group of rows occupied by the kth Jordan box Jnk(λ). As k and j range over the integers 1 to m and 1 to nk respectively, the vectors xkj range over the set of unit vectors e1,e2,…,en. Clearly then,

  [J(λi,ni) – λI] xkj = (λk – λ)xkj + xk,j-1   if j ≠ 1,
                      = (λk – λ)xkj            if j = 1.

For example, if k = j = 2, and J is the matrix of section 3A11,

  [J(λi,ni) – λI] x22 =  λ1–λ  1     0     0     0     0       0       0
                         0     λ1–λ  1     0     0     0       0       0
                         0     0     λ1–λ  0     0     0       0   =   0
                         0     0     0     λ2–λ  1     0       0       1
                         0     0     0     0     λ2–λ  0       1       λ2–λ
                         0     0     0     0     0     λ3–λ    0       0

If λ = λk then

  (Jn – λkI)xkj = xk,j-1,       j ≠ 1;                              (3.15)
   = 0, j = 1.

Clearly then

  (Jn – λkI)j-1xkj = xk,1 ≠ 0                                     (3.16)
and
  (Jn – λkI)jxkj = 0.  

Thus xkj is a principal vector of order j associated with the latent root λk and the theorem is proved.

  Theorem: Every matrix D possesses a set of n linearly independent principal vectors.

Proof: Let Jn be the canonical form of D. Then

  S-1DS = J
and
  S-1(D – λI)S = J – λI.

Therefore

  S-1(D – λkI)j-1Sxkj = (J – λkI)j-1xkj ≠ 0                         (3.17)
and
  S-1(D – λkI)jSxkj = (J – λkI)jxkj = 0.

Therefore Sxkj is a principal vector of order j of the matrix D and is associated with the root λk.

The principal vectors of D are therefore the columns of S and since S is non-singular, they are linearly independent. In other words, the matrix S required for the canonical reduction of D and hence for the solution of (3.3) is made up of the principal vectors of D arranged in descending order for each latent root as indicated by the relations (3.15). These relations carry over to the matrix D in the form

  (D – λkI)Skj = Sk,j-1,     j ≠ 1
  (D – λkI)Sk1 = 0

where Skj is the r-th column of S and r = j + Σp<k np.

Eigenvectors. A principal vector of order one is called an eigenvector. If in the canonical form of D, nk = 1, k = 1,2,…,m, the principal vectors are all eigenvectors and they form a linearly independent set. The nk are necessarily equal to unity if

1.   the latent roots are distinct, or
2. D is symmetric.

The proof of (2) rests on the fact that if D is symmetric so is J. Therefore J has no off-diagonal elements.

The significance of a linearly independent set of eigenvectors is that the solution of (3.3) then involves only exponentials and no terms of the form tkeλt.

(2) The Adjoint Matrix

It is easily verified that the pth power of a canonical matrix J may be expressed as

  Jp(λi,ni) = Jn1p(λ1)
                     Jn2p(λ2)
                            ⋱
                                   Jnmp(λm) .

Hence for any polynomial function of J

  F(Jn) = F[Jn1(λ1)]                                      (3.18)
                    F[Jn2(λ2)]
                             ⋱
                                    F[Jnm(λm)]

Further, if Jnk(λk) is a Jordan box then

  F[Jnk(λk)] = F(λkI0 + I1) = F(λk)I0 + F′(λk)I1 + (F″(λk)/2!)I2 + … .      (3.19)

For example,

  J34(λk) = λk4  4λk3  6λk2
            0    λk4   4λk3
            0    0     λk4 .

Formally applying the above results to the function Jn-1 we have for each Jordan box

  F[Jnk(λk)] = Jnk-1(λk) = λk-1I0 – λk-2I1 + λk-3I2 – λk-4I3 + …
             = Σp (-1)pλk-(p+1)Ip.                             (3.20)

Premultiplication by Jnk gives

  Jnk(λk) Jnk-1(λk) = (λkI0 + I1) Σp (-1)pλk-(p+1)Ip
                    = Σp (-1)pλk-pIp + Σp (-1)pλk-(p+1)Ip+1
                    = Σp (-1)pλk-pIp – Σp≥1 (-1)pλk-pIp
                    = I0.

Therefore Jn-1 as formally defined by (3.20) is indeed the inverse of Jn and the preceding results may therefore be extended to negative powers of J.

To avoid difficulties due to the possible singularity of the matrix J we consider instead the adjoint of J, to be denoted by Δ(J) and defined by

  Δ(J) = |J| J-1.                                           (3.21)

Since the determinant |J| is equal to

  Π (λk)nk

and since the largest negative power of λ occurring in J-1 is λk-nk it follows that the adjoint is finite even for singular matrices. Since J and J-1 commute

  J Δ(J) = Δ(J) J = |J| I.                                     (3.22)

(3) Principal Vectors Derived from the Adjoint

Since J(λi,ni) – λI = J(λi–λ,ni) is a canonical matrix, substitution in (3.22) gives

  [J(λi,ni) – λI] Δ[J(λi–λ,ni)] = |J(λi–λ,ni)| I.

If λ is chosen equal to any latent root λk the determinant on the right vanishes and therefore any non-trivial column of the adjoint Δ[J(λi–λ,ni)] is an eigenvector of Jn corresponding to the root λk (see Eqs. (3.14) and the definition of an eigenvector in section 3C1). The existence of such non-trivial columns will now be demonstrated.

The form of a single box of the adjoint of (J – λI) is shown below for nk = 3.

  Δ[Jnk(λk – λ)] = Πj (λj – λ)nj  (λk – λ)-1   -(λk – λ)-2    (λk – λ)-3    (3.23)
                                  0            (λk – λ)-1    -(λk – λ)-2
                                  0            0              (λk – λ)-1

In the definition of Jni,ni) in section 3A11 the symbol λk is used to denote the diagonal element of the kth Jordan box. This notation is not meant to imply that the elements λ12,…,λm are necessarily distinct. To this point the occurrence of two or more equal values of λk has not entered the discussion but a distinction must now be made.

Let mk be the multiplicity of the root λk in J. Thus if T is the set of indices j for which λj = λk, the multiplicity of λk is

  mk = Σj∈T nj.

In particular, if the λk are distinct mk = nk; and if a root λk is repeated in more than one Jordan box then mk > nk.

The symbol Π′ will now be adopted for the product of the λ’s, the prime indicating that the factors are chosen only from a set of distinct λk. Then

  Π (λk)nk = Π′ (λk)mk.

The product in (3.23) may now be written as

  Π′ (λj – λ)mj.

Factoring out (λk – λ)mk and multiplying it into the matrix, Eq. (3.23) appears as follows:

  Δ[Jnk(λk – λ)] = Π′j≠k (λj – λ)mj  (λk – λ)mk-1   -(λk – λ)mk-2    (λk – λ)mk-3
                                     0              (λk – λ)mk-1    -(λk – λ)mk-2
                                     0              0                (λk – λ)mk-1
                                                                          (3.24)

The same relation for a general value of nk may be written concisely in terms of the matrices Ik as follows:

  Δ[Jnk(λk – λ)] = Π′j≠k (λj – λ)mj Σp (-1)p(λk – λ)mk-(p+1) Ip .       (3.25)

Suppose now that λk is a latent root occurring in a single Jordan box so that mk = nk. If the adjoint is evaluated at λ = λk all Jordan boxes other than the kth are null due to the occurrence of the factor (λk – λ)mk in the product Π′. For the kth box the product

  Π′j≠k (λk – λj)mj

is some non-zero constant c0. The kth box is also null except for an element equal to

  c0(-1)nk-1(λk – λ)mk-nk = (-1)nk-1c0

in the upper right corner. This may be verified by referring either to the example of (3.24) or to the general expression (3.25). Therefore the only non-zero column occurring in the adjoint is the vector (-1)nk-1c0xk1.

Eigenvectors (and principal vectors) are defined by a homogeneous equation and therefore any scalar multiple of an eigenvector is also an eigenvector. For convenience, the non-zero scalar multiplier (-1)nk-1c0 in the column of the adjoint may be dropped to give xk1 as the eigenvector corresponding to λk. This is the same result previously obtained from Eqs. (3.16).

The remaining principal vectors associated with λk may be obtained by differentiating the adjoint with respect to λ. For convenience the scalar constants cp are defined as

  cp = [ dp/dλp Π′j≠k (λj – λ)mj ]λ=λk .

Differentiating (3.24) or (3.25) and evaluating at λ = λk gives

  Δ′[Jnk(λk – λ)] = (-1)nk-1 [ c1Ink-1 + c0Ink-2 ]

or

  Δ′[Jnk(λk – λ)] = c1(-I1)nk-1 – c0(-I1)nk-2 .

Since c0 ≠ 0 the derived adjoint contains a non-zero column proportional to the eigenvector. If c1 = 0 it contains in addition a linearly independent column with a component proportional to xk2, which is the principal vector of order 2 associated with λk. If c1 ≠ 0 this column contains also a component in the direction of xk1. This brings out a point already implicit in (3.16); namely that a principal vector of order greater than 1 is non-unique to the extent that it may have added to it any linear combination of the associated principal vectors of lower order.

The pattern should now be clear. The final column in the kth Jordan box of the pth derived adjoint

  dpΔ[J(λi–λ,ni)]/dλp ,     evaluated at λ = λk,

is a principal vector of order p+1 associated with the eigenvalue λk. At most p other columns are non-zero and they are principal vectors of lower order. Note also that for p ≥ nk the derived adjoint is null.

The case where λk is repeated in two or more boxes may now be disposed of briefly. The element of lowest degree in (λk – λ) which occurs in the adjoint is (λk – λ)mk-nk as shown by (3.24) or (3.25). In this case mk > nk so that the adjoint evaluated at λ = λk is null, as are all derived adjoints up to but not including the (mk–nk)th. The (mk–nk)th derived adjoint has a single non-zero column proportional to the eigenvector associated with the kth box. The successive derived adjoints up to the (mk–1)st repeat the pattern established above and yield in all the nk principal vectors associated with the kth box. Note, however, that this same process is going on simultaneously for all boxes with diagonal elements equal to λk. Thus more than one new independent column may be introduced by each differentiation. In particular, the (mk–1)st derived adjoint introduces all the principal vectors of highest order associated with each occurrence of λk.

(4) Principal Vectors of a General Matrix

The results obtained above for a canonical matrix may now be extended to a general matrix A by the use of the following train of relations: If

  A = SJS-1,
then
  A – λI = S(J – λI)S-1,     (A – λI)-1 = S(J – λI)-1S-1
 
  Δ(A – λI) = | A – λI | (A – λI)-1
            = | J – λI | S(J – λI)-1S-1
            = S | J – λI | (J – λI)-1S-1
 
  Δ(A – λI) = SΔ(J – λI)S-1                                   (3.26)

and

  dpΔ(A – λI)/dλp = S [ dpΔ(J – λI)/dλp ] S-1,     p = 0,1,2,…

Then

  (A – λI)q dpΔ(A – λI)/dλp = S [ (J – λI)q dpΔ(J – λI)/dλp ] S-1.            (3.27)

Since S is non-singular Eq. (3.26) shows that Δ(A – λI) possesses the same number of independent columns as does Δ(J – λI). Also from (3.27) the functions

  (A – λI)q dpΔ(A – λI)/dλp   and   (J – λI)q dpΔ(J – λI)/dλp

are seen to have the same zeros. From the definition of a principal vector and the results obtained for the canonical matrix, it follows that the derived adjoints of A contain columns proportional to the principal vectors of A.

D. The Singular Case

In passing from Eq. (3.2) to Eq. (3.3) it is necessary to assume that the matrix B is non-singular. The singular case will now be considered.

If the matrix B is of rank (n-r) it is possible by rearranging and taking linear sums of the component equations of (3.2) to reduce them to an equivalent system in which the coefficients of the derivatives in the last r equations are zero. The last r equations may therefore be solved directly for r of the variables in terms of the remaining variables. These r variables may then be eliminated from the first (n-r) equations. We are then left with a reduced system of equations in (n-r) variables of the form

  A1x – B1ẋ = 0                                           (3.28)

where B1 is non-singular. Solutions of the reduced system may now be obtained by the methods previously described and the r eliminated variables are expressible in terms of these solutions.

The general solution now contains only (n-r) arbitrary constants and the system may be said to have lost r degrees of freedom. The initial values of the variables are therefore subject to r linear restraints as determined by those r equations which are free of terms in derivatives.

If the matrix A is non-singular an alternative method is available. Premultiplication of (3.2) by A-1 gives:

  Ix – Cẋ = 0                                             (3.29)

where

  C = A-1B.

Let J = S-1CS be the canonical form of C. Then Eq. (3.29) may be transformed to

  y – Jẏ = 0                                              (3.30)

where

  y = S-1x.

Then

  y = eJ-1tc                                               (3.31)

is the general solution of (3.30). [Compare Eq. (3.11).]

If B is non-singular so are C and J and hence (3.31) is a valid result. If B is singular J-1 does not exist but the solution may still be made meaningful in the following way. Because C is singular it has at least one latent root equal to zero. Suppose it has exactly one zero root and let S be so chosen that this root occurs as the last element of J. Formally inverting J by the use of Eq. (3.20) the last latent root of J-1 becomes infinite. The solution y = eJ-1tc therefore has a discontinuity at t = 0 unless the vector c (i.e., the vector of initial conditions) is so chosen that it has no component in the direction of the eigenvector of J corresponding to the zero latent root. A single linear restraint is thus imposed on the variables and at the same time one of the exponential terms in the general solution is dropped. This eigenvector of the canonical matrix J is of course the vector en (see Eq. (3.16)). The restraint is therefore simply cn = 0 and hence yn = 0. This is the same linear restriction arrived at by the previous method as may be verified as follows: The transformation

  S-1A-1[Ax – Bẋ],     x = Sy

is one of the possible ways of reducing Eq. (3.2) to the form (3.28), for we obtain (3.30) namely,

  y – Jẏ = 0.

Since the last row of the matrix J is null the system may now be reduced to the form (3.28) by dropping the last equation and using it as an auxiliary restraint. This equation is simply yn = 0 as was to be demonstrated. The same reasoning may be used to extend this result to the case of multiple zero roots of B. We may therefore lay down the following procedure for the case when B is singular.

1.   Obtain the latent roots and principal vectors of the matrix A-1B.
2. Use the solution (3.31) but with terms corresponding to the “infinite” roots of J-1 deleted and the components of c in the direction of the corresponding eigenvectors made zero.

This result is very useful in the present problem since the matrix A is known to be non-singular and the matrix B is in some cases singular. Also the inverse of the matrix A happens to be of interest in itself since A-1 is the solution of the static system (1.1).

E. The Particular Integral

The particular integral of (3.1) is now easily obtained.

  Ax – Bẋ = z.                                             (3.1)
Again assuming B non-singular,  
  B-1Ax – ẋ = B-1z.
Let
  B-1A = S-1JS.
Then
  S-1JSx – ẋ = B-1z.
Premultiplying by S and letting y = Sx yields
  Jy – ẏ = SB-1z.                                           (3.32)

A particular integral is then given by

  y = -eJt ∫ e-Jt SB-1z dt                                     (3.33)

as may be verified by substitution in (3.32). Finally,

  x = S-1y = -S-1eJt ∫ e-Jt SB-1z dt .

F. The Economic Model

The equations of motion (1.2) of the dynamic economic model proposed in Chapter 1 may be expressed in matrix notation as

  (I – A)x – Bẋ = z                                        (3.34)

where A = [aij] is the matrix of flow coefficients and B = [bij] is the matrix of capital coefficients. As indicated in section 3D the associated homogeneous equation may be put in the form

  x – Dẋ = 0                                             (3.35)

where D = (I – A)-1B. The general solution is then expressible in terms of the latent roots and principal vectors of D.

As indicated in Chapter 1 the final demand functions of interest in the present problem are of the form

  z = geμt

where g is an arbitrary vector and μ is a constant. A particular integral of (3.34) then takes the simple form

  x = (I – A – μB)-1geμt.                                    (3.36)

This may be verified by substitution in (3.34) as follows:

  {(I – A)[I – A – μB]-1 – μB[I – A – μB]-1}geμt = Igeμt = z.
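The verification may also be carried out numerically. The following is a minimal sketch, assuming Python with NumPy and purely illustrative data for A, B, g and μ (not the actual coefficients of Chapter 1).

  import numpy as np

  n = 3
  rng = np.random.default_rng(0)
  A = 0.1 * rng.random((n, n))       # flow coefficients (illustrative only)
  B = 0.1 * rng.random((n, n))       # capital coefficients (illustrative only)
  g, mu = rng.random(n), 0.05

  x0 = np.linalg.solve(np.eye(n) - A - mu * B, g)   # Eq. (3.36): x = x0 e^{mu t}
  # With x = x0 e^{mu t} we have x' = mu x, so (3.34) reduces to
  # (I - A)x0 - mu B x0 = g, which is checked below.
  assert np.allclose((np.eye(n) - A) @ x0 - mu * B @ x0, g)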

Notes

a.   A system of linear differential equations of general order may be reduced to an equivalent first order system.9
b. The particular order of occurrence of the Jordan boxes along the main diagonal is of no significance and may be changed by a different choice of the matrix S.

 

Chapter 4.   Latent Roots and Principal Vectors

In Chapter 3 the solution of linear differential equations with constant coefficients was reduced to the problem of evaluating the latent roots and principal vectors of a certain matrix. A number of possible computational methods will now be considered with a view to choosing those methods best suited to the solution of the problem outlined in Chapter 1. Methods such as that of Jacobi which apply only to symmetric matrices will not be considered.

The complete solution of this problem requires the calculation of all the latent roots and all the principal vectors of the matrix (I – A)-1B. Since the matrix is not symmetric, complex roots may be expected to occur and indeed these complex roots are of particular interest in the theory of the business cycle.[a]  The method chosen must therefore be capable of extracting all roots, real or complex, and the associated principal vectors.

A. Bounds on the Roots

The rate of accumulation of round-off error in some of the methods to be described depends critically on the magnitude of the latent roots of the matrix concerned. The latent roots also determine the magnitude of the largest number to be encountered in the course of the calculation.[b]  For two reasons then it is useful to have a preliminary estimate of the latent roots. The best such estimate available with a small amount of computation is furnished by Gerschgorin’s theorem11 which follows.

 

Theorem: Let Ci be the closed region of the complex plane enclosed by a circle with center at aii and radius

  ri = Σj≠i |aij| .

Then all the latent roots of the matrix A = [aij] lie in the region formed by the union of the circles Ci, i = 1,2,…,n.

Proof: Let x be an eigenvector of A corresponding to the latent root λ. Then Ax = λx and

  Σj aijxj = λxi ,   i = 1,2,…,n;
  Σj≠i aijxj = (λ – aii)xi
  Σj≠i |aij||xj| ≥ |λ – aii||xi| .

Let |xk| = maxj |xj|; |xk| ≠ 0 since x is non-trivial. Then, taking i = k,

  Σj≠k |akj||xk| ≥ |λ – akk||xk| .

Dividing by |xk| we have

  |λ – akk| ≤ Σj≠k |akj| .

Thus λ lies in one of the circles Ci and the theorem is proved.

Since AT has the same latent roots as A a similar theorem holds for circles Di with centers aii and radii

  Σj≠i |aji| .

Clearly then the two bounds to the roots are given by

  max |λ| ≤ maxi Σj |aij|
and
  max |λ| ≤ maxj Σi |aij| .
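In modern terms the two bounds are simply the largest absolute row sum and the largest absolute column sum; a minimal sketch, assuming Python with NumPy:

  import numpy as np

  # Gerschgorin bounds: every latent root of A satisfies
  # |lam| <= max_i sum_j |a_ij| and |lam| <= max_j sum_i |a_ij|.
  def gerschgorin_bounds(A):
      M = np.abs(np.asarray(A))
      return M.sum(axis=1).max(), M.sum(axis=0).max()

  A = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])
  bound = min(gerschgorin_bounds(A))
  assert all(abs(lam) <= bound + 1e-12 for lam in np.linalg.eigvals(A))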

B. The Power Method

(1) Single Dominant Root

The power method in its simplest form can be used only to obtain the dominant root of a matrix and owes its popularity to the large number of problems in which a single dominant root is of interest. The method is based on the fact that for any vector x with a non-zero component in the direction of the largest eigenvector[c]  the sequence of vectors Ax,A2x,…,Apx approaches the largest eigenvector of the matrix A. The ratios of corresponding components of two successive vectors of the sequence approach the largest latent root of the matrix A. The proof of these statements is easily established with the use of the Jordan canonical form of A. For if

  A = SJS-1                                     (4.1)
then
  Apx = SJp(S-1x).

If λ1 is the dominant root of A then Jp has the form[d]

  Jp = λ1p
            Jn2p(λ2)
                    ⋱
                           Jnmp(λm)

where each submatrix is of the form

  Jkp(λk) = λkp  pλkp-1  ½p(p–1)λkp-2  …
            0    λkp     pλkp-1
            0    0       λkp
                             ⋱
                                  λkp .

Since λ1 is dominant, for large p

  Jp ≐ λ1p  1
            0
              ⋱
                0 .

Hence

  Jp(S-1x) = λ1pαe1 ,

where α is the first component of (S-1x) and e1 is a unit vector. Since the eigenvector of A corresponding to λ1 is Se1 by Eq. (3.17) and since x has a component in the direction of this eigenvector it follows that α is not zero. Thus

  Apx = SJp(S-1x) = λ1pαSe1 .

The expression on the right is an eigenvector of A. Furthermore,

  (Apx)j / (Ap-1x)j = λ1pα(Se1)j / λ1p-1α(Se1)j = λ1 .

The convergence is linear and depends on the ratios

  | λk/λ1 | ,     k ≠ 1.

For this reason troubles arise when two or more roots of equal modulus[e] occur. This problem will be deferred and for the present the roots are assumed to occur singly.
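A minimal sketch of the simple power method follows, assuming Python with NumPy, a single dominant root, and a starting vector with a component along the dominant eigenvector; the normalization guards against overflow.

  import numpy as np

  def power_method(A, x0, iters=200):
      x = np.asarray(x0, dtype=float)
      lam = 0.0
      for _ in range(iters):
          y = A @ x
          j = np.argmax(np.abs(y))       # component used for the root ratio
          lam = y[j] / x[j]              # approaches the dominant root
          x = y / np.linalg.norm(y)      # approaches the dominant eigenvector
      return lam, x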

(2) Smaller Roots

The simple power method may be extended to obtain roots other than the dominant root in either of two ways, which will be referred to as the modified power method and the method of deflation.

The Modified Power Method. Let the roots of matrix A be denoted by λ12,…,λn where λ1 > λ2 > … > λn. Equations (3.18) and (3.19) show that if F is any polynomial function then the roots μi of the matrix F(A) are given by

  μi = F(λi).

In particular if B = A – αI then the roots of B are λi – α. Thus if α ≥ λ1 the dominant root of B becomes μn = λn – α. It is then possible to determine μn by applying the power method to B and so obtain λn. Furthermore, Eq. (3.20) shows that the roots of A-1 are the reciprocals of the roots of A. Thus the roots of the matrix B = (A – αI)-1 are the quantities

  μi = 1 / (λi – α) .

Clearly any one of the μi may be made dominant by choosing α close enough to the corresponding root λi. Thus each value of μi (and from it λi) may be obtained from the matrix B by a suitable choice of α. In the absence of any knowledge of the location of λi, the required values of α must be obtained by repeated trials.
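A minimal sketch of the second variant, assuming Python with NumPy and the power_method sketch given earlier: the power method is applied to B = (A – αI)-1, whose dominant root corresponds to the λi nearest the trial value α.

  import numpy as np

  def inverse_power(A, alpha, x0, iters=50):
      n = A.shape[0]
      B = np.linalg.inv(A - alpha * np.eye(n))   # one inversion per trial shift
      x = np.asarray(x0, dtype=float)
      mu = 0.0
      for _ in range(iters):
          y = B @ x
          j = np.argmax(np.abs(y))
          mu = y[j] / x[j]                       # mu_i = 1/(lambda_i - alpha)
          x = y / np.linalg.norm(y)
      return alpha + 1.0 / mu, x                 # recover lambda_i; x is its eigenvector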

The Method of Deflation. This method, like the preceding, modifies the matrix A so as to make a secondary root dominant, but differs in that the dominant root only is changed and is replaced by zero. If the roots of A are distinct and if ci and ri are respectively the eigenvectors of A and AT corresponding to the root λi and with length so chosen that riTci = 1, then

  riTcj = δij .                                             (4.2)

For

  riT(Acj) = (riTA)cj ,
  λjriTcj = λiriTcj ,
and
  (λj – λi)riTcj = 0.

If i ≠ j then λi ≠ λj and hence riTcj = 0. Equation (4.2) is therefore established. It will now be shown that

  A = Σi λiciriT .                                         (4.3)

Note that each product ciriT is a square matrix of order n. To establish the relation (4.3) consider the matrix product

  (A – Σi λiciriT)cj = Acj – Σi λiciriTcj
                     = λjcj – Σi λiciδij
                     = (λj – λj)cj = 0.

Since this is true for each of the n linearly independent vectors cj, it follows that the matrix in brackets above is null and (4.3) is established.

Suppose the dominant root and corresponding eigenvector of A have been obtained. Then the corresponding eigenvector of AT may be obtained by re-applying the power method, or more simply (since λ1 is now known), by solving the singular homogeneous system

  (AT – λ1I)x = 0

to obtain a non-trivial vector x. The “deflated” matrix

  B = A – λ1c1r1T = 0·c1r1T + Σi≠1 λiciriT

is now formed. Clearly B has the same roots as A except that the dominant root λ1 has been replaced by zero. The root λ2 of B may now be obtained by a further application of the power method. The process may be repeated for successive roots but the error accumulation is rather rapid. Other methods of carrying out the deflation process are discussed by Feller and Forsythe.14 If A is symmetric ri = ci and a separate calculation of r1 is not required.
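A minimal sketch of one deflation step, assuming Python with NumPy, distinct roots, and the power_method sketch given earlier; r1 is here obtained by re-applying the power method to AT rather than by solving the singular system.

  import numpy as np

  def deflate(A, lam1, c1):
      _, r1 = power_method(A.T, np.ones(A.shape[0]))   # eigenvector of A^T
      r1 = r1 / (r1 @ c1)                              # scale so that r1^T c1 = 1
      return A - lam1 * np.outer(c1, r1)               # dominant root replaced by zero

  # lam1, c1 = power_method(A, x0)
  # lam2, c2 = power_method(deflate(A, lam1, c1), x0)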

(3) Multiple Roots

If multiple roots occur two cases must be distinguished:

1.   The canonical form J of A is diagonal and hence
  nk = 1,     k = 1,2,…,m.
In this case the principal vectors are all of order 1 and therefore the eigenvectors are n in number and form a complete set. The matrix A is said to be non-defective.
2. The order of one or more of the Jordan boxes exceeds one and the eigenvectors do not form a complete set. The matrix A is said to be defective.

If the matrix A is non-defective there are two or more independent eigenvectors corresponding to the same root. Furthermore any linear combination of these independent eigenvectors is itself an eigenvector. Thus the power method will converge to the dominant latent root but the eigenvector obtained will not be unique. By starting from different initial vectors the process will in general converge to different possible eigenvectors associated with the dominant root so that finally all of the independent eigenvectors are obtained.

If the matrix A is defective the form of a power of the Jordan box given in (3.19) shows that a single limit of the sequence Ax,A2x,…,Apx exists. For example, if λ1 is dominant and n1 = 3 then

  J3p(λ1) = λ1p  pλ1p-1  ½p(p–1)λ1p-2
            0    λ1p     pλ1p-1
            0    0       λ1p
 
          = ½p(p–1)λ1p-2  2λ12/p(p–1)  2λ1/(p–1)    1
                          0            2λ12/p(p–1)  2λ1/(p–1)
                          0            0            2λ12/p(p–1) .

Therefore, for large p,

  J3p(λ1) ≐ ½p(p–1)λ1p-2  0  0  1
                          0  0  0
                          0  0  0 .

Thus the sequence Ax,A2x,…,Apx converges always to the eigenvector associated with the largest root. The power method therefore fails to obtain the principal vectors of order greater than one.

C. Methods Employing the Characteristic Polynomial

The characteristic polynomial of a matrix A is defined as

  P(λ) = | A – λI |.

The equation P(λ) = 0 is called the characteristic equation of the matrix A. P(λ) is a polynomial of degree n and for a canonical matrix the expansion of the determinant clearly yields

  P(λ) = | J – λI | = Π (λk – λ)nk .

Thus the zeros of P(λ) are the latent roots of J and each root occurs with the same multiplicity in P(λ) as in J. This also holds true for a general matrix A since similar matrices have the same characteristic polynomial, as will now be demonstrated.

Suppose A and B are similar matrices, then

  A = S-1BS

and since

  | S-1 | | S | = | S-1S | = | I | = 1,
then
  | B – λI | = | S-1 | | B – λI | | S | = | S-1BS – λS-1IS | = | A – λI |.

Clearly then the latent roots of a matrix may be obtained in the following two steps:

1.   Compute the coefficients of the characteristic polynomial P(λ); and
2. Solve the characteristic equation P(λ) = 0.

Since the degree of P(λ) is the same as the order of the matrix the numerical solution of the characteristic equation may be very difficult. The problem of solving polynomial equations is taken up in Chapter 5. We now proceed to consider two closely related methods for generating the characteristic polynomial of a matrix.

(1) The Method of Bingham 15

Let

  P(λ) = λn + c1λn-1 + … + cn

be a polynomial with zeros λ12,…,λn. Defining

  sk = Σi λik ,                                             (4.4)

it is possible with the help of Newton’s identities (reference 16, p. 147) to express the coefficients of P(λ) in terms of the quantities sk, k = 1,2,…,n as follows:

  c1 = -s1
  c2 = -1/2 (c1s1 + s2)
  c3 = -1/3 (c2s1 + c1s2 + s3)
and in general   
 ck = -1/k (ck-1s1 + ck-2s2 + … + sk) .                           (4.5)

The quantities sk corresponding to the characteristic polynomial of a canonical matrix J are easily obtained, for clearly the sum of the diagonal elements of J is equal to s1. Furthermore, Eqs. (3.18) and (3.19) show that the sum of the diagonal elements of Jk is equal to sk.

The sum of the diagonal elements of a matrix A is usually called the trace of A or more briefly Tr A. Using this term we may write

  Tr Jk = sk .                                              (4.6)

Moreover, the trace is invariant under a similarity transformation for if

  B = S-1AS

then in terms of components

  bij = Σk,m s-1ikakmsmj
and
  Tr B = Σi bii = Σi,k,m s-1ikakmsmi
       = Σk,m akm (Σi smis-1ik)
       = Σk,m akmδmk = Σk akk = Tr A.

Consequently if (4.6) is true for the canonical form of a matrix A it is also true for A. Hence

  Tr Ak = sk.                                             (4.7)

To summarize, the coefficients of the characteristic polynomial of the matrix A may be obtained in the following steps:

1.   Compute sk = Tr Ak, k = 1,2,…,n;
2. Compute the coefficients ci from (4.5).
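A minimal sketch of the two steps, assuming Python with NumPy; the function name is illustrative.

  import numpy as np

  def bingham(A):
      n = A.shape[0]
      s, Ak = [], np.eye(n)
      for _ in range(n):
          Ak = Ak @ A
          s.append(np.trace(Ak))         # s_k = Tr A^k, Eq. (4.7)
      c = []
      for k in range(1, n + 1):          # Newton's identities, Eq. (4.5)
          ck = -(sum(c[k - 2 - i] * s[i] for i in range(k - 1)) + s[k - 1]) / k
          c.append(ck)
      return c                           # P(lam) = lam^n + c1 lam^{n-1} + ... + cn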

(2) The Method of Frame 17

The Latent Roots. Let c0,c1,…,cn be the coefficients of the characteristic polynomial of the matrix A and let Fk(A) be a sequence of polynomial functions of A defined by

  F0(A) = I,                       (4.8)
  Fk(A) = AFk-1(A) + ckI,     k = 1,2,…,n.  

Now

  Tr[AF0(A)] = s1
  Tr[AF1(A)] = s2 + c1s1
  Tr[AF2(A)] = s3 + c1s2 + c2s1

and in general

  Tr[AFk-1(A)]= sk + c1sk-1 + ... + ck-1s1
  = -kck

by virtue of (4.5).

Hence

  ck = -1/k Tr[AFk-1(A)].

Using this expression for ck, Eqs. (4.8) may be written

  F0(A) = I,                       (4.9)
  Fk(A) = AFk-1(A) – 1/k Tr[AFk-1(A)] I.  

Equations (4.9) define the sequence of operations in the Frame method.

The Bingham process computes the sequence of matrices A,A2,A3,…,An and from the traces s1,s2,…,sn of these matrices computes the coefficients by the use of Newton’s identities. The Frame method also uses repeated multiplication by A but at each step modifies the matrix so as to derive the coefficients directly without explicit use of Newton’s identities. The Frame method has the further advantage that

  Fn(A) = P(A) = 0.                                        (4.10)

This relation serves as a useful overall check on the accuracy of the computation.
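A minimal sketch of the Frame process (4.9) follows, assuming Python with NumPy; the final assertion is the overall check (4.10).

  import numpy as np

  def frame(A):
      n = A.shape[0]
      F = np.eye(n)
      coeffs, Fs = [], [F]
      for k in range(1, n + 1):
          AF = A @ F
          ck = -np.trace(AF) / k             # c_k = -(1/k) Tr[A F_{k-1}]
          F = AF + ck * np.eye(n)            # F_k = A F_{k-1} + c_k I
          coeffs.append(ck)
          Fs.append(F)
      assert np.allclose(F, 0, atol=1e-8)    # F_n = P(A) = 0, Eq. (4.10)
      return coeffs, Fs[:-1]                 # c_1..c_n and F_0..F_{n-1} for (4.11)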

The Principal Vectors. The adjoint of (A – λI) may be expressed as a polynomial in λ with coefficients formed from the sequence of matrices (4.8) as follows:

  Δ(A – λI) = – {λn-1F0(A) + λn-2F1(A) + … + Fn-1(A)}.           (4.11)
 
For
  (A – λI) Δ(A – λI) = λnI + λn-1(F1–AF0) + … + λ(Fn-1–AFn-2) – AFn-1.
But
  Fk – AFk-1 = ckI,     k = 1,2,…,n,

by (4.8) and since Fn = 0 then

  (A – λI) Δ(A – λI) = [λn + c1λn-1 + … + cn] I
   = | A – λI | I

by the definition of the characteristic equation. Thus Δ(A – λI) as given by (4.11) is indeed the adjoint of (A – λI). When evaluated at a value of λ equal to a latent root the columns are proportional to the eigenvectors associated with that root. Furthermore since λ is general in the expression (4.11) the derived adjoints may be obtained by differentiation. The derived adjoints furnish all the principal vectors as demonstrated in section 3C.

D. Error Analysis

The power method is an iterative process and error accumulation is therefore not a problem. In the use of the Frame method however the error accumulation may be expected to be rather rapid and a bound on the error will now be established.

Assume that A is non-defective and consider the series of matrices Fk(A) defined by (4.8). Because of round-off error the computed value F̄k(A) will differ from Fk(A) and we may write

  F̄k(A) – Fk(A) = Ek.

The propagation of the error introduced at the first step will be investigated first. Let the length of the longest column vector of Ek be denoted by εk. Then

  F̄k+1(A) – Fk+1(A) = A[F̄k(A) – Fk(A)] – 1/k Tr{A[F̄k(A) – Fk(A)]}I
                     = AEk – 1/k Tr(AEk)I.

Since for any vector x,

  | Ax | ≤ | λ | | x |

where λ is the dominant root of A, then

  εk+1 ≤ | λ |εk + (n/k)| λ |εk = | λ |(n+k)/k εk ,

  εn-1 ≤ | λ |n-1 (2n-1)!/[(n-1)!]2 ε0 .

The error computed above is due to the round-off committed at the first step. Since a similar round-off occurs at each step the total error bound ε will be a sum of such terms as follows:

  ε < ε0 Σk [2(n–k)+1]!/[(n–k)!]2 | λ |n-k .

For large values of n the error accumulation may therefore be very severe but in work with matrices the maximum error bounds are usually found to be too pessimistic.

E. Improving an Approximate Solution

Let A be a non-defective matrix and let S be the matrix which diagonalizes A. If S0 is an approximation to S then the method of Jahn18 may be used to improve this approximation. The object is to compute S where AS = SΛ and Λ is a diagonal matrix. Since S0 ≐ S we have

  S0-1AS0 = Λ1 + B

where Λ1 is diagonal and B is a matrix whose elements are small; in particular the diagonal elements are zero. Let the diagonal elements of Λ1 be λ12,…,λn and let C = [crs] where

  crs = brs / (λs – λr) .

Then S1 = S0 (I + C) is an improved approximation to S correct to second order. For clearly B = CΛ1 – Λ1C, so that

  S0-1AS0 = Λ1 + CΛ1 – Λ1C     and     AS1 = AS0 (I + C).

Hence

  AS0 = S0(Λ1 + CΛ1 – Λ1C)
 
  AS1 = S0(Λ1 + CΛ1 – Λ1C) (I + C)
      = S0(Λ1 + CΛ1 – Λ1C + Λ1C + CΛ1C – Λ1C2)
      = S0 (I + C) Λ1 + S0 (CΛ1 – Λ1C) C
      = S1Λ1   plus terms of second order in C.
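A minimal sketch of one step of the Jahn improvement, assuming Python with NumPy, distinct roots, and an approximation S0 to the diagonalizing matrix.

  import numpy as np

  def jahn_step(A, S0):
      M = np.linalg.solve(S0, A @ S0)        # S0^{-1} A S0 = Lam1 + B
      lam = np.diag(M).copy()
      B = M - np.diag(lam)                   # small, zero on the diagonal
      denom = lam[None, :] - lam[:, None]    # lam_s - lam_r
      np.fill_diagonal(denom, 1.0)           # avoid 0/0; B is zero there anyway
      C = B / denom                          # c_rs = b_rs / (lam_s - lam_r)
      return S0 @ (np.eye(len(lam)) + C)     # S1 = S0 (I + C)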

F. Comparison of Methods

The amount of computation required to generate the characteristic equation by either the Frame or Bingham method is of the order n4 multiplications. In addition the solution of the equation so obtained will require considerable labor. Consequently if only a few of the roots are wanted, use of the power method is indicated. If a complete solution is desired, the power method has serious shortcomings. Firstly, the process is not uniform since special modifications are required to take care of special cases such as complex or multiple roots. Furthermore in the case of a defective matrix it fails to obtain the principal vectors of order > 1.

The attractive features of the Frame method are its complete generality and uniformity. Unlike the power method the process proceeds in a purely routine manner independent of the type of roots encountered. The completion of the method of course assumes the solution of a polynomial equation of degree equal to the order of the matrix and some of the essential difficulties may re-appear in this phase of the problem. In any case the solution of polynomial equations is an important problem in its own right and the development of suitable methods is worth considerable effort. It therefore seems reasonable to base the solution of the problem of latent roots on the assumption of the solvability of the characteristic equation.

If the round-off error in the Frame method should be so severe that the desired accuracy is not attained, the method of Jahn may be used to improve the accuracy. Each application of the Jahn method requires the equivalent of about four matrix multiplications.

Notes

a.   The term of the general solution corresponding to a pair of complex roots is periodic.
b. In an automatic computer the numbers must be kept within machine capacity.
c. Since any scalar multiple of an eigenvector is also an eigenvector, the term “largest eigenvector” makes no sense except as an elliptical expression for “the eigenvector corresponding to the largest root”.
d. The assumption that λ1 occupies the first position is not necessary to the argument. If λ1 is in the rth position the associated eigenvector of J becomes er.
e. The case of a pair of complex roots may be handled by a slight modification of the method discussed above. See reference 12; or reference 13, page 299.

 

Chapter 5.   Zeros of a Polynomial

A. Methods Depending on an Initial Approximation

The problem of locating the zeros of a polynomial of high degree is greatly simplified when good initial approximations to the zeros are known. This situation is often met in practice because of the frequent occurrence of families of polynomial equations. If, for example, the solution of a polynomial equation is required within some larger iterative process, the coefficients may be expected to vary gradually from one step to the next and the set of roots obtained at one step may be expected to provide good approximations to the roots of the next equation. Under these conditions, the Birge-Vieta19 method and the quadratic factor method are probably the best, since they converge quadratically near a root. Before proceeding to a discussion of these methods certain useful processes will be developed.

1. Synthetic Division16

Let Pn(x) = a0xn + a1xn-1 + … + an-1x + an be a polynomial of degree n with real coefficients. Let

  Q(x) = b0xn-1 + b1xn-2 + … + bn-1

where the coefficients bi are defined by

  b0 = a0 ,                             (5.1)
  bi = ai + pbi-1 ,       i = 1,2,…,n.  

Then

  (x–p) Q(x) + bn = (x–p) [b0xn-1 + (a1+pb0)xn-2 + … + (an-1+pbn-2)] + an + pbn-1
                  = a0xn + a1xn-1 + … + an .

Therefore

  Pn(x) = (x–p) Q(x) + bn                                     (5.2)
and
  Pn(p) = bn .

Hence bn is the value of the polynomial evaluated at p and when bn is zero, Q(x) is the so-called reduced equation. The effect of the algorithm (5.1) is to divide the polynomial P(x) by the linear factor (x–p) to obtain the quotient polynomial Q(x) and the remainder bn. The algorithm is therefore called synthetic division. Adopting the convention that all coefficients with negative subscripts are zero, Eq. (5.1) may be written more concisely as

  bi = ai + pbi-1,     i = 0,1,2,…,n.                              (5.3)
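A minimal sketch of the algorithm (5.3), assuming plain Python; the function is reused in the sketches which follow.

  def synthetic_division(a, p):
      # Divide a0 x^n + ... + an by (x - p); coefficients with negative
      # subscripts are zero, and the last b is the remainder Pn(p).
      b = []
      for i, ai in enumerate(a):
          b.append(ai + p * (b[i - 1] if i > 0 else 0.0))
      return b[:-1], b[-1]          # quotient coefficients, remainder

  # Example: x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3); the remainder at p = 2 is zero.
  q, r = synthetic_division([1.0, -6.0, 11.0, -6.0], 2.0)
  assert r == 0.0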

Synthetic division by factors of any degree is easily defined but the case of a quadratic factor is of particular interest. In this case let

  bi = ai + pbi-1 + qbi-2,     i = 0,1,2,…,n–1,                       (5.4)
  bn = an + qbn-2.

Then

  (x2 – px – q) Q(x) + bn-1x + bn = Pn(x)                        (5.5)

as may be verified by expanding the left-hand side.

2. Change of Origin

A translation or change of origin to the point x = p is defined by the substitution y = x – p. The change in the coefficients of Pn(x) induced by such a translation may be derived from the relation

  Pn(x)= a0(y+p)n + a1(y+p)n-1 + … + an
   = b0yn + b1yn-1 + … + bn.

The coefficients bk are most conveniently computed by the process known as Horner’s method.16 This method is nothing more than a repeated application of synthetic division to the successive quotient polynomials. For n = 4 the scheme of computation is as follows:

    a0   a1   a2   a3   a4      |  p
         pb0  pb1  pb2  pb3     |
    ────────────────────────────
    b0   b1   b2   b3   b4
         pc0  pc1  pc2
    ───────────────────────
    c0   c1   c2   c3
         pd0  pd1
    ──────────────────
    d0   d1   d2
         pe0
    ─────────────
    e0   e1
    ────────
    f0

The polynomial in y is then

  f0y4 + e1y3 + d2y2 + c3y + b4.

To prove that the polynomial in y may be so obtained let the successive quotient polynomials Qk(x) be defined by

  Pn(x) = (x–p)Qn-1(x) + bn,
  Qn-k(x) = (x–p)Qn-k-1(x) + bn-k,     k = 1,2,…,n.

Then

  Pn(x) = (x–p)[(x–p)Qn-2 + bn-1] + bn         (5.6)
   = (x–p){(x–p)[(x–p)Qn-3 + bn-2] + bn-1} + bn
   = … = (x–p)nb0 + (x–p)n-1b1 + … + (x–p)bn-1 + bn
   = b0yn + b1yn-1 + … + bn.

Repeated differentiation of the relation

  Pn(x) = b0(x–p)n + b1(x–p)n-1 + … + bn-1(x–p) + bn

yields the values of the successive derivatives of Pn(x) evaluated at the point p. Thus

 
  [ dkPn(x)/dxk ]x=p = k!bn-k,     k = 0,1,2,…,n.                         (5.7)
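A minimal sketch of Horner's change of origin, assuming plain Python and the synthetic_division sketch above.

  def shift_origin(a, p):
      # Repeated synthetic division by (x - p) yields the coefficients of
      # Pn in powers of y = x - p; by Eq. (5.7) the successive remainders
      # are bn, bn-1, ..., b0, and k!bn-k is the k-th derivative at p.
      work, out = list(a), []
      while work:
          work, rem = synthetic_division(work, p)
          out.append(rem)
      return out[::-1]              # b0 y^n + b1 y^{n-1} + ... + bn

  # Example: x^2 about p = 1 gives (y + 1)^2 = y^2 + 2y + 1.
  assert shift_origin([1.0, 0.0, 0.0], 1.0) == [1.0, 2.0, 1.0]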

3. The Birge-Vieta Process

The so-called Birge-Vieta process for locating a real root of a polynomial was first proposed by Vieta in 1600 and was discarded by later mathematicians as requiring too much computation. It was not until the advent of the modern desk calculator that the method was re-established by Birge and came into wide use. The method is perhaps the most satisfactory known for obtaining real roots with a desk calculator.

The Birge-Vieta method is based on the well-known Newton-Raphson process (reference 20, page 84) which is defined as follows: Let x̄ be a real root of the equation

  f(x) = 0.

Then for any value x0 sufficiently close to x̄ the quantities xi defined by

  xi+1 = xi – f(xi)/f '(xi) ,     i = 0,1,2,… ,

will converge to the root x̄. The convergence is quadratic; i.e., the difference xi – x̄ is approximately squared at each step. When the function f(x) is a polynomial the discussion of Horner’s method and Eq. (5.7) make it clear that two synthetic divisions with p = x0 will furnish the quantities

  bn = Pn(x0)   and   bn-1 = Pn'(x0) .

The process is then repeated using the quantity

  x1 = x0 – bn/bn-1 ,

and in general

  xi+1 = xi – bn(xi)/bn-1(xi) .                                       (5.8)

Equation (5.8) defines the Birge-Vieta process. This process is usually employed only for real roots, but may be extended to complex roots by simply allowing the variables x0,x1,… to be complex. Since a polynomial is an analytic function21 the derivative Pn'(x) exists for x complex, and the process carries over without change to complex numbers. (Note, however, that if the initial approximation x0 is real, the successive xi remain real.) Every operation is now complex and hence the amount of computation is increased by a factor of about four.

The complex Birge-Vieta method can be applied equally well to the solution of polynomial equations with complex coefficients, but for real polynomials the method to be described below is generally considered to be superior.
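A minimal sketch of the process (5.8) for a real root, assuming plain Python and the synthetic_division sketch above.

  def birge_vieta(a, x0, tol=1e-12, iters=50):
      x = x0
      for _ in range(iters):
          q, bn = synthetic_division(a, x)      # bn = Pn(x)
          _, bn1 = synthetic_division(q, x)     # bn-1 = Pn'(x), by Eq. (5.7)
          if abs(bn1) < tol:
              break                             # derivative too small to divide
          x_new = x - bn / bn1                  # Eq. (5.8)
          if abs(x_new - x) < tol:
              return x_new
          x = x_new
      return x

  # Example: the real root of x^3 - 2x - 5 = 0 near 2.
  root = birge_vieta([1.0, 0.0, -2.0, -5.0], 2.0)
  assert abs(root**3 - 2*root - 5) < 1e-9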

4. The Quadratic Factor Method

Since the complex zeros of a real polynomial occur in conjugate pairs it is possible to factor any real polynomial into a set of real linear and quadratic factors. Using a method similar to the Birge-Vieta process it is possible to improve an approximate quadratic factor by repeated synthetic divisions of the type described by Eq. (5.4). The goal is to so adjust the coefficients p and q as to make both remainders bn-1 and bn zero. Treating bn-1 and bn as functions of p and q and expanding in Taylor’s series we have

  bn(p+δp, q+δq) = bn + (∂bn/∂p)δp + (∂bn/∂q)δq + …
  bn-1(p+δp, q+δq) = bn-1 + (∂bn-1/∂p)δp + (∂bn-1/∂q)δq + … .

A generalization of the Newton-Raphson process to two variables then gives the desired corrections δp and δq as solutions of the pair of linear equations

  bn + (∂bn/∂p)δp + (∂bn/∂q)δq = 0                          (5.9)
  bn-1 + (∂bn-1/∂p)δp + (∂bn-1/∂q)δq = 0.

The quadratic factor method as first proposed by Bairstow22 and as presented in standard works23, 24 on numerical analysis requires the use of a second synthetic division to evaluate the derivatives in (5.9) and the application of the corrections δp and δq so derived. A modification of the method, which provides a more uniform computational scheme[a] and simplifies the extension to a higher order process, will now be described. The proof is also simplified.

We define a modified algorithm for synthetic division, similar to that given by (5.4), namely,

  ci = ai + pci-1 + qci-2,     i = 0,1,2,…,n.                        (5.10)

This process is completely uniform, without exception for the computation of cn. Comparing with (5.4)

  ci = bi,                     i = 0,1,2,…,n-1,
  cn = bn + pcn-1.
 
Letting
 
  Q(x) = c0xn-2 + … + cn-2,
 
Eq. (5.5) becomes

  Pn(x) = (x2 – px – q)Q(x) + cn-1x + (cn – pcn-1).               (5.11)

The modification of the quadratic factor method now consists in choosing δp and δq so as to reduce to zero the quantities cn-1 and cn rather than bn-1 and bn. Note that the vanishing of cn-1 and cn is the necessary and sufficient condition for the vanishing of bn-1 and bn.

Treating x, p and q as independent variables and differentiating (5.11) partially with respect to p we have

  0 = (x2 – px – q) ∂Q(x)/∂p – xQ(x) + x ∂cn-1/∂p + ∂cn/∂p – p ∂cn-1/∂p – cn-1
 
  xQ(x) + cn-1 = (x2 – px – q) ∂Q(x)/∂p + x ∂cn-1/∂p + ( ∂cn/∂p – p ∂cn-1/∂p ).

Thus ∂cn-1/∂p and ∂cn/∂p are respectively the first and second remainders on applying the modified algorithm to the polynomial xQ(x) + cn-1, i.e. to the coefficients c0…cn-1. Let the coefficients so obtained be d0…dn-1, then

  dn-2 = ∂cn-1/∂p     and     dn-1 = ∂cn/∂p .                          (5.12)

Differentiating (5.11) with respect to q we have

  0 = (x2 – px – q) ∂Q(x)/∂q – Q(x) + x ∂cn-1/∂q + ( ∂cn/∂q – p ∂cn-1/∂q )

and the first and second remainders on applying the modified algorithm to Q(x) are ∂cn-1/∂q and ∂cn/∂q respectively. Hence

  dn-3 = ∂cn-1/∂q     and     dn-2 = ∂cn/∂q .                          (5.13)

Applying Eq. (5.9) to the quantities cn and cn-1 and substituting the expressions obtained above for the derivatives we have

  cn-1 + dn-2δp + dn-3δq = 0,
  cn + dn-1δp + dn-2δq = 0.

Thus:

  δp = -(cn-1dn-2 – cndn-3)/D                                      (5.14)
  δq = (cn-1dn-1 – cndn-2)/D
  D = dn-22 – dn-1dn-3.

The complete uniformity of the method is clearly shown in the following example for n = 5. The initial approximations are p0 and q0. Coefficients with negative subscripts are by convention assumed to be zero.

  a0     a1     a2     a3     a4     a5       |  p0, q0
  p0c-1  p0c0   p0c1   p0c2   p0c3   p0c4     |
  q0c-2  q0c-1  q0c0   q0c1   q0c2   q0c3     |
  ────────────────────────────────────────────
  c0     c1     c2     c3     c4     c5
  p0d-1  p0d0   p0d1   p0d2   p0d3
  q0d-2  q0d-1  q0d0   q0d1   q0d2
  ─────────────────────────────────
  d0     d1     d2     d3     d4
 
  ci = ai + p0ci-1 + q0ci-2,     i = 0,1,2,…,n;
  di = ci + p0di-1 + q0di-2,     i = 0,1,2,…,(n–1);

δp and δq are then given by (5.14).
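A minimal sketch of the modified quadratic factor method, Eqs. (5.10)-(5.14), assuming plain Python; the example polynomial is illustrative.

  def modified_algorithm(a, p, q):
      # The uniform recurrence (5.10); negative subscripts are zero.
      out = []
      for i, ai in enumerate(a):
          c1 = out[i - 1] if i >= 1 else 0.0
          c2 = out[i - 2] if i >= 2 else 0.0
          out.append(ai + p * c1 + q * c2)
      return out

  def quadratic_factor(a, p, q, iters=50, tol=1e-12):
      for _ in range(iters):
          c = modified_algorithm(a, p, q)              # c0 .. cn
          d = modified_algorithm(c[:-1], p, q)         # d0 .. dn-1
          D = d[-2] * d[-2] - d[-1] * d[-3]
          dp = -(c[-2] * d[-2] - c[-1] * d[-3]) / D    # Eq. (5.14)
          dq = (c[-2] * d[-1] - c[-1] * d[-2]) / D
          p, q = p + dp, q + dq
          if abs(dp) + abs(dq) < tol:
              break
      return p, q                                      # x^2 - px - q is a factor

  # Example: (x^2 + 1)(x^2 + x + 1) = x^4 + x^3 + 2x^2 + x + 1; starting
  # near the factor x^2 + 1 (p = 0, q = -1) both remainders are driven to zero.
  p, q = quadratic_factor([1.0, 1.0, 2.0, 1.0, 1.0], 0.1, -0.9)
  c = modified_algorithm([1.0, 1.0, 2.0, 1.0, 1.0], p, q)
  assert abs(c[-1]) < 1e-8 and abs(c[-2]) < 1e-8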

In comparison with the complex Birge-Vieta method the quadratic factor method is seen to require only half as many multiplications per iteration since it requires only two real multiplications per step while the complex Birge-Vieta method requires one complex[b] multiplication per step. Because the coefficients computed in the complex Birge-Vieta process are in general complex, two registers are required to store the real and imaginary parts. The storage required is therefore double that required for the quadratic factor method. On the other hand, some additional calculation is required to resolve the quadratic factors obtained to yield the actual roots. The time required for this additional calculation is, however, rather insignificant.

5. Higher Order Processes

The Newton-Raphson process for obtaining the solution of the equation f(x) = 0 may be derived by truncating the Taylor’s series expansion

  f(x) = f(x0) + (x – x0)f '(x0) + (x – x0)2 f "(x0)/2! + …

after the second term. In a similar way higher order processes may be derived by considering further terms of the expansion. Thus a third order process is obtained by solving

  0 = f(x̄) ≅ f(x0) + (x̄ – x0)f '(x0) + (x̄ – x0)2 f "(x0)/2!
 
for x̄. Then

  x̄ ≅ x0 – [ f '(x0) ± √{[f '(x0)]2 – 2f(x0)f "(x0)} ] / f "(x0)                (5.15)

where the smaller of the two possible correction terms is chosen.

In the Birge-Vieta method the convergence is already quadratic near a single root and little is gained by a higher order process. In the case of a double root the convergence of the Birge-Vieta method is linear and may be made quadratic by the use of (5.15). Use of the higher order process may therefore be advisable if multiple roots are encountered. Similar remarks apply to the application of a higher order process based on the quadratic factor method.

The value of f "(x0) is given by (5.7) and may be obtained by a third synthetic division.

To extend the quadratic factor method to a higher order process we adopt a new notation for the remainders as indicated in the scheme below.

  a0   a1   a2   …   an                      (5.16)
 
  c0   c1   …   cn-2   B   A
 
  d0   d1   …   dn-4   E   D   C
 
  e0   …   en-6   I   H   G   F

Successive lines of the scheme correspond to successive applications of the modified algorithm. Using the notation

  Ap = ∂A/∂p

Eqs. (5.12) and (5.13) become

  Ap = C         Aq = D
  Bp = D         Bq = E.

Because of the complete uniformity of the algorithm the relation between any two quantities in the scheme above is determined by their relative position. For example, since G and C stand in the same relation as do D and A and since D = Aq, then G = Cq = Apq. Clearly then

  App = F         Bpp = G
  Apq = G         Bpq = H
  Aqq = H         Bqq = I.

Extending the Eqs. (5.9) to include second order terms of the Taylor’s series and using the present notation we have

  0 = A + Apδp + Aqδq + (App/2)δp2 + Apqδpδq + (Aqq/2)δq2
  0 = B + Bpδp + Bqδq + (Bpp/2)δp2 + Bpqδpδq + (Bqq/2)δq2.

Hence

  Cδp + Dδq = – (A + (F/2)δp2 + Gδpδq + (H/2)δq2)
  Dδp + Eδq = – (B + (G/2)δp2 + Hδpδq + (I/2)δq2).

This pair of non-linear equations may be solved by iteration provided that δp and δq are not too large. Each approximation to the values of δp and δq is substituted in the right-hand side and the resulting linear equations are solved for a new approximation. The initial values of δp and δq are taken as zero.

6. Factors of Higher Degree

Aitken25 has studied the extraction of real factors of higher degree than second by means of methods called “penultimate remaindering”. The modified quadratic factor method may also be extended to extract factors of higher degree. Unless all roots are known to be real the real factor must be of even degree. The case of a quartic factor will be developed briefly.

As for the quadratic factor the synthetic division algorithm

  bi = ai + pbi-1 + qbi-2 + rbi-3 + sbi-4,       i = 0,1,…,n-3,
bn-2 = an-2 + qbn-4 + rbn-5 + sbn-6,
bn-1 = an-1 + rbn-4 + sbn-5,
bn = an + sbn-4,

yields the quotient polynomial

  Q(x) = b0xn-4 + b1xn-5 + … + bn-4

such that

  Pn(x) = (x4 - px3 - qx2 - rx - s)Q(x) + bn-3x3 + bn-2x2 + bn-1x + bn.

Again the modified algorithm

  ci = ai + pci-1 + qci-2 + rci-3 + sci-4,       i = 0,1,…,n,

yields the same quotient polynomial Q(x). Adopting the notation

  D  =  cn-3 = bn-3,
C  =  cn-2 = bn-2 + pcn-3 = bn-2 + pD
B  =  cn-1 = bn-1 + pcn-2 + qcn-3 = bn-1 + pC + qD
A  =  cn = bn + pcn-1 + qcn-2 + rcn-3 = bn + pB + qC + rD
and
α  =  (x4 - px3 - qx2 - rx - s)

we have

  Pn(x)  =   αQ(x) + Dx3 + (C-pD)x2 + (B-qD-pC)x + (A-rD-qC-pB).  (5.17)

Note that the vanishing of the quantities A, B, C, and D is the necessary and sufficient condition for the vanishing of the remainders bn, bn-1, bn-2 and bn-3.

Differentiating Eq. (5.17) we have

  0 = αQp(x) - x3Q(x) + x3Dp + x2(Cp-pDp-D) + x(Bp-qDp-pCp-C) +
                Ap - rDp - qCp - pBp - B,                   (5.18)
 
0 = αQq(x) - x2Q(x) + x3Dq + x2(Cq-pDq) + x(Bq-qDq-pCq-D) +
                Aq - rDq - qCq - pBq - C,                   (5.19)
 
0 = αQr(x) - xQ(x) + x3Dr + x2(Cr-pDr) + x(Br-qDr-pCr) +
                Ar - rDr - qCr - pBr - D,                     (5.20)
 
0 = αQs(x) - Q(x) + x3Ds + x2(Cs-pDs) + x(Bs-qDs-pCs) +
                As - rDs - qCs - pBs.                       (5.21)

Re-arranging the last equation to the form

  Q(x) = αQs(x) + x3Ds + x2(Cs-pDs) + x(Bs-qDs-pCs) +
                As - rDs - qCs - pBs

shows that the application of the modified algorithm to Q(x) leaves Ds as the first remainder. Thus if the scheme of two successive applications of the algorithm is shown as

  a0   a1   …   an-4   an-3   an-2   an-1   an
 
  c0   c1   …   cn-5   cn-4   D   C   B   A
 
  d0   …   K   J   I   H   G   F   E

we have Ds = K. The coefficient of x2 in (5.21) is

  Cs - pDs = Cs - pK.

But from the form of the remainders in (5.17) it is clear that the coefficient of x2 is J - pK.

Hence

  J = Cs.

Similarly Bs = I and As = H. Rearranging (5.18) to the form

  x3Q(x) + Dx2 + Cx + B = αQp(x) + x3Dp + x2(Cp-pDp) + x(Bp-qDp-pCp) + Ap - rDp - qCp - pBp

shows the left-hand side to be the polynomial formed of the coefficients c0,c1,…,cn-4, D, C, B, and therefore the derivatives may be obtained as before but with K, J, I, and H replaced respectively by H, G, F, and E. Thus Dp = H, Cp = G, Bp = F, Ap = E. In a similar way derivatives with respect to q and r may be obtained from Eqs. (5.19) and (5.20) and in matrix notation we have finally

  Ap Aq Ar As     E F G H
  Bp Bq Br Bs  =  F G H I
  Cp Cq Cr Cs     G H I J
  Dp Dq Dr Ds     H I J K .

Expanding A, B, C, and D in Taylor’s series and retaining only linear terms we have (compare Eq. (5.9))

  E F G H   δp     -A
  F G H I   δq  =  -B                            (5.22)
  G H I J   δr     -C
  H I J K   δs     -D .

Solutions of the fourth order linear system (5.22) yield the necessary corrections. This method removes four roots at a time but because of the greater complexity the quadratic factor method is probably to be preferred. When finally obtained, the quartic factor must itself be resolved. A method of solution is considered below.

7. Solution of the Quartic[c]

The following method is easily programmed for an automatic computer and could be used in conjunction with the method discussed above. However, it is probably of greatest value as a method for desk calculators and is believed to be superior for this purpose to existing methods. The method is capable of extracting either real or complex roots but works especially well if all four roots are complex.

Given a real quartic

  x4 + Ax3 + Bx2 + Cx + D,

two real quadratic factors are assumed to be x2 + ax + b and x2 + αx + β.

Then

  a + α = A                                               (5.23)
  b + aα + β = B                                          (5.24)
  αb + βa = C                                             (5.25)
  βb = D.                                                 (5.26)

Setting

  b = D/β ,       α = (C – Aβ)/(b – β) ;       a = A – α                           (5.27)

satisfies all but the second of the equations above. Imposing the further condition

  ø(β) = B - b - β - αa = 0                                   (5.28)

satisfies (5.24) and the values of a, b, α, β so obtained furnish the desired solution of the quartic in terms of its quadratic factors.

Equation (5.28) may be solved for β by the method of false position, using the fact that β is real and less than √D in absolute value. If all the roots are complex then β is also positive and (5.28) has a single real root in (0,√D). The computation of ø(β) is carried out by evaluating b, α and a according to (5.27) and substituting in (5.28).
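A minimal sketch of the solution of (5.28) follows, assuming plain Python and the all-complex case, so that ø has a single real zero in (0,√D); bisection is used here in place of the false position of the text, purely for robustness of the sketch.

  import math

  def quartic_factors(A, B, C, D, iters=200):
      def phi(beta):
          b = D / beta                               # Eq. (5.26)
          alpha = (C - A * beta) / (b - beta)        # Eq. (5.25)
          return B - b - beta - alpha * (A - alpha)  # Eq. (5.28)

      lo, hi = 1e-9, math.sqrt(D) * (1.0 - 1e-9)     # phi < 0 at lo, > 0 at hi
      for _ in range(iters):
          beta = 0.5 * (lo + hi)
          if phi(beta) < 0.0:
              lo = beta
          else:
              hi = beta
      beta = 0.5 * (lo + hi)
      b = D / beta
      alpha = (C - A * beta) / (b - beta)
      return (A - alpha, b), (alpha, beta)           # x^2+ax+b and x^2+alpha*x+beta

  # The worked example below: expect roughly a = 5.3788, b = 38.98,
  # alpha = -2.8284, beta = 13.348.
  (a, b), (al, be) = quartic_factors(2.5504, 37.1185, -38.4650, 520.3597)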

The method fails only if b - β = 0. This case can arise only if C - AD½ = 0 and a solution may then be obtained directly from the relations

  b = β = D½,       a + α = A,       aα = B - 2D½.

Example.

  x4 + 2.5504x3 + 37.1185x2 - 38.4650x + 520.3597 = 0
 
  b = 520.3597/β ;     α = (-38.4650 – 2.5504β)/(b – β) ;
  a = 2.5504 – α ;     ø = 37.1185 – b – β – aα.
 
Choose β0 ≅ D½/2 .
  β         b         α         a         ø
  12.       43.36     -2.20     4.75      -7.80
  14.       37.17     -3.20     5.75       4.35
  13.2839   39.1722   -2.7945   5.3449    -0.4011
  13.3443   38.9949   -2.8264   5.3768    -0.0237
  13.3481   38.9838   -2.8284   5.3788     0.0000

A useful modification of the method consists in first making the substitution x = y + h with h = -A/4 so that the resulting equation in y contains no term in y3. Considering the equation in y we now have A = 0 and Eqs. (5.27) and (5.28) simplify to

  b = D/β,         α = C/(b – β),         a = -α,                      (5.27a)
  ø(β) = B – b + α2 – β = 0.                                           (5.28a)

The special case mentioned above now exhibits itself immediately by the fact that C = 0.

An attempt to use a Newton-Raphson process to solve Eq. (5.28a) leads to a difficult expression for dø(β)/dβ. If, however, ø is expressed as a function of the new variable

  γ = β + b

then

  ø(γ) = B - γ + d ,
  ø'(γ) = -[1 + γd^2/(C^2/2)] ,

where

  d = C^2/(γ^2 - 4D) .

Although the method works for either real or complex roots the behavior of ø(γ) is greatly simplified by the assumption that all roots are complex, for then ø(γ) has exactly one real zero and since b and β are both positive γ > 2D½. Also ø'(γ) < 0 and ø"(γ) > 0 for γ > 2D½. Thus ø(γ) has one real zero and no minimum or inflection point in the interval (2√D,∞). Because of these properties the zero of ø(γ) is very easy to locate.

Using a machine which will store d as a constant multiplier, the only quantities necessary to record are γ and ø. However, since the final value of γ^2 - 4D is required to compute b - β = √(γ^2 - 4D), and since ø' need not be recomputed in the final stages, it is convenient to record both of these quantities as well.

Example. The substitution x = y - 0.6376 in the quartic of the previous example gives

  y^4 + 34.6793y^2 - 83.7248y + 559.4791 = 0.

Computing C^2, C^2/2 and 4D we have

  d = 7009.8421/(γ^2 - 2237.9164) ,
  ø = 34.6793 - γ + d ,
  ø' = -[1 + γd^2/3504.9210] .

A few rough mental calculations show that ø(50) > 0 and ø(52) < 0.

γ     γ^2 - 4D     ø     ø'
51.0000 363.0836 2.9857 -6.4237
51.4648 410.7092 0.2821 -5.2774
51.5182 416.2085 0.0032 -5.1694
51.5188 416.2704 0.0001

Hence b - β = √416.2704 = 20.4027,

  b = 35.9607,     β = 15.5581,     α = -4.1036 = -a.

Thus

  y = -2.0518 ± 5.6348i ,     2.0518 ± 3.3687i ;
  x = -2.6894 ± 5.6348i ,     1.4142 ± 3.3687i .

The quartic formed from these roots agrees with the given quartic to four decimal places in each coefficient.
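
A sketch of this Newton-Raphson iteration in the variable γ, again as illustrative Python rather than the original machine program (the starting value is taken from the rough bracketing ø(50) > 0, ø(52) < 0 above, and the tolerance is an editorial choice):

    import math

    def solve_depressed_quartic(B, C, D, gamma, tol=1e-4):
        """y^4 + B*y^2 + C*y + D with all roots complex; iterates on
        gamma = beta + b and returns (b, beta, alpha), with a = -alpha."""
        C2, fourD = C * C, 4.0 * D
        for _ in range(50):
            d = C2 / (gamma * gamma - fourD)
            phi = B - gamma + d
            if abs(phi) < tol:
                break
            dphi = -(1.0 + gamma * d * d / (C2 / 2.0))   # phi'(gamma)
            gamma -= phi / dphi                          # Newton-Raphson step
        root = math.sqrt(gamma * gamma - fourD)          # = b - beta
        return (gamma + root) / 2.0, (gamma - root) / 2.0, C / root

    # solve_depressed_quartic(34.6793, -83.7248, 559.4791, 51.0)
    # -> b = 35.9607, beta = 15.5581, alpha = -4.1036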

Since

  ø"(γ) = 2C^2(3γ^2 + 4D)/(γ^2 - 4D)^3

is easily computed it is possible to use the higher order correction

  [-ø' ± √(ø'^2 - 2øø")]/ø"

where the sign is chosen to give the smaller correction. Application of this in the preceding example gives ø"(51) = 2.9410 and a correction of 0.5288 which is much better than that given by the Newton-Raphson method. However the simpler method is probably to be preferred.

8. Solution of the Sextic

An extension of the method of real factors to the sextic yields a practical computational method as follows: Assume that the sextic

  P_6(x) = x^6 + Ax^5 + Bx^4 + Cx^3 + Dx^2 + Ex + F

has real factors x2 + αx + β and x4 + ax3 + bx2 + cx + d. Again assuming that A is made zero by translation we obtain the following relations

  a = -α                                                (5.29)
  b + β - α^2 = B
  c + bα - βα = C
  d + cα + bβ = D
  dα + cβ = E
  dβ = F.

Then if d = F/β and γ = d - β^2,

  α = E/2γ ± √[ (E/2γ)^2 + (F + β[-D + β(B - β)])/γ ] ,     α real          (5.30)

and β is chosen as any real root of

  ø(β) = E/β - C + α(B - 2β + α^2 - d/β) = 0                       (5.31)

the values of α and β so obtained satisfy Eqs. (5.29). The remaining coefficients are then furnished by the relations

  b = B - β + α^2 ,       c = (E - dα)/β .

At least one root of (5.31) lies in the interval (-F^{1/3}, F^{1/3}). If all the zeros of P_6(x) are complex the zeros of (5.31) are all positive and exactly three in number. Hence either the interval (0, F^{1/3}) or the interval (F^{1/3}, ∞) contains one real zero only. The two values of α given by Eq. (5.30) introduce a complication but it is usually easy to recognize which value should be chosen.

The method fails only if d - β^2 = 0, in which case β = F^{1/3} and (since α is finite) FB^3 - D^3 = 0. The solution is then given immediately by

  β = F^{1/3} ,                  α = (1/3E)(3β - 2B + D/β) ,

the expression for α being obtained as the limiting form of (5.30).

Example:

  A = 0,            B = 46.5813,       C = -89.2555,
  D = 1355.5763,    E = -2198.2332,    F = 10076.5517.

  d = 10076.5517/β ,        γ = d - β^2 ,        δ = -1099.1166/γ ,

  α = δ ± √[ δ^2 + (10076.5517 + β[β(46.5813 - β) - 1355.5763])/(d - β^2) ] ,

  ø = -2198.2332/β + 89.2555 + α[46.5813 - 2β + α^2 - d/β] .

Since 0 ≤ β ≤ F^{1/3} ≅ 22, choose β_0 = 10. Linear prediction is used even if two successive values of ø do not differ in sign. R^2 represents the square of the radical.

β        d        γ        δ        R^2        α        ø
10. 1000. 900. -1.22 1.68 -2.52 37.
12.   840. 700. -1.57 0.72 -2.42   8.
12.5   806.1241 649.8741 -1.6913 0.4863 -2.3886   2.2605
12.7   793.4292 632.1392 -1.7387 0.3741 -2.3503   0.2354
12.7226   792.0198 630.1552 -1.7442 0.3613 -2.3453   0.0049
12.7231   791.9887 630.1114 -1.7443 0.3610 -2.3452   0.0002
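
The sextic iteration may be sketched in the same illustrative Python style (the branch sign of the radical, chosen here by inspection as in the example, and the secant loop standing in for the hand "linear prediction" are editorial assumptions); the remaining coefficients b and c then follow from the relations above:

    import math

    def sextic_quadratic_factor(B, C, D, E, F, beta0, beta1, sign=-1.0):
        """Find x^2 + alpha*x + beta dividing the depressed sextic
        x^6 + B*x^4 + C*x^3 + D*x^2 + E*x + F (Eqs. (5.29)-(5.31))."""
        def phi(beta):
            d = F / beta
            g = d - beta * beta                          # gamma
            delta = (E / 2.0) / g
            R2 = delta * delta + (F + beta * (beta * (B - beta) - D)) / g
            alpha = delta + sign * math.sqrt(R2)         # Eq. (5.30)
            return (E / beta - C
                    + alpha * (B - 2.0 * beta + alpha * alpha - d / beta)), alpha

        f0, _ = phi(beta0)
        f1, alpha = phi(beta1)
        for _ in range(50):                              # linear prediction
            beta2 = beta1 - f1 * (beta1 - beta0) / (f1 - f0)
            f2, alpha = phi(beta2)
            beta0, f0, beta1, f1 = beta1, f1, beta2, f2
            if abs(f2) < 1e-3:
                break
        return alpha, beta1

    # sextic_quadratic_factor(46.5813, -89.2555, 1355.5763,
    #                         -2198.2332, 10076.5517, 10.0, 12.0)
    # -> alpha = -2.3452, beta = 12.7231, as in the table above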

B. Initial Approximations

1. Method of Search

Any point in the complex plane from which one of the methods of section A will converge to a particular zero of a given polynomial will be considered a satisfactory initial approximation to that zero. If the complex plane is covered with a set of points on a sufficiently fine mesh then at least one point of the set will be sufficiently close to each zero of a given polynomial to serve as a suitable initial approximation. If bounds on the roots are known, the number of mesh points to be considered is finite and a program which will choose in succession the points of this finite set may therefore be used to provide initial approximations. Since the fineness of the mesh required is not generally known in advance it is advisable to cover the region of interest first on a coarse mesh and repeat on successively finer meshes until all zeros are located.

This approach was employed in locating the zeros of the truncated series for the exponential26, namely

  S_n(z) = Σ_{k=0}^{n} z^k/k!

for values of n ≤ 23. Approximations were chosen on a rectangular mesh and each was introduced as an initial approximation to (a) the Birge-Vieta process if the point lay on the real axis, or (b) the quadratic factor method if the point was complex.

This method was found to be extremely slow for values of n > 10. In order to study the behavior of the process, an optional routine was included in the program to print out each successive approximation if desired. This record of successive approximations showed two reasons for the slowness of the process. (1) In general a very close first approximation was required for convergence. This meant that the search had to be conducted on a very fine mesh. (2) The process would often follow blind leads with the remainders continuing to decrease for many iterations before finally diverging. Because of the computing time required this method was discarded as impractical.

2. Graeffe’s Method27

Let r1,r2,…,rn be the zeros of the polynomial

  a_0x^n + a_1x^{n-1} + … + a_n .

The coefficients are expressible in terms of the zeros as follows:16

  - a_1/a_0 = r_1 + r_2 + … + r_n
  a_2/a_0 = r_1r_2 + r_1r_3 + … + r_{n-1}r_n
  - a_3/a_0 = r_1r_2r_3 + …
  (-1)^n a_n/a_0 = r_1r_2…r_n .

If the zeros are real and widely separated it is clear that

  - a_1/a_0 ≐ r_1 ;
  - a_2/a_1 = (a_2/a_0)(-a_0/a_1)
            = [r_1r_2(1 + r_1r_3/r_1r_2 + … + r_{n-1}r_n/r_1r_2)] / [r_1(1 + r_2/r_1 + r_3/r_1 + … + r_n/r_1)]
            ≐ r_2 ;

and in general

  - a_k/a_{k-1} ≐ r_k .

Thus a polynomial with widely separated roots breaks into a set of linear equations whose zeros are approximately the zeros of the original polynomial. More generally, if a set of s zeros of equal magnitude occurs and the rest are widely separated, the polynomial breaks into (n-s) linear equations and a cluster of (s+1) successive coefficients defining a polynomial of degree s whose zeros are (approximately) the s zeros of equal modulus.

The Graeffe process consists in generating a sequence of polynomials such that the zeros of any one of the sequence are the negative squares of the zeros of the preceding polynomial. Application of the process k times raises the original zeros to the power 2^k. Hence if all zeros are distinct a few iterations serve to separate them widely. If b_0,b_1,…,b_n are the coefficients of the polynomial so generated, we have

  |b_1/b_0| ≐ |r_1|^{2^k} ;     |b_2/b_1| ≐ |r_2|^{2^k} ;   etc.

Hence |r_1|,|r_2|,…,|r_n| may be obtained. More generally if r_{p+1},r_{p+2},…,r_{p+s} are a set of s roots with equal moduli then the polynomial

  b_py^s + b_{p+1}y^{s-1} + … + b_{p+s}

has the s zeros r_{p+1}^{2^k},…,r_{p+s}^{2^k}. Therefore

  |b_{p+s}/b_p| = Π_{j=1}^{s} |r_{p+j}|^{2^k}

and

  |r_{p+1}| = |r_{p+2}| = … = |r_{p+s}| ≐ [|b_{p+s}/b_p|]^{1/(2^k s)}.                     (5.32)

Thus the moduli of all the roots are obtainable.

Certain refinements of Graeffe's method allow the zeros r_1,r_2,…,r_n themselves to be evaluated. If, however, some of the zeros are complex this involves essentially the solution of the polynomial equation

  b_py^s + … + b_{p+s} = 0

and so leads back to the original problem of solving a polynomial equation of high degree if s is large.

The root-squaring process is obtained by rearranging the polynomial equation

  a_0x^n + a_1x^{n-1} + … + a_n = 0                                  (5.33)

so that odd powers of x appear on one side of the equation and even powers on the other. Squaring this relation gives

  (a_0x^n + a_2x^{n-2} + …)^2 = (a_1x^{n-1} + a_3x^{n-3} + …)^2 .

The substitution y = -x2 then gives

  b_0y^n + b_1y^{n-1} + … + b_n = 0                                 (5.34)

where

  b_0 = a_0^2 ,
  b_1 = a_1^2 - 2a_0a_2 ,
  b_2 = a_2^2 - 2a_1a_3 + 2a_0a_4 ,
and in general
  b_k = a_k^2 + 2 Σ_{j=1}^{r} (-1)^j a_{k-j}a_{k+j} ,     k = 0,1,2,…,n,                 (5.35)
where
  r = min[k, (n-k)].

Equation (5.35) defines the Graeffe root-squaring process and the roots of (5.34) are the negative squares of the roots of (5.33).
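
One step of the root-squaring process, together with the estimation of the moduli in the well separated case, may be illustrated as follows (a Python sketch; the routine names and the use of ordinary floating point in place of the Mark IV floating point operation are editorial assumptions):

    def graeffe_step(a):
        """Eq. (5.35): from [a0, ..., an] form the coefficients of the
        polynomial whose zeros are the negative squares of the zeros."""
        n = len(a) - 1
        b = []
        for k in range(n + 1):
            r = min(k, n - k)                    # r = min[k, (n-k)]
            s = a[k] * a[k]
            for j in range(1, r + 1):
                s += 2 * (-1) ** j * a[k - j] * a[k + j]
            b.append(s)
        return b

    def moduli(a, k_steps):
        """Approximate moduli |r_1|,...,|r_n| after k_steps squarings,
        assuming well separated moduli; clusters require Eq. (5.32)."""
        b = list(a)
        for _ in range(k_steps):
            b = graeffe_step(b)
        p = 2.0 ** k_steps
        return [abs(b[i] / b[i - 1]) ** (1.0 / p) for i in range(1, len(b))]

    # moduli([1.0, -3.0, 2.0], 6)   # zeros 1 and 2 -> approximately [2.0, 1.0]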

C. A General Method of Solution

The problem of finding all the zeros of a polynomial of high degree is of very frequent occurrence and a satisfactory general method of solution is greatly to be desired. The standard methods are unsatisfactory however because of their failure in certain special cases. For example, the Birge-Vieta and quadratic factor methods may fail to converge unless a very good first approximation is available; the method of false position and Sturm’s functions fail to locate complex roots; and methods such as that of Lin28 which are included under the process of penultimate remaindering converge only under certain conditions as established by Aitken25. The method of Graeffe also fails in certain cases as pointed out in the previous section.

A mathematician using a desk computer is not seriously troubled by these special cases since they may be fairly easily recognized and an intelligent choice of a different method usually resolves the difficulty. Conceivably a computer may also be programmed to recognize the special cases and choose a new method accordingly. However, decisions of this type which may appear simple to a human are exceedingly difficult to program and a single general method is to be preferred.

Moore has considered this situation in some detail29 and has suggested a possible general method of solution. Moore’s method is considered impractical because of the amount of computation required. The number of multiplications required is of the order n3 (where n is the degree of the polynomial) and each involves complex numbers represented with a floating decimal point.

A general method of solution is now proposed consisting of the following five steps:

1.   Apply the Graeffe process several times to determine
  |r1|,|r2|,…,|rn|
approximately.
2. Make the substitution y = x–d and determine the coefficients of the corresponding polynomial in y.
3. Re-apply the Graeffe process to determine
  |r1–d|,|r2–d|,…,|rn–d|.
4. Compute in succession the points of intersection of the two families of concentric circles; the one centered at the origin and having radii |r1|,|r2|,…,|rn|, the other at x = d and having radii |r1–d|,|r2–d|,…,|rn–d|.
5. Use the points so computed as initial approximations for (a) the Birge-Vieta process if the point is real, or (b) the quadratic factor method if the point is complex.

Since the zeros are known to lie on each of the two families of circles described in step 4 it is clear that the first four steps alone could provide solutions. Such a method would however suffer from the following disadvantages:

1.   The root-squaring process (requiring n2/2 multiplications) may have to be repeated a large number of times to evaluate |r1|,|r2|,…,|rn| to the accuracy required.
2. Further errors may be introduced in locating the intersection of circles which cut obliquely.
3. Some of the intersections do not correspond to zeros and must be rejected by some means of selection.

If, however, the method is used only to supply initial approximations for step 5 the above disadvantages cause no difficulty. For clearly, the intersections not corresponding to zeros will be rejected and the genuine approximations will be improved to the accuracy desired by a method requiring only 2n or 4n multiplications per iteration and possessing quadratic convergence.
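
The five steps may be drawn together in a short illustrative driver (Python; graeffe_step and moduli are the sketches of section B2 above, a complex-arithmetic Newton refinement stands in for the Birge-Vieta and quadratic factor methods of step 5, and the filtering of duplicate or spurious intersections is omitted):

    def taylor_shift(a, d):
        """Step 2: coefficients of the polynomial in y = x - d."""
        b = list(a)
        n = len(b) - 1
        for i in range(n):                       # repeated synthetic division
            for j in range(1, n + 1 - i):
                b[j] += d * b[j - 1]
        return b

    def circle_intersections(R1, R2, d):
        """Step 4: meet |z| = R1 with |z - d| = R2 (d real)."""
        x = (d * d + R1 * R1 - R2 * R2) / (2.0 * d)
        h2 = R1 * R1 - x * x
        return [complex(x, s * h2 ** 0.5) for s in (1, -1)] if h2 >= 0 else []

    def newton_refine(a, z, steps=50):
        """Step 5 stand-in: Newton's method, quadratically convergent."""
        for _ in range(steps):
            p, dp = a[0], 0.0
            for c in a[1:]:
                p, dp = p * z + c, dp * z + p    # Horner for P and P'
            if dp == 0 or abs(p / dp) < 1e-12 * (1.0 + abs(z)):
                break
            z -= p / dp
        return z

    def candidate_roots(a, k=9):
        radii1 = moduli(a, k)                    # step 1
        d = max(radii1) / 5.0                    # step 2: d = r_max/5
        radii2 = moduli(taylor_shift(a, d), k)   # step 3
        points = [z for R1 in radii1 for R2 in radii2
                  for z in circle_intersections(R1, R2, d)]
        return [newton_refine(a, z) for z in points]   # step 5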

In general the Graeffe process should be repeated only enough times that the given approximations will converge to the desired zero. No general rule can be given for attaining this[d] but suppose the accuracy required is such that two zeros differing by less than one part in a hundred need not be separated, i.e. they will appear as a pair of roots with moduli equal to the geometric mean of their true moduli.

Since (0.99)^{2^9} ≐ 0.005 it is clear that two roots with moduli in the ratio 0.99 will after 9 iterations have the ratio 0.005. In practice fewer than nine iterations may well be used, the number being increased if difficulties are encountered.

The choice of the parameter d in shifting the origin is arbitrary but should be made a function of the moduli of the roots. The choice of d = rmax/5 has been used with success. A more complex dependence on the moduli of the roots may be indicated for certain types of polynomials.

Since one iteration of the Graeffe process and a change of origin each require n2/2 multiplications, the whole process described (assuming k applications of the Graeffe process and neglecting the computation of the intersections of the circles) requires in all (k + ½)n2 multiplications. This figure compares favorably with the minimum of n2 multiplications required simply to evaluate the polynomial at each of the n zeros.

Because of the wide range in magnitude of the numbers derived from Graeffe’s process it is clear that floating point operation will be required. Floating point operation is probably to be desired in any case since careful scaling of the coefficients to avoid over-capacity numbers is rendered unnecessary.

The advantages offered by the proposed method may now be summarized:

1.   The method applies to any polynomial with real coefficients;
2. The method is uniform and only simple decisions need be provided for in the program;
3. Initial approximations are not required;
4. No preliminary estimate of bounds on the roots is necessary;
5. No preliminary scaling of the coefficients is required;
6. The use of complex numbers is avoided;
7.  The number of multiplications required is of the order of n2.

Further discussion of the method will be found in Chapter 6 where suitable checks and other programming details are considered.

Notes

a.   Uniformity is desirable in automatic computation since exceptions may require as much programming as the main process.
b. A complex multiplication requires four real multiplications and two additions.
c. The solution of the quartic given here appeared in substantially the same form in Progress Report No. 29 of the Harvard Computation Laboratory under the title “Practical Methods of the Quartic and Sextic”.
d. A detailed study of the behavior of the Graeffe process has been made by Ostrowski30 but for our purposes the following rough analysis will suffice.

 

Chapter 6   Machine Programs

A. Summary of Calculations

The general solution of the model proposed in Chapter 1 has been shown in Chapters 3 and 4 to be expressible in terms of the latent roots and principal vectors of the matrix (I - A)-1B. The particular solution for functions of the form z = geμt was shown in section 3F to be expressible as x = (I - A - μB)-1g.

The solution was obtained by the following steps:

1.   The matrices (I - A - μB)-1 were computed for the desired range of values of μ, namely μ = 0, 0.015, 0.020, 0.025, 0.030, 0.035.
2.   The matrix (I - A)-1 (obtained in (1) for μ = 0) was pre-multiplied into B to obtain (I - A)-1B.
3.   The Frame method (section 4C) was applied to the matrix (I - A)-1B to generate the characteristic equation and the matrices F0,F1,…,Fn, required in the calculation of the principal vectors.
4.   The method of section 5C was applied to the solution of the characteristic equation.
5.   The principal vectors of (I - A)-1B were evaluated by means of Eq. (4.11).
6.   The “modal” matrix S formed of the principal vectors was inverted.
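
For comparison, the six steps above may be summarized in a few lines of modern NumPy (an editorial sketch only: the library eigensolver replaces steps 3-5, that is, the Frame method and the solution of the characteristic equation; A, B and the list mus, which must contain 0, are the model data):

    import numpy as np

    def solve_model(A, B, mus):
        n = A.shape[0]
        I = np.eye(n)
        inverses = {mu: np.linalg.inv(I - A - mu * B) for mu in mus}  # step 1
        M = inverses[0] @ B                                           # step 2
        lam, S = np.linalg.eig(M)          # steps 3-5 via library eigensolver
        S_inv = np.linalg.inv(S)                                      # step 6
        return inverses, lam, S, S_inv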

All programs were made to handle a system of general order and in general the calculations for the six systems of order 5, 6, 10, 11, 20, 21 were run concurrently. This mode of operation had the advantage that the programs could be checked out on the low order system.

The programs employed will be examined in some detail in the following sections. The desirability of uniformity in machine programs discussed in section 2A will be stressed throughout.

B. Matrix Subroutines

1. Introduction

Since it was not clear at the outset what matrix methods might be required, it was decided to organize the programming around a number of basic subroutines which were designed to carry out all of the more important matrix operations likely to be encountered. In this way any method to be employed would consist of a relatively simple main program piecing together the subroutines required. So far as possible the use of the subroutines should reduce the programming of matrix operations to the simplicity of the programming of the corresponding algebraic operations.

In designing the subroutines a number of requirements were laid down. These requirements are listed below:

1.   The routines are to be completely general with respect to the order n (n < 60).
2.   Every matrix is to be stored so as to include a sum check for each row and column and must be stored in some standard way so that the location of any row may be completely specified by the row number and the location of the first element of the matrix.
3.   The ten fast storage registers 190-199 are reserved for certain control numbers (e.g., n is stored in 199) and other uses within the routines.
4.   The remaining fast storage registers 0-189 are not to be used except as directed by the main program.
5.   The operations required in calling the routine should be as simple as possible.
6.   Each routine should be as flexible as possible. For example, a single routine should be capable of recording on tape either a single specified row of a matrix or the specified row followed by all the succeeding rows, the choice of operation to be made when the routine is called.
7.   All checks are, so far as possible, to be completed in the main program and not in the subroutines.
8.   Since they are expected to be used repeatedly the subroutines are to be made as efficient as possible in terms of computing time.

Since the fast storage registers can accommodate a matrix of order fifteen at most, the first requirement makes the use of slow storage registers necessary.

A vector x = [x1,x2,…,xn] will be said to be stored in location b if the components x1,x2,…,xn are stored respectively in the registers b+1,b+2,…,b+n. The negative sum of the components will be designated by x0, and will be stored in register b. In this way the location of the vector may be completely specified by the order n and the number b. Also each vector carries with it its own check since the sum of the elements x0,x1,x2,…,xn must be zero. The use of the negative sum makes the process of summing the check more uniform.

Each row of a matrix A = [aij] may be considered as a vector and will be stored as indicated above. The collection of negative row sums then forms a “zeroth” column of the matrix. In the same way negative column sums are included as a zero row and finally the negative sum of this row completes the (n+1)×(n+1) square array:

 
a_00   a_01   a_02   …   a_0n
a_10   a_11   a_12   …   a_1n
a_20   a_21   a_22   …   a_2n
 ⋮      ⋮      ⋮          ⋮
a_n0   a_n1   a_n2   …   a_nn

where

 
a_0i = - Σ_{j=1}^{n} a_ji ,             i = 0,1,2,…,n;
a_i0 = - Σ_{j=1}^{n} a_ij ,             i = 0,1,2,…,n.

Note that

  a_00 = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij

by either definition above.

Sum checks prove extremely useful in linear operations and for this reason all matrices (including any output on magnetic tape) are stored as indicated above.

The simplest use of the sum-check is for checking the components of a vector each time it enters into the computation. As an example of other ways in which the sum-check may be used, consider the matrix multiplication

  y = Ax

which is normally defined as

  y_i = Σ_{j=1}^{n} a_ij x_j ,             i = 1,2,…,n.

Since the matrix A is stored with a zeroth row this definition may be extended to i = 0,1,2,…,n. Then

 
Σ_{i=0}^{n} y_i = Σ_{i=0}^{n} Σ_{j=1}^{n} a_ij x_j
               = Σ_{j=1}^{n} ( Σ_{i=0}^{n} a_ij ) x_j .

The quantity in parentheses is zero for each j so that finally

  Σ_{i=0}^{n} y_i = 0.                                           (6.1)

This last is an identity which checks the whole matrix multiplication. Note the complete uniformity of the operations.

In machine calculation the round-off error may cause a discrepancy of as much as (n+1) in the last column in the identity (6.1). Hence a tolerance of (n+1) should be allowed in checking. It should, however, be standard practice to then "correct" the sum y_0 so that

  Σ_{i=0}^{n} y_i = 0.

In this way the only error to be allowed at each step is the maximum round-off error of that step. Otherwise the error accumulates and it becomes difficult to estimate what tolerance should be allowed.
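
The bordered storage and the check (6.1) are easily demonstrated (a Python/NumPy illustration; the function name, the test vector, and the tolerance scaling are editorial choices):

    import numpy as np

    def border(A):
        """Adjoin the negative-sum zeroth row and column of section 6B1."""
        n = A.shape[0]
        M = np.zeros((n + 1, n + 1))
        M[1:, 1:] = A
        M[0, 1:] = -A.sum(axis=0)           # negative column sums
        M[1:, 0] = -A.sum(axis=1)           # negative row sums
        M[0, 0] = A.sum()                   # a_00, by either definition
        return M

    A = np.array([[1.0, 2.0, -4.0],
                  [0.0, 3.0,  5.0],
                  [1.0, 1.0,  2.0]])
    M = border(A)
    x = np.array([1.0, -2.0, 3.0])          # an arbitrary vector
    y = M[:, 1:] @ x                        # y_i for i = 0,1,...,n
    assert abs(y.sum()) < 1e-12             # identity (6.1), within round-off
    y[0] -= y.sum()                         # "correct" y_0 as recommended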

A matrix may be stored most compactly by assigning the first element of each row to the register following the last element of the preceding row. It is necessary, however, to space the rows in slow storage so that the transfer of a given row to slow storage will not destroy numbers of the following row.[a] The rth row of a matrix A beginning at location b (which stores a_00) is therefore stored at location b+rs where s is the minimum number of blocks of ten registers required to store (n+2)[b] numbers. The number s is computed from n and is stored in register 198.

To illustrate the foregoing consider the matrix A stored at slow storage location 1830 where

 
A = [ 1   2   -4 ]
    [ 0   3    5 ]
    [ 1   1    2 ] .

Then n = 3, s = 1 and the complete matrix is stored as shown below where the numerical quantities on the left of the arrow are stored in the corresponding registers indicated on the right.

 
 11   -2   -6   -3    →    1830   1831   1832   1833
  1    1    2   -4    →    1840   1841   1842   1843
 -8    0    3    5    →    1850   1851   1852   1853
 -4    1    1    2    →    1860   1861   1862   1863

The fast storage registers 190-199 are utilized as follows:

 
Register   Contents
  190 Unassigned
  191 Unassigned
  192 Serial
  193 Page control
  194 Row number
  195 10 in low order columns
  196 1   in low order columns
  197 Column number
  198 s
  199 n

In order to meet the requirement of flexibility without increasing the number of orders required in calling the subroutine, the following device is used. If a certain control number (e.g., the row number r) is essentially positive then its absolute value may be used where necessary in the routine. Thus in calling the subroutine this control number may be entered either positively or negatively, and at some point in the subroutine the sign of the number may be used to make a choice between two alternative operations. Examples will be found in the discussion of the individual subroutines.

Although checks are prepared in the subroutine (e.g., the components of a vector transferred from slow storage will be summed), the final check is left to the main program if possible. This makes it possible to program reruns instead of check-stops, if desired, and also makes it easier to determine at what point in the main program a failure has occurred.

2. Floating Vector Operation

Floating point operation (see section 2A) offers decided advantages in reducing round-off error and in avoiding overflow.[c]

3. Matrix Programs

C. Matrix Inversions

1. A New Machine Method

2. Inversion of a Complex Matrix

D. The Method of Frame

E. Solution of the Characteristic Equation

1. The Graeffe Process

2. The Quadratic Factor Method

F. The Principal Vectors

If the Frame method (section 4C) is applied to a matrix A to obtain the matrices F0,F1,…,Fn, then Eq. (4.11) may be used to evaluate the adjoint Δ(A - λI) of the matrix A - λI. Denoting the ith column of the matrix Fj by fij, the ith column νi(λ) of the adjoint is given by

  ν_i(λ) = - [λ^{n-1}f_{i0} + λ^{n-2}f_{i1} + … + f_{i,n-1}] .                       (6.10)

If λ_k is a latent root of the matrix A, any non-trivial column of the adjoint Δ(A - λ_kI) is proportional to the associated eigenvector. Hence it usually suffices to compute only a single column ν_i(λ_k) for each distinct root λ_k. If multiple roots occur the required number of independent columns of the derived adjoint may be computed by differentiating Eq. (6.10).

The polynomial (6.10) may be evaluated most efficiently by grouping the operations as follows:

  ν_i(λ_k) = - (…((λ_kf_{i0} + f_{i1})λ_k + f_{i2})λ_k + … + f_{i,n-1}) .               (6.11)

This is simply an extension of the synthetic division process to a polynomial with vector coefficients. Because of the extreme range of magnitudes to be encountered in the quantities fij[s] the calculation requires floating point operation. The roots λk in general are complex so that floating point complex operation is required. Such operation is extremely slow and may be avoided by the use of floating vector operation as follows: The root λk is first put in the form λk = rkμk where rk is real and |μk| = 1. Equation (6.11) may now be written as

  ν_i(λ_k) = - (…((r_k^{n-1}f_{i0}μ_k + r_k^{n-2}f_{i1})μ_k + … + f_{i,n-1}) .             (6.12)

The powers of r_k are formed using floating point operation and the vectors r_k^{n-1-j}f_{ij}, j = 0,1,…,n-1, are computed using floating point vector operation as described in Section A. If the largest exponent of this set of vectors is p then p+2 is subtracted from each exponent.[t] Thus the most significant quantities in the expression (6.12) will be of the order of 1 in the fourteenth machine column. Each vector may therefore be shifted by the amount indicated by the corresponding exponent and the subsequent calculations carried out at a fixed decimal point located to the left of the highest order column. Since |μ_k| = 1 the subsequent operations will not cause the occurrence of over-capacity numbers.
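
The synthetic division with vector coefficients, Eq. (6.12), reduces to a few lines (a Python/NumPy illustration; the exponent bookkeeping of the floating vector scheme is only imitated by ordinary floating point, and the scaling by 10^{-p-2} is omitted since any scalar multiple of an eigenvector is acceptable):

    import numpy as np

    def adjoint_column(f_cols, lam):
        """f_cols = [f_i0, f_i1, ..., f_{i,n-1}] (real vectors) for one
        column index i; returns nu_i(lam) by the nested scheme (6.12)."""
        r, mu = abs(lam), lam / abs(lam)    # lam = r*mu with |mu| = 1
        n = len(f_cols)
        terms = [r ** (n - 1 - j) * f for j, f in enumerate(f_cols)]
        v = terms[0].astype(complex)
        for t in terms[1:]:
            v = v * mu + t                  # multiply by mu, add next vector
        return -v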

Notes

a.   If for example n = 6 the transfer of a vector to slow storage registers 0-6 also affects registers 7, 8 and 9 since numbers are transferred in groups of ten. This difficulty might be avoided by preceding each transfer to slow storage by a transfer from slow storage. In this way only the slow storage transfer registers which are changed in the interim will cause any change in slow storage. This would allow more efficient use of storage at a cost of some computing time.
b.   An extra location is reserved for an “exponent” as described in section 6B2.
c.   The occurrence of over-capacity numbers.
d.   Except for the “external transfer” orders which require a simple translation from the printer output. In the following programs this translation has been made.
e.   Note the assumption here that y is a multiple of 10 (y ≡ 0 mod 10). Hence the limitation noted in Table 0.
f.   See “Call and Conditional Call”, section 2B.
g.   High accuracy addition is used to avoid possible overflow which may occur in the course of addition even though the final sum is known to be within capacity. The read to and from the condition register is to provide a zero of the proper sign for the high order part of the number to be added.
h.   Subject to the limitation x ≡ 0 mod 10.
i.   If a sum of n products is rounded off after summing, the maximum error is one in the last column. If each product is rounded off before summing, the maximum error is n in the last column.
j.   The largest component before normalization then approaches the latent root.
k.   The number of orders must not exceed twenty-three. See “Call and Conditional Call”, section 2B.
l.   Forsythe32 has suggested the use of the pivot element whose absolute value is nearest the quantity n√|A|. The application of this method of course requires a knowledge of the determinant |A|.
m.   The solution may of course be obtained by inverting A and computing A-1c. This is, however, less efficient than the process here described.
n.   From the example of section 5C it is seen that if the moduli of two roots are in the ratio 0.99 the cross terms should, after nine iterations, be reduced to about .005 of the value of the square term.
o.   The Graeffe process must be carried out with floating point operation. See section 2A.
p.   Multiplicity is used here in the sense of roots of equal moduli.
q.   See section 5C for the definition of the circles.
r.   If m = 0 the new root is still divided out once as a final check.
s.   For example in the case of the system of order 20 the components of f_{i1} are of the order of unity while the components of f_{i,18} are of the order of 10^{-17}. However, the vector f_{i,18} may not be considered negligible since it becomes a dominant term for small values of the root λ_k.
t.   Any scalar multiple of an eigenvector is also an eigenvector so that multiplication by 10-p-2 is a valid operation.

 

Chapter 7   Results and Conclusions

The tables of initial data and final results appear in Appendices I-VII and are described in section A of this chapter. The chapter is concluded with remarks of a general nature in section B.

A. Table of Results

Equation (1.2) of Chapter 1 which describes the economic model under consideration is repeated here for reference:

  x_i - Σ_{j=1}^{n} a_ij x_j - Σ_{j=1}^{n} b_ij (dx_j/dt) = z_i ;     i = 1,2,…,n.                 (1.2)

As described in Chapter 1 the variables x_1,x_2,…,x_n represent the outputs of each of n industries making up a given economy. Table 27 of Appendix VII shows the industry classification[a]  adopted for three systems of order 21, 11 and 6. For the 21 industry classification the numbers 1-21 indicate which of the variables x_1-x_21 are assigned to represent each industry; e.g. x_5 represents the output of the chemical industry. For the 11 and 6 industry classifications the converging lines indicate which industries of the 21 industry classification are grouped together to be represented as a single industry in the lower order classification.

Three open[b]  systems of order 20, 10 and 5 are derived from the foregoing systems by suppressing in each the variable representing households and considering it instead as a final demand on the remaining system.

Tables 2-26 contain square matrices of various orders. Except in Tables 8-13 of Appendix III where the row and column number of each element is shown directly, the columns of the table represent the columns of the corresponding matrix. Because sum checks prove as useful in hand computation as in machine computation, each matrix is recorded with a zeroth row containing the negative sums of the columns and a zeroth column containing the negative sums of the rows as described in section 6B1.

The matrix A = [aij] of flow coefficients derived[a]  from data supplied by the Bureau of Labor Statistics is shown in Appendix I. The corresponding matrix of capital coefficients B = [bij], supplied by the Harvard Economic Research Project, appears in Appendix II. The broken lines in each of these tables (2-7) indicate the submatrix of coefficients which is retained in the open systems considered. The column and row sums must of course be adjusted for these systems.

Four decimal places were given in the initial data (Tables 2-7) and all results are rounded to four decimal places. Twelve decimal places were carried in the calculation of Tables 8-19. A final check on the results shown in Tables 20-24 and the agreement obtained are described at the end of this section.

To avoid repetition, the use of the tables in the solution of the economic model will be described only in terms of the system of order five and all references to tables will be to the fifth order system.

Expressed in matrix notation and specialized to the particular final demand functions to be considered, Eq. (1.2) appears as

  (I - A)x - B(dx/dt) = ge^{μt} .                                     (7.1)

The general solution of Eq. (7.1) is a sum of

1.   A particular integral (section 3F) of the form
  x = [I - A - μB]^{-1}ge^{μt}, and
2. A complementary function of the form x = Se^{J^{-1}t}c .

(See Eqs. (3.31) and (3.12).) The general solution is therefore

  x = [I - A - μB]^{-1}ge^{μt} + Se^{J^{-1}t}c                             (7.2)

where g is a given vector determining the demand function and c is a vector to be determined by the initial condition x(0). Thus

  x(0) = (I - A - μB)^{-1}g + Sc

and

  c = S^{-1}[x(0) - (I - A - μB)^{-1}g].                               (7.3)

The matrix[c]  S is given in Table 20, the inverse S-1 in Table 25, and the matrix [I – A – μB]-1 appears in Table 8 for various values of μ. Thus for a particular final demand specified by given values of μ and g, and a given initial condition it is possible to compute the vector c from Eq. (7.3).

The only quantity now left undefined in Eq. (7.2) is the matrix J. Let us denote the kth eigenvalue of the matrix (I - A)^{-1}B by λ_k. Then λ_k appears as the number at the head of the kth column of the matrix S in Table 20. The matrix J is then a diagonal matrix with elements j_kk = λ_k. Thus J^{-1} is a diagonal matrix with elements γ_k = 1/λ_k and hence e^{J^{-1}t} is given by

  e^{J^{-1}t} = [ e^{γ_1t}                                      ]
                [          e^{γ_2t}                             ]
                [                   e^{γ_3t}                    ]
                [                            e^{γ_4t}           ]
                [                                     e^{γ_5t}  ]

Finally e^{J^{-1}t}c is the column vector

  [ c_1e^{γ_1t}, c_2e^{γ_2t}, …, c_5e^{γ_5t} ].
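
An illustrative evaluation of Eqs. (7.2) and (7.3), assuming the model matrices A and B and the quantities lam (the latent roots) and S computed as in Chapter 6 (a Python/NumPy sketch; a zero latent root would require the treatment of section 3D described below):

    import numpy as np

    def general_solution(A, B, S, lam, mu, g, x0):
        n = A.shape[0]
        I = np.eye(n)
        P = np.linalg.inv(I - A - mu * B) @ g     # particular amplitude
        c = np.linalg.solve(S, x0 - P)            # Eq. (7.3)
        gammas = 1.0 / lam                        # gamma_k = 1/lambda_k
        def x(t):                                 # Eq. (7.2)
            return P * np.exp(mu * t) + S @ (c * np.exp(gammas * t))
        return x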

If one of the latent roots λ_k is zero then the results must be interpreted according to section 3D as follows: The corresponding component c_k of the vector c is made zero. This in turn imposes a linear restraint on the initial condition according to Eq. (7.3). For if S^{-1}_k is the kth row of the matrix S^{-1} then

  c_k = S^{-1}_k [x(0) - (I - A - μB)^{-1}g] = 0.

The complete system of eigenvectors and latent roots is given for the systems of order 5, 6, 10 and 11. To each complex conjugate pair of latent roots there corresponds a conjugate pair of eigenvectors. These pairs appear in Appendix V as double columns separated by the symbol ±i. Partial solutions were also obtained for the system of order 20 but due to round-off error accumulation the complete set was not obtained to satisfactory accuracy. The final check used on the eigenvectors ν_k was the identity [A - λ_kI]ν_k = 0. For all eigenvectors shown in Appendix V agreement to six or more places was obtained.

B. Conclusions

The results of the present study show the feasibility of obtaining general solutions of large order systems of linear differential equations. The rapid accumulation of round-off error indicates the need for maintaining rather extreme accuracy in the calculations and suggests that the rate of error accumulation in a given process is of major importance. The method of solution of polynomial equations (proposed in section 5C) has proven satisfactory for polynomials of degree twenty and less. In spite of the rather large amount of calculation required by the root-squaring process it is believed that in the absence of initial approximations to the roots this is perhaps the most satisfactory method available for equations of high degree.

The matrix subroutines described in section 6B have proven very valuable in reducing problem preparation time in the present work. However, this does not necessarily imply that the subroutines possess practical value for general use. There are two major reasons for this situation:

1.   In using a subroutine the intimate knowledge possessed by the original programmer gives him a decided advantage over a programmer who wishes to use the subroutine according to simple written instructions without learning the program in detail.
2. In case of trouble in checking out a program the mathematician may be at a serious disadvantage if his program includes subroutines which are unfamiliar to him.

As evidence of the practicability of the matrix subroutines for general use, the present routines have been used with profit and a minimum of difficulty by another member of the Staff of the Computation Laboratory in the solution of a vibration problem.34  Indeed, many of the difficulties in the use of subroutines arise from possible conflicts in the use of sequence or number storage, and these in turn stem in part from the rather limited storage capacity of most computers. The advantages of large storage capacity and of separate sequence and number storage combine to make the efficient use of a library of subroutines a practical possibility for Mark IV.

As a result of the experience gained in their use a number of changes in the subroutines are now proposed.

1.   Whereas in the initial programming the major emphasis was placed on efficiency in terms of computing time, it is believed that more emphasis should be placed on ease of use at the cost of some loss in efficiency. In particular the orders required to call a routine should be as simple and standard as possible.
2. To allow for the possible use of several subroutines in the same program the subroutines should be so designed that they may be moved to any section of the sequence storage simply by assigning a new set of (consecutive) line numbers to the orders of the program. The only difficulty arises in the making of calls within the routine and this may be cared for by making all calls relative to the present line number. This may be done by reading out of the line number register on any line which is later to be called and placing it in some storage register which is later used in making the call. Alternatively, this may be done by reading out of the line number register just prior to making a call and adding the necessary amount to this number to give the line number which is to be called.
3. In the present routines allowance was made for the programming of reruns in the main program by deferring the completion of checks until after the main program has been called. This makes added work in the preparation of the main program. Also the important advantage of reliability of a thoroughly tested and frequently used subroutine is lost at one of the most crucial points; namely in the check. However, the check may be completed in the routine and allowance made for a rerun by providing for the storage of a “rerun line” in some particular register to be called by the subroutine in the case of a check failure. Provision should also be made for introducing any desired tolerance for use in checks within the subroutine.

The machine computations were begun during the final testing of Mark IV when the machine was first coming into operation. Hence conditions were far from ideal and the lack of experience in programming and operation combined with machine failures to cause a considerable loss of time. However it is under just such adverse conditions that the relative ease of programming reruns on Mark IV proves especially valuable. The reliability of the computer has now been greatly improved and at present writing it has been operating with more than 85% good running time during the month of November 1953.

The possibilities of the Leontief dynamic economic model have not been exhausted by the present study. For example, a step-by-step solution might well be attempted which could take into account the irreversibility of certain economic processes such as the investment in capital goods. It is believed that the economic interpretation of the present results will prove valuable in pointing the direction to be taken by further research.

Notes

a.   Further details of the classification schemes and the aggregation of coefficients may be found in references 33 and 2.
b. See Chapter 1 or reference 2 for the definitions of the various terms used in this section.
c. The matrix S is the modal matrix whose columns are formed of the principal vectors of the matrix (I-A)-1B as described in Chapter 3. The number at the head of each column of S is the associated latent root. These roots therefore make up the diagonal elements of the matrix J. The principal vectors were obtained from the matrix (I-A)-1B, (shown in Table 14), by the methods described in Chapters 4-6. In the tables of Appendix V they are normalized to unit length.

 

Bibliography

1. Leontief, W., The Structure of American Economy, 1919-1939, Oxford University Press, 1951.
2. Leontief, W., Studies in the Structure of American Economy, Oxford University Press, 1953.
3. Mitchell, H.F., “The Machine Solution of Simultaneous Linear Systems”, Doctoral Thesis, Harvard University, 1948.
4. Staff of the Computation Laboratory, Description of a Relay Calculator, Annals of the Computation Laboratory, Vol. XXIV, Harvard University Press, 1949.
5. Staff of the Computation Laboratory, A Description of the Mark IV Calculator, Annals of the Computation Laboratory, Vol. XXVIII, Harvard University Press (to be published).
6. Staff of the Computation Laboratory, “Final Report on Functions for Mark IV”, Harvard Computation Laboratory, Progress Report No. 22, Section II, May 1952.
7. Wilkes, M.V., Wheeler, D.J., and Gill, S., The Preparation of Programs for an Electronic Digital Computer, Addison-Wesley Press, 1951.
8. Aitken, A.C., Determinants and Matrices, Oliver and Boyd, Seventh Ed., 1951.
9. Frazer, R.A., Duncan, W.J., and Collar, R.A., Elementary Matrices, Cambridge University Press, 1950.
10. Turnbull, H.W., and Aitken, A.C., The Theory of Canonical Matrices, Blackie and Son, Ltd., London, 1952.
11. Geršgorin, S., “Über die Abgrenzung der Eigenwerte einer Matrix”, Izvestiia Akademie Nauk SSSR, VII, (Mat Klass.) (1931) pp. 749-754.
12. Aitken, A.C., “Studies in Practical Mathematics II. The evaluation of the latent roots and latent vectors of a matrix”, Proc. Roy. Soc., Edinburgh 57, 269-304 (1937).
13. Zurmuhl, R., Matrizen, Springer-Verlag, Berlin, 1950.
14. Feller, W., and Forsythe, G.E., “Matrix Transformations for Obtaining Characteristic Vectors”, Quarterly Journal of Applied Mathematics, 8 (1950).
15. Hotelling, H., “Some new methods in matrix calculation”, Annals of Mathematical Statistics, 14 (1943).
16. Dickson, L.E., Theory of Equations, John Wiley and Sons, Inc., New York.
17. Fettis, D.H.E., “A method for obtaining the characteristic equation of a matrix and computing the associated modal columns”, Quart. Jour. Appl. Math. 8, 206-212 (1950).
18. Jahn, H.A., “Improvement of an approximate set of latent roots and modal columns by methods akin to those of classical perturbation theory”, Quarterly Journal of Mechanics and Applied Mathematics, Vol. 1 (1948).
19. Marchant Methods, MM-225, Marchant Calculating Machine Co., Oakland, California.
20. Whittaker, E., and Robinson, G., The Calculus of Observations, 4th Ed., Blackie and Sons, Ltd., 1946.
21. Churchill, Ruel V., Introduction to Complex Variables and Applications, McGraw-Hill, 1948.
22. Bairstow, L., Applied Aerodynamics, Longmans, Green & Co., London, 1920.
23. Milne, W.E., Numerical Calculus, Princeton University Press, 1949.
24. Hartree, D.R., Numerical Analysis, Oxford at the Clarendon Press, 1952.
25. Aitken, A.C., Studies in Practical Mathematics, VII., “On the Theory of Methods of Factorizing Polynomials by Iterated Division”, Proc. Roy. Soc., Edinburgh, Vol. 63, p. 326 (1951).
26. Iverson, K.E., “The Zeros of the Partial Sums of e^z”, Mathematical Tables and Other Aids to Computation 7 (1953).
27. Lovitt, W.V., Elementary Theory of Equations, Prentice-Hall, New York 1939.
28. Lin, S., “A method for finding roots of algebraic equations”, Journal of Mathematics and Physics, Mass. Inst. Tech. 22, (1943).
29. Moore, E.F., “A new general method for finding roots of polynomial equation”, Math. Tables and Other Aids to Computation 3, (1949).
30. Ostrowski, A., “Recherches sur la méthode de Graeffe et les zéros des polynômes et des séries de Laurent”, Acta Math. 72, pp. 99-155.
31. MacDuffee, C.C., “Vectors and Matrices”, The Carus Mathematical Monographs, published by the Mathematical Association of America, revised, January 1949.
32. Forsythe, George E., “Theory of Selected Methods of Finite Matrix Inversion and Decomposition”, Lecture in Math. 136, notes by D.G. Aronson and K. Iverson. INA Report 52-55 (1951).
33. “Estimates of the capital structure of American industries, 1947”, Harvard Economic Research Project, (hectographed, November, 1953).
34.   Staff of the Computation Laboratory, “Problems of Critical Frequency for a Propeller Blade Subjected to a Pure First Order Aerodynamic Excitation” Harvard Computation Laboratory, Progress Report No. 29, Section I, November 1953.

