The Levenberg–Marquardt Algorithm (LMA)
The Levenberg–Marquardt Algorithm (LMA) is a commonly used routine for least squares optimization problems. The current NLLS verb is based upon an APL application developed by Gavaris (1988), which has been used extensively in the ocean sciences community. Indeed, numerous subsequent applications in other languages (e.g. FORTRAN, C, ACON) have been developed. To this list is now added J.
There are two attachments to this wiki page. These can be obtained by clicking on the attachment tab at the bottom of the NLLS page. One is the NLLS script (nlls_j602_5 april 2010) along with an example application from Haddon (2001), while the other (ls_prob_2 jan 2010) is a least squares problem described in Bard (1974), starting in section 5.21 and referred to in subsequent chapters. These provide good examples of how NLLS works and are a useful guide for other applications.
In all gradient methods,
$$\theta_{i+1} = \theta_i + r_i v_i$$
where $r_i$ and $v_i$ are, respectively, the step size and direction required to change the parameter set $\theta$ in order to minimize some objective function.
A direction $v_i$ is acceptable if and only if there exists a positive definite matrix $R_i$ such that
$$v_i = -R_i q_i$$
where $q_i$ is the gradient vector (first partial derivatives).
Thus the basic equation of the $i$th iteration in any gradient method is
$$\theta_{i+1} = \theta_i - r_i R_i q_i$$
The various gradient methods differ in the manner of choosing $r_i$ and $R_i$. In the Newton–Raphson method (which guarantees quadratic convergence near the solution), $r_i$ is 1 and $R_i$ equals $H_i^{-1}$, the inverse of the Hessian matrix of second partial derivatives. In general, there is a minimum if $H_i$, and thus $R_i$, is positive definite.
In the Levenberg–Marquardt Algorithm (LMA),
$$R_i = (A_i + \lambda_i B_i^2)^{-1}$$
$B_i^2$ is a diagonal matrix whose elements coincide with the absolute values of the diagonal elements of $A_i$. Regarding $\lambda_i$, it can be shown that if $B_i^2$ is any positive definite matrix, then $A_i + \lambda_i B_i^2$ is positive definite for sufficiently large $\lambda_i$, regardless of $A_i$. As $\lambda_i$ approaches infinity, the term $\lambda_i B_i^2$ dominates $A_i$, resulting in an extremely short step in the downhill direction, $B_i^2$ being positive definite. On the other hand, as $\lambda_i$ decreases, $v_i$ approaches $-A_i^{-1} q_i$, the Newton–Raphson direction. Thus $r_i$, the step size, is indirectly determined through the choice of $\lambda_i$ and in the LMA can be assumed to be one. In each iteration, progress towards minimization of the objective function is evaluated. If the residual sum of squares is not decreasing, $\lambda_i$ is increased according to some rule (e.g. multiplied by an order of magnitude) and the objective function re-evaluated. This has the effect of exploring smaller steps until the residual sum of squares decreases. Once this occurs, $\lambda_i$ is decreased, thus increasing the step size. In essence, use of $\lambda_i$ ensures that the modified Hessian $A_i + \lambda_i B_i^2$, and hence $R_i$, remains positive definite, so the algorithm always moves towards a minimum and stays away from saddle points.
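The damping loop described above can be sketched in a few lines. This is a minimal Python illustration, not the NLLS verb itself: the exponential test model, the starting values and the forward-difference Jacobian are all assumptions made for the example, and $A_i$ is taken as the usual Gauss–Newton approximation $J^T J$ to the Hessian.

```python
import numpy as np

def lm_fit(resid_fn, theta0, lam=1e-3, max_iter=100, tol=1e-10):
    """Minimal Levenberg-Marquardt sketch (hypothetical helper, not NLLS).
    resid_fn(theta) returns the residual vector; the Jacobian is estimated
    by forward differences."""
    theta = np.asarray(theta0, dtype=float)

    def jac(th):
        r0 = resid_fn(th)
        J = np.empty((r0.size, th.size))
        for j in range(th.size):
            h = 1e-7 * max(abs(th[j]), 1.0)
            tp = th.copy()
            tp[j] += h
            J[:, j] = (resid_fn(tp) - r0) / h
        return r0, J

    r, J = jac(theta)
    rss = r @ r
    for _ in range(max_iter):
        A = J.T @ J                        # Gauss-Newton approximation to A_i
        B2 = np.diag(np.abs(np.diag(A)))   # B_i^2: |diagonal elements of A_i|
        q = J.T @ r                        # gradient vector q_i (up to a factor of 2)
        step = np.linalg.solve(A + lam * B2, q)
        trial = theta - step               # theta_{i+1} = theta_i - R_i q_i, with r_i = 1
        r_t = resid_fn(trial)
        rss_t = r_t @ r_t
        if rss_t < rss:                    # progress: accept, shrink lambda (larger steps)
            converged = (rss - rss_t) <= tol * (rss + tol)
            theta, rss = trial, rss_t
            if converged:
                break
            lam /= 10.0
            r, J = jac(theta)
        else:                              # no progress: grow lambda (shorter downhill step)
            lam *= 10.0
    return theta

# Fit y = a * exp(b * x) to noise-free data generated with a = 2, b = 0.5
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(0.5 * x)
theta = lm_fit(lambda th: y - th[0] * np.exp(th[1] * x), [1.0, 1.0])
```

Note how lambda alone steers the iteration between the steepest-descent and Newton–Raphson regimes; the step size never needs to be tuned separately.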
The routine NLLS allows for both unconstrained and constrained optimization by setting cnstrnt equal to either ‘OFF’ or ‘ON’. Constraints are achieved through a penalty function $p$, added to the residual sum of squares produced by the objective function:
$$p = \sum a / h(\theta)$$
For each $\theta$, two $h$ terms are defined:
$$h_u = \text{ucnstrnt} - \theta \qquad h_l = \theta - \text{lcnstrnt}$$
The constant $a$ is a measure of the influence of the constraint: the larger $a$ is, the greater the impact of the constraint. In NLLS, $a$ is scaled by the initial value of each parameter:
$$a = \text{constant} \times \theta_{\text{initial}}$$
The constant is user defined. In NLLS, it is set at 0.001 but can obviously be changed if the user desires the penalty function to have more influence on the optimization. Use of $a$ implies that each lower and upper constraint is standardized by the size of $\theta$:
$$p = \sum a / h(\theta) = \text{constant} \times \frac{\theta_{\text{initial}}}{\theta - \text{lcnstrnt}} + \text{constant} \times \frac{\theta_{\text{initial}}}{\text{ucnstrnt} - \theta} + \dots$$
Thus, as the theta terms approach the constraint, p increases, causing a change in the direction of the optimization.
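A small numeric sketch shows the effect. This is Python rather than J; the names lcnstrnt and ucnstrnt mirror those in the text, the 0.001 constant matches the NLLS default, and the bounds and parameter values are made up for the illustration.

```python
import numpy as np

# Illustrative penalty term for the constrained case (sketch, not the NLLS code).
def penalty(theta, theta_init, lcnstrnt, ucnstrnt, constant=0.001):
    a = constant * np.abs(theta_init)   # a scaled by the starting parameter values
    h_l = theta - lcnstrnt              # distance to the lower constraint
    h_u = ucnstrnt - theta              # distance to the upper constraint
    return np.sum(a / h_l + a / h_u)    # grows sharply as theta nears either bound

theta_init = np.array([1.0])
lo, hi = np.array([0.0]), np.array([2.0])
mid  = penalty(np.array([1.0]),  theta_init, lo, hi)   # far from both bounds: small
near = penalty(np.array([1.99]), theta_init, lo, hi)   # close to upper bound: large
```

Away from the bounds the penalty is negligible (here 0.002), but it climbs steeply as a parameter approaches a constraint, which is what pushes the optimization back into the feasible region.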
Inputs to NLLS
The required inputs of the NLLS verb are the y (dependent) variable, the x (independent) variables, a set of starting estimates of the parameters (initial) and the constraints. As well, the least squares objective function needs to be defined, which, from the y and x variables and the parameter set, produces a list of residuals.
Within the NLLS verb, a number of variables relevant to the minimization are used. These can be changed by the user, although experience has shown this to be rarely necessary. They include con = 10, limit = 100 (the maximum number of steps in the main loop), and the convergence and tolerance criteria in the whilst statement relating to the relative change in the parameter set and the residual sum of squares between steps.
The NLLS verb is executed as
‘objective function name’ NLLS initial
Note that the noun initial can be a scalar (in the case of a one parameter model), as within NLLS the parameter set (the verb’s y argument) is formed into a list by the assignment par =. ,y
Experience with this routine is thus far limited to models with fewer than 40 parameters and 400 observations, although in principle much larger models should be able to be run. One quite complicated model (with lots of processing within the objective function) took just over a minute (Windows XP 32-bit on a laptop with an Intel T7700 (2.4 GHz) processor and 2 GB RAM).
The NLLS verb produces a list of the parameter estimates for each loop executed. A table (stats) is produced upon completion which indicates
• Number of main loops
• Final value of lambda
• Number of observations
• Number of parameters
• Residual sum of squares
• Mean square residual
• AIC (analog for least squares)
It also produces a number of global variables, including the inverse of the Hessian (hess_inv), the covariance matrix (cov), the standard deviation of each parameter (par_se), the correlation matrix (corr) and the coefficient of variation of each parameter (cv). The final parameter set is in the noun parm.
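The post-fit statistics can be derived from the inverse Hessian in the usual least squares way. The sketch below is an assumption about those standard relationships, not a transcription of the NLLS code; the names mirror the globals listed above, and the input matrix and parameter values are invented for the illustration.

```python
import numpy as np

# Standard least squares post-fit statistics from the inverse Hessian (sketch).
def fit_stats(hess_inv, rss, n_obs, n_par, parm):
    msr = rss / (n_obs - n_par)              # mean square residual
    cov = msr * hess_inv                     # covariance matrix of the estimates
    par_se = np.sqrt(np.diag(cov))           # standard deviation of each parameter
    corr = cov / np.outer(par_se, par_se)    # correlation matrix
    cv = par_se / np.abs(parm)               # coefficient of variation of each parameter
    return cov, par_se, corr, cv

# Made-up 2-parameter example
hess_inv = np.array([[0.5, 0.1],
                     [0.1, 0.4]])
cov, par_se, corr, cv = fit_stats(hess_inv, rss=2.0, n_obs=12, n_par=2,
                                  parm=np.array([3.0, -1.5]))
```

Off-diagonal entries of corr near plus or minus one flag pairs of parameters the data cannot separate, which is often the first thing worth checking after a fit.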
This code is offered to the J community as a tool to assist in modeling, and its further development is encouraged. It was initially written when the author was just learning J and thus is a bit ‘loopy’. It is hoped that through use, it can be further optimized. It has received a fair bit of testing, but errors are always a possibility. Thus, if and when errors are encountered, they should be reported to the author, who will make the changes on the wiki.
There's also a page on which to work on the code as a collaborative project.
Bard, Y. 1974. Nonlinear Parameter Estimation. Academic Press. New York. 341 pp.
Gavaris, S. 1988. An adaptive framework for the estimation of population size. CAFSAC Res. Doc. 88/29.
Haddon, M. 2001. Modelling and quantitative methods in fisheries. Chapman and Hall Press, New York. 406 pp.
Contributed by Bob O’Boyle (email@example.com)