NLopt Algorithms

(Difference between revisions)
 Revision as of 06:06, 12 November 2008 (edit)Stevenj (Talk | contribs) (→Low-storage BFGS)← Previous diff Revision as of 06:10, 12 November 2008 (edit)Stevenj (Talk | contribs) (→Preconditioned truncated Newton)Next diff → Line 199: Line 199: ===Preconditioned truncated Newton=== ===Preconditioned truncated Newton=== - This algorithm in NLopt (specified by NLOPT_LD_LBFGS), is based on a Fortran implementation of the low-storage BFGS algorithm written by Prof. Ladislav Luksan, and graciously posted online under the GNU LGPL at: + This algorithm in NLopt (specified by NLOPT_LD_LBFGS), is based on a Fortran implementation of a preconditioned inexact truncated Newton algorithm written by Prof. Ladislav Luksan, and graciously posted online under the GNU LGPL at: * http://www.uivt.cas.cz/~luksan/subroutines.html * http://www.uivt.cas.cz/~luksan/subroutines.html - The original L-BFGS algorithm, based on Strang recurrences, was described by the papers: + NLopt includes several variations of this algorithm by Prof. Luksan. First, a variant preconditioned by the low-storage BFGS algorithm with steepest-descent restarting, specified as NLOPT_LD_TNEWTON_PRECOND_RESTART. Second, simplified versions NLOPT_LD_TNEWTON_PRECOND (same without restarting), NLOPT_LD_TNEWTON_RESTART (same without preconditioning), and NLOPT_LD_TNEWTON (same without restarting or preconditioning). - * J. Nocedal, "Updating quasi-Newton matrices with limited storage," ''Math. Comput.'' '''35''', 773-782 (1980). + The algorithms are based on the ones described by: - * D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," ''Math. Programming' '''45''', p. 503-528 (1989). + + * R. S. Dembo and T. Steihaug, "Truncated Newton algorithms for large-scale optimization," ''Math. Programming'' '''26''', p. 190-212 (1982). I converted Prof. Luksan's code to c with the help of [[w:f2c|f2c]], and made a few minor modifications (mainly to include the NLopt termination criteria). I converted Prof. Luksan's code to c with the help of [[w:f2c|f2c]], and made a few minor modifications (mainly to include the NLopt termination criteria).

Revision as of 06:10, 12 November 2008

NLopt includes implementations of a number of different optimization algorithms. These algorithms are listed below, including links to the original source code (if any) and citations to the relevant articles in the literature (see Citing NLopt).

Even where I found available free/open-source code for the various algorithms, I modified the code at least slightly (and in some cases noted below, substantially) for inclusion into NLopt. I apologize in advance to the authors for any new bugs I may have inadvertantly introduced into their code.

Nomenclature

Each algorithm in NLopt is identified by a named constant, which is passed to the NLopt routines in the various languages in order to select a particular algorithm. These constants are of the form `NLOPT_{G,L}{N,D}_xxxx`, where `G`/`L` denotes global/local optimization and `N`/`D` denotes derivative-free/gradient-based algorithms, respectively.

For example, the `NLOPT_LN_COBYLA` constant refers to the COBYLA algorithm (described below), which is a local (`L`) derivative-free (`N`) optimization algorithm.

Many of the algorithms have several variants, which are grouped together below.

Global optimization

All of the global-optimization algorithms currently require you to specify bound constraints on all the optimization parameters.

DIRECT and DIRECT-L

DIRECT is the DIviding RECTangles algorithm for global optimization, described in:

• D. R. Jones, C. D. Perttunen, and B. E. Stuckmann, "Lipschitzian optimization without the lipschitz constant," J. Optimization Theory and Applications, vol. 79, p. 157 (1993).

and DIRECT-L is the "locally biased" variant proposed by:

• J. M. Gablonsky and C. T. Kelley, "A locally-biased form of the DIRECT algorithm," J. Global Optimization, vol. 21 (1), p. 27-37 (2001).

These is are deterministic-search algorithms based on systematic division of the search domain into smaller and smaller hyperrectangles. The Gablonsky version makes the algorithm "more biased towards local search" so that it is more efficient for functions without too many local minima. NLopt contains several implementations of both of these algorithms. I would tend to try `NLOPT_GN_DIRECT_L` first; YMMV.

First, it contains a from-scratch re-implementation of both algorithms, specified by the constants `NLOPT_GN_DIRECT` and `NLOPT_GN_DIRECT_L`, respectively.

Second, there is a slightly randomized variant of DIRECT-L, specified by `NLOPT_GLOBAL_DIRECT_L_RAND`, which uses some randomization to help decide which dimension to halve next in the case of near-ties.

The DIRECT and DIRECT-L algorithms start by rescaling the bound constraints to a hypercube, which gives all dimensions equal weight in the search procedure. If your dimensions do not have equal weight, e.g. if you have a "long and skinny" search space and your function varies at about the same speed in all directions, it may be better to use unscaled variants of these algorthms, which are specified as `NLOPT_GLOBAL_DIRECT_NOSCAL`, `NLOPT_GLOBAL_DIRECT_L_NOSCAL`, and `NLOPT_GLOBAL_DIRECT_L_RAND_NOSCAL`, respectively. These implementations have a number of hard-coded limitations on things like the number of function evaluations; I removed several of these limitations, but some remain.

Finally, NLopt also includes separate implementations based on the original Fortran code by Gablonsky et al. (1998-2001), which are specified as `NLOPT_GN_ORIG_DIRECT` and `NLOPT_GN_ORIG_DIRECT_L`.

Most of the above algorithms only handle bound constraints, and in fact require finite bound constraints (they are not applicable to unconstrained problems). They do not handle arbitrary nonlinear constraints. However, the `ORIG` versions by Gablonsky et al. include some support for arbitrary constraints, if you represent constraint violations by having your objective function return NaN or infinity (`HUGE_VAL`, in C).

Controlled Random Search (CRS) with local mutation

My implementation of the "controlled random search" (CRS) algorithm (in particular, the CRS2 variant) with the "local mutation" modification, as defined by:

• P. Kaelo and M. M. Ali, "Some variants of the controlled random search algorithm for global optimization," J. Optim. Theory Appl. 130 (2), 253-264 (2006).

The original CRS2 algorithm was described by:

• W. L. Price, "A controlled random search procedure for global optimization," in Towards Global Optimization 2, p. 71-84 edited by L. C. W. Dixon and G. P. Szego (North-Holland Press, Amsterdam, 1978).

The CRS algorithms are sometimes compared to genetic algorithms, in that they start with a random "population" of points, and randomly "evolve" these points by heuristic rules. In this case, the "evolution" somewhat resembles a randomized Nelder-Mead algorithm.

Only bound-constrained problems are supported by this algorithm.

This is my implementation of the "Multi-Level Single-Linkage" (MLSL) algorithm for global optimization by a sequence random local optimizations, proposed by:

• A. H. G. Rinnooy Kan and G. T. Timmer, "Stochastic global optimization methods," Mathematical Programming, vol. 39, p. 27-78 (1987). (Actually 2 papers — part I: clustering methods, p. 27, then part II: multilevel methods, p. 57.)

We also include a modification of MLSL use a Sobol low-discrepancy sequence (LDS) instead of pseudorandom numbers, which was argued to improve the convergence rate by:

• Sergei Kucherenko and Yury Sytsko, "Application of deterministic low-discrepancy sequences in global optimization," Computational Optimization and Applications, vol. 30, p. 297-318 (2005).

In either case, MLSL is a "multistart" algorithm: it works by doing a sequence of local optimizations (using some other local optimization algorithm) from random or low-discrepancy starting points. MLSL is distinguished, however by a "clustering" heuristic that helps it to avoid repeated searches of the same local optima, and has some theoretical guarantees of finding all local optima in a finite number of local minimizations.

The local-search portion of MLSL can use any of the other algorithms in NLopt, and in particular can use either gradient-based (`D`) or derivative-free algorithms (`N`) The local search uses the derivative/nonderivative algorithm set by nlopt_set_local_search_algorithm (currently defaulting to `NLOPT_LD_MMA` and `NLOPT_LN_COBYLA` for derivative/nonderivative searches, respectively).

LDS-based MLSL with gradient-based local search is specified as `NLOPT_GD_MLSL_LDS`, while LDS-based MLSL with derivative-free local search is specified by `NLOPT_GN_MLSL_LDS`. The non-LDS original MLSL (using pseudo-random numbers, currently via the Mersenne twister algorithm) is indicated by `NLOPT_GD_MLSL` and `NLOPT_GN_MLSL`, respectively.

StoGO

by Madsen et al. StoGO is a global optimization algorithm that works by systematically dividing the search space (which must be bound-constrained) into smaller hyper-rectangles via a branch-and-bound technique, and searching them by a gradient-based local-search algorithm (a BFGS variant), optionally including some randomness (hence the "Sto", which stands for "stochastic" I believe).

StoGO is written in C++, which means that it is only included when you compile the [[NLopt Installation#NLopt with C++ algorithms|C++ algorithms]] enabled.

StoGO is specified within NLopt by `NLOPT_GD_STOGO`, or `NLOPT_GD_STOGO_RAND` for the randomized variant.

Some referenes on StoGO are:

• S. Gudmundsson, "Parallel Global Optimization," M.Sc. Thesis, IMM, Technical University of Denmark, 1998.
• K. Madsen, S. Zertchaninov, and A. Zilinskas, "Global Optimization using Branch-and-Bound," unpublished (1998). A preprint of this paper is included in the `stogo` subdirectory of NLopt as `paper.pdf`.
• S. Zertchaninov and K. Madsen, "A C++ Programme for Global Optimization," IMM-REP-1998-04, Department of Mathematical Modelling, Technical University of Denmark, DK-2800 Lyngby, Denmark, 1998. A copy of this report is included in the `stogo` subdirectory of NLopt as `techreport.pdf`.

Local derivative-free optimization

Of these algorithms, only COBYLA currently supports arbitrary nonlinear inequality constraints; the rest of them support bound-constrained or unconstrained problems only.

COBYLA (Constrained Optimization BY Linear Approximations)

This is a derivative of Powell's implementation of the COBYLA (Constrained Optimization BY Linear Approximations) algorithm for derivative-free optimization with nonlinear inequality constraints, by M. J. D. Powell, described in:

• M. J. D. Powell, "A direct search optimization method that models the objective and constraint functions by linear interpolation," in Advances in Optimization and Numerical Analysis, eds. S. Gomez and J.-P. Hennart (Kluwer Academic: Dordrecht, 1994), p. 51-67.

and reviewed in:

• M. J. D. Powell, "Direct search algorithms for optimization calculations," Acta Numerica 7, 287-336 (1998).

It constructs successive linear approximations of the objective function and constraints via a simplex of n+1 points (in n dimensions), and optimizes these approximations in a trust region at each step.

The original code itself was written in Fortran by Powell and was converted to C in 2004 by Jean-Sebastien Roy (js@jeannot.org) for the SciPy project. The version in NLopt was based on Roy's C version, downloaded from:

and slightly modified (e.g. to incorporate all the NLopt termination criteria).

It is specified within NLopt as `NLOPT_LN_COBYLA`.

NEWUOA + bound constraints

This is an algorithm derived from the NEWUOA subroutine of M. J. D. Powell, converted to C and modified for the NLopt stopping criteria. I also modified the code to include a variant, NEWUOA-bound, that permits efficient handling of bound constraints.

The original NEWUOA performs derivative-free unconstrained optimization using an iteratively constructed quadratic approximation for the objective function. See:

• M. J. D. Powell, "The NEWUOA software for unconstrained optimization without derivatives," Proc. 40th Workshop on Large Scale Nonlinear Optimization (Erice, Italy, 2004).

The original algorithm is specified in NLopt as `NLOPT_LN_NEWUOA`, and only supports unconstrained problems. For bound constraints, my variant is specified as `NLOPT_LN_NEWUOA_BOUND`.

In the original NEWUOA algorithm, Powell solved the quadratic subproblems (in routines TRSAPP and BIGLAG) in a spherical trust region via a truncated conjugate-gradient algorithm. In my bound-constrained variant, we use the MMA algorithm for these subproblems to solve them with both bound constraints and a spherical trust region. In principle, we should also change the BIGDEN subroutine in a similar way (since BIGDEN also approximately solves a trust-region subproblem), but instead I just truncated its result to the bounds (which probably gives suboptimal convergence, but BIGDEN is called only very rarely in practice).

PRAXIS (PRincipal AXIS)

"PRAXIS" gradient-free local optimization via the "principal-axis method" of Richard Brent, based on a C translation of Fortran code downloaded from Netlib:

The original Fortran code was written by Richard Brent and made available by the Stanford Linear Accelerator Center, dated 3/1/73. The appropriate reference seems to be:

• Richard Brent, Algorithms for Minimization without Derivatives (Prentice-Hall, 1972). (Reprinted by Dover, 2002.)

Specified in NLopt as `NLOPT_LN_PRAXIS`

This algorithm was originally designed for unconstrained optimization. In NLopt, bound constraints are "implemented" in PRAXIS by the simple expedient of returning infinity (Inf) when the constraints are violated (this is done automatically—you don't have to do this in your own function). This seems to work, more-or-less, but appears to slow convergence significantly. If you have bound constraints, you are probably better off using COBYLA or NEWUOA-bound.

My implementation of almost the original Nelder-Mead simplex algorithm (specified in NLopt as `NLOPT_LN_NELDERMEAD`), as described in:

• J. A. Nelder and R. Mead, "A simplex method for function minimization," The Computer Journal 7, p. 308-313 (1965).

This method is simple and has demonstrated enduring popularity, despite the later discovery that it fails to converge at all for some functions. Anecdotal evidence suggests that it often performs well even for noisy and/or discontinuous objective functions. I would tend to recommend the Subplex method (below) instead, however.

The main change compared to the 1965 paper is that I implemented explicit support for bound constraints, using essentially the method proposed in:

• M. J. Box, "A new method of constrained optimization and a comparison with other methods," Computer J. 8 (1), 42-52 (1965).

and later reviewed in:

• J. A. Richardson and J. L. Kuester, "The complex method for constrained optimization," Commun. ACM 16 (8), 487-489 (1973).

Whenever a new point would lie outside the bound constraints, Box advocates moving it "just inside" the constraints. I couldn't see any advantage to using a fixed distance inside the constraints, especially if the optimum is on the constraint, so instead I move the point exactly onto the constraint in that case. The danger with implementing bound constraints in this way (or by Box's method) is that you may collapse the simplex into a lower-dimensional subspace. I'm not aware of a better way, however. In any case, this collapse of the simplex is somewhat ameliorated by restarting, such as when Nelder-Mead is used within the Subplex algorithm below.

Sbplx (based on Subplex)

This is my re-implementation of Tom Rowan's "Subplex" algorithm. As Rowan expressed a preference that other implementations of his algorithm use a different name, I called my implementation "Sbplx" (referred to in NLopt as `NLOPT_LN_SBPLX`).

Subplex (a variant of Nelder-Mead that uses Nelder-Mead on a sequence of subspaces) is claimed to be much more efficient and robust than the original Nelder-Mead, while retaining the latter's facility with discontinuous objectives, and in my experience these claims seem to be true. (However, I'm not aware of any proof that Subplex is globally convergent, and may fail for some objectives like Nelder-Mead; YMMV.)

I used the description of Rowan's algorithm in his PhD thesis:

• T. Rowan, "Functional Stability Analysis of Numerical Algorithms", Ph.D. thesis, Department of Computer Sciences, University of Texas at Austin, 1990.

I would have preferred to use Rowan's original implementation, posted by him on Netlib:

Unfortunately, the legality of redistributing or modifying this code is unclear, because it lacks anything resembling a license statement. After some friendly emails with Rowan in which he promised to consider providing a clear open-source/free-software license, I lost touch with him and is old email address seems invalid.

Since the algorithm is not too complicated, however, I just rewrote it. There seem to be slight differences between the behavior of my implementation and his (probably due to different choices of initial subspace and other slight variations, where his paper was ambiguous), but the number of iterations to converge on my test problems seems to be quite close (within 10% of the number of function evaluations for most problems, sometimes fewer, sometimes more).

The only major difference between my implementation and Rowan's, as far as I can tell, is that I implemented explicit support for bound constraints (via the method in the Box paper as described above). This seems to be a big improvement in the case where the optimum lies against one of the constraints.

Of these algorithms, only MMA supports arbitrary nonlinear inequality constraints; the rest support bound-constrained or unconstrained problems only.

MMA (Method of Moving Asymptotes)

My implementation of the globally-convergent method-of-moving-asymptotes (MMA) algorithm for gradient-based local optimization, specified in NLopt as `NLOPT_LD_MMA`, as described in:

• Krister Svanberg, "A class of globally convergent optimization methods based on conservative convex separable approximations," SIAM J. Optim. 12 (2), p. 555-573 (2002).

This is an improved variant of the original MMA algorithm published by Svanberg in 1987.

At each point x, MMA forms a local approximation using the gradient of f and the constraint functions, plus a quadratic "penalty" term to make the approximations "conservative" (upper bounds for the exact functions). The precise approximation MMA forms is difficult to describe in a few words, because it includes nonlinear terms consisting of a poles at some distance from x (outside of the current trust region), almost a kind of Pade approximant. The main point is that the approximation is both convex and separable, making it trivial to solve the approximate optimization by a dual method. Optimizing the approximation leads to a new candidate point x. The objective and constraints are evaluated at the candidate point. If the approximations were indeed conservative (upper bounds for the actual functions at the candidate point), then the process is restarted at the new x. Otherwise, the approximations are made more conservative (by increasing the penalty term) and re-optimized.

(If you contact Professor Svanberg, he has been willing in the past to graciously provide you with his original code, albeit under restrictions on commercial use or redistribution. The MMA implementation in NLopt, however, is completely independent of Svanberg's, whose code we have not examined; any bugs are my own, of course.)

Low-storage BFGS

This algorithm in NLopt (specified by NLOPT_LD_LBFGS), is based on a Fortran implementation of the low-storage BFGS algorithm written by Prof. Ladislav Luksan, and graciously posted online under the GNU LGPL at:

The original L-BFGS algorithm, based on variable-metric updates via Strang recurrences, was described by the papers:

• J. Nocedal, "Updating quasi-Newton matrices with limited storage," Math. Comput. 35, 773-782 (1980).
• D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Math. Programming' 45, p. 503-528 (1989).

I converted Prof. Luksan's code to c with the help of f2c, and made a few minor modifications (mainly to include the NLopt termination criteria).

Preconditioned truncated Newton

This algorithm in NLopt (specified by NLOPT_LD_LBFGS), is based on a Fortran implementation of a preconditioned inexact truncated Newton algorithm written by Prof. Ladislav Luksan, and graciously posted online under the GNU LGPL at:

NLopt includes several variations of this algorithm by Prof. Luksan. First, a variant preconditioned by the low-storage BFGS algorithm with steepest-descent restarting, specified as `NLOPT_LD_TNEWTON_PRECOND_RESTART`. Second, simplified versions `NLOPT_LD_TNEWTON_PRECOND` (same without restarting), `NLOPT_LD_TNEWTON_RESTART` (same without preconditioning), and `NLOPT_LD_TNEWTON` (same without restarting or preconditioning).

The algorithms are based on the ones described by:

• R. S. Dembo and T. Steihaug, "Truncated Newton algorithms for large-scale optimization," Math. Programming 26, p. 190-212 (1982).

I converted Prof. Luksan's code to c with the help of f2c, and made a few minor modifications (mainly to include the NLopt termination criteria).