Levenberg Marquardt for multivariable nonlinear least squares regression

by Trent Guidry, 12 August 2009 14:01
The Levenberg Marquardt algorithm is a modification of the Gauss Newton algorithm and is a fairly widely used method for solving nonlinear least squares regression problems. While the Levenberg Marquardt algorithm tends to converge somewhat more slowly than the Gauss Newton algorithm, it usually has far fewer problems with divergence and instability.
 
Multiple linear least squares regression was derived for linear systems:

Multiple Linear Regression: P = (JtJ)^-1 JtY

Multiple Linear Regression Jacobian: Jij = ∂fi/∂Pj
Gauss Newton linearizes a nonlinear system and applies the multiple linear least squares regression formula. Gauss Newton often has stability issues, especially when the starting point is far from the solution point.

Gauss Newton: (JtJ)(P1 – P0) = JtEP0
Levenberg modified the Gauss Newton algorithm and added a gradient descent term to assist with convergence.

Levenberg: (JtJ + λI)(P1 – P0) = JtEP0
Marquardt modified the Levenberg equation by replacing the identity term with the diagonal of the Jacobian transpose times the Jacobian, which created the Levenberg Marquardt algorithm shown below. This was done to improve the rate of convergence compared to the Levenberg algorithm.

Levenberg Marquardt: (JtJ + λdiag(JtJ))(P1 – P0) = JtEP0
The variable definitions in this equation are the same as in the previous post on the Gauss Newton algorithm.
The Levenberg Marquardt algorithm works by starting with an initial guess of the parameters, which is used as P0. The residuals and the Jacobian are then computed, and the equation above is used to solve for P1. At this new P1 the residual error is computed. If the new residual error is greater than the original residual error, the new P1 is rejected, λ is increased, and a new P1 is calculated. If the new residual error is less than the starting residual error, the new P1 is accepted and λ is decreased. The accepted P1 becomes the new P0, and the algorithm is repeated until it either converges or goes unstable.
The code for the Levenberg Marquardt class is shown at the bottom of this post.
The λ factor used in this implementation of the algorithm is similar to that in the Wikipedia article, with λ0 set to 100 and v to 10.
There are two constructors for the LevenbergMarquardt class. The first takes the function to be regressed, an array of regression parameters, a Parameter array of the independent variables, a double array of the data, and an integer that indicates the number of points to use when numerically computing the derivatives in the Jacobian. The second constructor is similar to the first, except that it defaults the number of points used when computing the derivatives to 3. The constructor allocates memory for the Jacobian, the residuals, and the initial regression parameter values, which are stored in matrix fields. It also creates an instance of the Derivatives class that is used to compute the Jacobian.
The Iterate function performs one iteration of the Levenberg Marquardt algorithm. It first loops through the data points and computes the residuals and the Jacobian matrix using numerical partial derivatives. The Levenberg Marquardt equation is then used to solve for the next set of parameters, and the model parameter values are set to these values.
using System;
using System.Diagnostics;
using NumericalMethods.FourthBlog;

namespace NumericalMethods
{
    public class LevenbergMarquardt
    {
        private Matrix _jacobian;
        private Matrix _residuals;
        private Matrix _regressionParameters0;
        private Derivatives _derivatives;
        private Parameter[] _regressionParameters;
        private Parameter[] _observedParameters;
        private Func<double> _regressionFunction;
        private double[,] _data;
        private double _l0 = 100.0;
        private double _v = 10.0;

        public LevenbergMarquardt(Func<double> regressionFunction, Parameter[] regressionParameters, Parameter[] observedParameters, double[,] data, int numberOfDerivativePoints)
        {
            // The data array has one row per observed (independent) parameter,
            // plus a final row of dependent variable values.
            Debug.Assert(data.GetLength(0) == observedParameters.Length + 1);
            _data = data;
            _observedParameters = observedParameters;
            _regressionParameters = regressionParameters;
            _regressionFunction = regressionFunction;
            int numberOfParameters = _regressionParameters.Length;
            int numberOfPoints = data.GetLength(1);
            _derivatives = new Derivatives(numberOfDerivativePoints);
            _jacobian = new Matrix(numberOfPoints, numberOfParameters);
            _residuals = new Matrix(numberOfPoints, 1);
            _regressionParameters0 = new Matrix(numberOfParameters, 1);
        }

        public LevenbergMarquardt(Func<double> function, Parameter[] regressionParameters, Parameter[] observedParameters, double[,] data)
            : this(function, regressionParameters, observedParameters, data, 3)
        {
        }

        public void Iterate()
        {
            int numberOfPoints = _data.GetLength(1);
            int numberOfParameters = _regressionParameters.Length;
            double currentResidual = 0.0;

            // Evaluate the residual vector and the Jacobian at the current parameters.
            for (int i = 0; i < numberOfPoints; i++)
            {
                for (int j = 0; j < _observedParameters.Length; j++)
                {
                    _observedParameters[j].Value = _data[j, i];
                }
                double functionValue = _regressionFunction();
                double residual = _data[_observedParameters.Length, i] - functionValue;
                _residuals[i, 0] = residual;
                currentResidual += residual * residual;
                for (int j = 0; j < numberOfParameters; j++)
                {
                    _jacobian[i, j] = _derivatives.ComputePartialDerivative(_regressionFunction, _regressionParameters[j], 1, functionValue);
                }
            }

            // Save the current parameter values as P0.
            for (int i = 0; i < numberOfParameters; i++)
            {
                _regressionParameters0[i, 0] = _regressionParameters[i];
            }

            Matrix jacobianTranspose = _jacobian.Transpose();
            Matrix jacobianTransposeResiduals = jacobianTranspose * _residuals;
            Matrix jacobianTransposeJacobian = jacobianTranspose * _jacobian;

            // diag(JtJ), the Marquardt replacement for the Levenberg identity term.
            Matrix jacobianTransposeJacobianDiagonal = new Matrix(jacobianTransposeJacobian.RowCount, jacobianTransposeJacobian.RowCount);
            for (int i = 0; i < jacobianTransposeJacobian.RowCount; i++)
            {
                jacobianTransposeJacobianDiagonal[i, i] = jacobianTransposeJacobian[i, i];
            }

            Matrix delta = null;
            Matrix newRegressionParameters = null;
            double newResidual = currentResidual + 1.0;

            // Increase lambda until a step is found that reduces the residual.
            _l0 /= _v;
            while (newResidual > currentResidual)
            {
                newResidual = 0.0;
                _l0 *= _v;

                // Solve (JtJ + lambda * diag(JtJ)) * delta = Jt * E for the step.
                Matrix matLHS = jacobianTransposeJacobian + _l0 * jacobianTransposeJacobianDiagonal;
                delta = matLHS.SolveFor(jacobianTransposeResiduals);
                newRegressionParameters = _regressionParameters0 + delta;
                for (int i = 0; i < numberOfParameters; i++)
                {
                    _regressionParameters[i].Value = newRegressionParameters[i, 0];
                }

                // Compute the residual at the trial parameters.
                for (int i = 0; i < numberOfPoints; i++)
                {
                    for (int j = 0; j < _observedParameters.Length; j++)
                    {
                        _observedParameters[j].Value = _data[j, i];
                    }
                    double functionValue = _regressionFunction();
                    double residual = _data[_observedParameters.Length, i] - functionValue;
                    newResidual += residual * residual;
                }
            }

            // The step was accepted; decrease lambda for the next iteration.
            _l0 /= _v;
        }
    }
}
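For reference, here is a minimal sketch of how the class might be wired up. It assumes the Parameter class from the earlier posts has a constructor taking an initial value, a Value property, and an implicit conversion to double; the exponential decay model and the data are made up for the example.

using System;
using NumericalMethods;
using NumericalMethods.FourthBlog;

public static class LevenbergMarquardtUsage
{
    public static void Main()
    {
        // Regression parameters (initial guesses) and the independent variable.
        // Assumes Parameter has a constructor taking an initial value, a Value
        // property, and an implicit conversion to double.
        Parameter a = new Parameter(1.0);
        Parameter b = new Parameter(-1.0);
        Parameter x = new Parameter(0.0);

        // Model to regress: y = a * exp(b * x).
        Func<double> model = () => a * Math.Exp(b * x);

        // One row per independent variable, plus a final row of observed y values.
        double[,] data =
        {
            { 0.0, 1.0, 2.0, 3.0, 4.0 },
            { 2.00, 1.21, 0.74, 0.44, 0.27 }
        };

        LevenbergMarquardt lm = new LevenbergMarquardt(model, new Parameter[] { a, b }, new Parameter[] { x }, data);

        // Run a fixed number of iterations; a real driver would check convergence.
        for (int i = 0; i < 25; i++)
        {
            lm.Iterate();
        }

        Console.WriteLine("a = {0}, b = {1}", a.Value, b.Value);
    }
}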

Comments (15)

Bob (France)
8/29/2009 6:25:33 PM

This method works well for non-singular matrices, but when the matrix is singular the LU decomposition fails.
I think an SVD solution could solve this problem. Do you have an implementation of SVD for the Levenberg-Marquardt method?

Trent (United States)
8/31/2009 9:15:50 AM

Yes.  If the matrix becomes singular, then the LU decomposition fails.   That blows up the whole algorithm.

Fortunately, at least for me and the problems I usually deal with, that is a fairly uncommon occurrence.

Adding an SVD decomposition is one of those things on my very long list of things I want to do but haven’t done yet.

Unfortunately, I have no idea when I will have time to get around to doing that though.

C (Brazil)
10/26/2009 8:08:42 PM

Hi Trent

First let me say you have a very interesting blog.

In a problem I have, the matrix always goes singular after a few iterations. I guess the problem is that I have many more free parameters than input/output observations. What could I do in this case?

Thanks,

Trent (United States)
10/30/2009 2:23:19 PM

Yes, if you have more regression parameters than data points, then the matrix probably would become singular.

For that case, the problem could probably be solved exactly if the number of unknowns equaled the number of knowns; the residual error would then be 0. For example, a two-parameter line fit to exactly two points passes through both of them. If the number of unknowns exceeded the number of knowns, then there would be an infinite number of solutions that give the minimum square error of 0.

C (Brazil)
10/31/2009 8:16:47 AM

Thanks!

I tried using SVD instead of LU, so the decomposition doesn't fail anymore. But I don't know if the SVD solution of the Levenberg-Marquardt equation is still valid when the Jacobian is becoming singular. Neither do I know if the extra performance cost of performing SVD is worth it just to avoid the exceptions. May I ask how you would deal with this problem, besides changing the number of regression and observation parameters?

Regards,

Trent
11/5/2009 4:08:41 AM

If you have more regression parameters than observed points, then there is a good chance that there isn’t a unique solution.  In that case, more than one solution exists that gives the minimum residual error of 0.

The best approach would probably be to reduce the number of regression parameters.  If you can’t do that, fixing some regression parameters to some value, say 0, and regressing the others might work.

Alternatively, a different regression method could be used.  Instead of using the Levenberg Marquardt algorithm,  a function could be created that returns the sum of the square error for a given set of regression parameters.  A general optimization routine, such as the BFGS algorithm, using the regression parameters as optimization parameters, could then be used to solve for the parameter values that minimize the square error.
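As a rough, hypothetical sketch of that wiring (the model, data, and the crude finite-difference gradient descent loop below are stand-ins for illustration; a real implementation would call an actual BFGS routine, and only the idea of minimizing a sum-of-squares objective with a general optimizer is the point):

using System;

public static class SumSquaredErrorDemo
{
    // Hypothetical example model: y = a * exp(b * x).
    static double Model(double x, double[] p)
    {
        return p[0] * Math.Exp(p[1] * x);
    }

    public static void Main()
    {
        // Made-up data for illustration.
        double[] xs = { 0.0, 1.0, 2.0, 3.0 };
        double[] ys = { 2.0, 1.2, 0.7, 0.4 };

        // The objective: sum of squared residuals as a function of the parameters.
        Func<double[], double> sse = p =>
        {
            double total = 0.0;
            for (int i = 0; i < xs.Length; i++)
            {
                double r = ys[i] - Model(xs[i], p);
                total += r * r;
            }
            return total;
        };

        // Stand-in for a real BFGS routine: naive gradient descent with
        // forward-difference gradients, just to show how the objective is used.
        double[] p = { 1.0, -0.5 };
        double step = 0.01;
        double h = 1e-6;
        for (int iter = 0; iter < 5000; iter++)
        {
            for (int j = 0; j < p.Length; j++)
            {
                double[] pPlus = (double[])p.Clone();
                pPlus[j] += h;
                double gradient = (sse(pPlus) - sse(p)) / h;
                p[j] -= step * gradient;
            }
        }

        Console.WriteLine("a = {0}, b = {1}, SSE = {2}", p[0], p[1], sse(p));
    }
}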

Radoslav Hristov (United Kingdom)
2/1/2010 3:49:12 AM

Hi,
I've tried to modify the code to use SVD instead of LU, but so far I've failed to make it work. Any chance of posting an SVD version of the code?

Thanks,
Rad

Trent
2/1/2010 10:09:39 AM

I would like to eventually add an SVD decomposition, but I really haven't had time yet to do that. I have no idea right now on an ETA for that one.

Trent

Radoslav Hristov (United Kingdom)
2/3/2010 1:13:56 PM

I've managed to make my SVD implementation work, but it returns wrong results for all the cases where there are singular values. For the rest of the cases it seems to be working, which is strange. I'm obviously missing something.
Rad

C (Brazil)
11/3/2009 1:27:36 AM

Trent,

Isn't the Levenberg Marquardt equation

(JtJ + λI)(P1 – P0) = JtEp0

instead of

(JtJ + λdiag(JtJ))(P1 – P0) = JtEp0

as it's stated in your blog post?

Trent
11/5/2009 4:21:49 AM

I think the equation you listed above is the Levenberg algorithm, which predates the Levenberg Marquardt algorithm.

There is more information on that in the link below.

http://www.cs.toronto.edu/~roweis/notes/lm.pdf

Harry (United Kingdom)
9/12/2009 11:12:39 PM

How can I estimate the confidence intervals from this class?

Trent (United States)
9/20/2009 9:41:23 AM

That is yet another one of those things that I plan to eventually do and post about, but haven’t yet done.

Harry (United Kingdom)
10/29/2009 8:56:26 PM

Hey Trent, will you consider working in Nigeria? Contact me on my email.

Trent
10/30/2009 2:24:17 PM

Thanks for the interest, but I recently accepted another job in the Houston area.