Question

Can someone explain the LM algorithm, the GDA algorithm, and the BFGS algorithm, with examples and detailed steps, please? Thank you.

Answer #1

Hey,

Note: in case of any queries, just comment in the box and I will be happy to assist.

Levenberg-Marquardt is a popular alternative to the Gauss-Newton method of finding the minimum of a function $F(x)$ that is a sum of squares of nonlinear functions,

$$F(x) = \frac{1}{2} \sum_{i=1}^{m} \left[ f_i(x) \right]^{2}.$$

Let $J(x)$ denote the Jacobian of the residual vector $f(x) = (f_1(x), \dots, f_m(x))^{T}$, and write $J_k = J(x_k)$ and $f_k = f(x_k)$ at the $k$-th iterate. The Levenberg-Marquardt method then searches in the direction given by the solution $p_k$ of the equations

$$\left( J_k^{T} J_k + \lambda_k I \right) p_k = - J_k^{T} f_k ,$$

where $\lambda_k$ are nonnegative scalars and $I$ is the identity matrix. The method has the nice property that, for some scalar $\Delta$ related to $\lambda_k$, the vector $p_k$ is the solution of the constrained subproblem of minimizing $\tfrac{1}{2} \| J_k p + f_k \|_2^{2}$ subject to $\| p \|_2 \le \Delta$.

In Mathematica, the method is used by the command FindMinimum[f, {x, x0}] when given the Method -> LevenbergMarquardt option.
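To make the damped step above concrete, here is a minimal Python/NumPy sketch of one way a Levenberg-Marquardt loop could look (an illustrative sketch with made-up helper names, not a production routine; real implementations such as scipy.optimize.least_squares add more careful damping schedules, scaling, and stopping tests):

```python
import numpy as np

def levenberg_marquardt(f, jac, x0, lam=1e-3, max_iter=100, tol=1e-8):
    """Minimize F(x) = 0.5 * sum_i f_i(x)^2 given a residual vector f and its Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)                              # residual vector f_k
        J = jac(x)                             # Jacobian J_k
        g = J.T @ fx                           # gradient of F at x_k
        if np.linalg.norm(g) < tol:
            break
        # Solve (J_k^T J_k + lambda_k I) p_k = -J_k^T f_k for the step p_k
        A = J.T @ J + lam * np.eye(x.size)
        p = np.linalg.solve(A, -g)
        if 0.5 * (f(x + p) @ f(x + p)) < 0.5 * (fx @ fx):
            x = x + p                          # step reduced F: accept it, relax the damping
            lam *= 0.5
        else:
            lam *= 2.0                         # step rejected: increase the damping and retry
    return x

# Example: the Rosenbrock function written as a sum of squared residuals
f = lambda x: np.array([x[0] - 1.0, 10.0 * (x[1] - x[0] ** 2)])
jac = lambda x: np.array([[1.0, 0.0], [-20.0 * x[0], 10.0]])
print(levenberg_marquardt(f, jac, [-1.2, 1.0]))   # should approach [1, 1]
```

Note the two roles of $\lambda_k$: when it is small the step is essentially a Gauss-Newton step, and when it is large the step shrinks toward a scaled steepest-descent direction.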

Turning to GDA (Gaussian Discriminant Analysis): the class-conditional distributions are multivariate normals. The cursive N symbol is used to represent this distribution, whose somewhat nasty-looking density equation is parameterized by the following (the model is written out just after the list):

  • μ0 and μ1: the mean vectors of the two classes
  • Σ: the covariance matrix (shared by both classes)
  • φ: the Bernoulli parameter for the class label, i.e., the prior probability of class 1; varying these parameters gives different distributions
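For reference (the density equation itself was in a screenshot that has not survived), the generative model these parameters define in Gaussian Discriminant Analysis is conventionally written as

$$y \sim \mathrm{Bernoulli}(\varphi), \qquad x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma),$$

with the multivariate normal density

$$p(x \mid y) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (x - \mu_y)^{T} \Sigma^{-1} (x - \mu_y) \Big),$$

where $d$ is the dimension of $x$.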

Covariance

In a two-dimensional example this is a two-by-two matrix, and for a standard normal distribution with zero mean it is the identity matrix. As the covariance gets larger (e.g., if we multiply it by a factor greater than 1), the density spreads out and flattens; as it gets smaller (multiply by a factor less than 1), the distribution gets taller and thinner. If we increase the off-diagonal entries of the covariance matrix, we skew the distribution along the line y = x; if we decrease them (make them negative), we skew it in the opposite direction.
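As a quick numerical illustration of those three cases (the specific matrices below are made-up examples, not taken from the original figure):

```python
import numpy as np
from scipy.stats import multivariate_normal  # SciPy's multivariate normal density

mu = np.zeros(2)
sigma_identity = np.eye(2)                    # standard normal: round, unit-width contours
sigma_spread   = 2.0 * np.eye(2)              # scaled up: density spreads out and flattens
sigma_skewed   = np.array([[1.0, 0.8],
                           [0.8, 1.0]])       # positive off-diagonals: stretched along y = x

for name, sigma in [("identity", sigma_identity),
                    ("spread", sigma_spread),
                    ("skewed", sigma_skewed)]:
    # the peak density at the mean drops as the distribution spreads out
    print(name, multivariate_normal(mu, sigma).pdf(mu))
```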


Mean

Varying the mean vectors μ0 and μ1 simply shifts where each class's distribution is centered; it does not change the shape of the density.
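Putting the GDA pieces together, here is a minimal sketch of how the four parameters could be estimated from labelled data by maximum likelihood (the closed-form estimates are standard; the function and variable names are my own, for illustration only):

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates of the GDA parameters.

    X : (n, d) array of feature vectors
    y : (n,)   array of 0/1 class labels
    """
    phi = y.mean()                              # prior probability P(y = 1)
    mu0 = X[y == 0].mean(axis=0)                # mean vector of class 0
    mu1 = X[y == 1].mean(axis=0)                # mean vector of class 1
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / len(y)      # shared covariance matrix
    return phi, mu0, mu1, sigma

# Tiny synthetic example: two Gaussian blobs with different means
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-1.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([+1.5, 1.0], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
phi, mu0, mu1, sigma = fit_gda(X, y)
print(phi, mu0, mu1)                            # ~0.5, ~[-1, 0], ~[1.5, 1]
```

Classification then follows Bayes' rule: assign x to the class with the larger posterior, which with a shared Σ gives a linear decision boundary.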

Basically think of BFGS as a way of finding a minimum of an objective function, making use of objective function values and the gradient of the objective function.

A first-order method means gradients (first derivatives), and possibly objective-function values, are used, but not the Hessian (second derivatives). Think of, for instance, gradient descent and steepest descent, among many others.
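For contrast with the second-order methods described next, here is a minimal sketch of the simplest first-order method, plain gradient descent with a fixed step size (the step size and stopping rule are arbitrary choices for the example):

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, max_iter=1000, tol=1e-8):
    """First-order method: uses only gradients, never a Hessian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # stop when the gradient is (nearly) zero
            break
        x = x - step * g                       # move a fixed distance against the gradient
    return x

# Example: minimize f(x) = (x0 - 3)^2 + (x1 + 1)^2
grad = lambda x: np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])
print(gradient_descent(grad, [0.0, 0.0]))      # approaches [3, -1]
```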

A second-order method means gradients and the Hessian are used (and possibly objective-function values). Second-order methods can be based either on

  1. "Exact" Hessian matrix (or finite differences of gradients), in which case they are known as Newton methods or

  2. Quasi-Newton methods, which approximate the Hessian based on differences of gradients over several iterations, by imposing a "secant" (quasi-Newton) condition. There are many different quasi-Newton methods, which estimate the Hessian in different ways. One of the most popular is BFGS.

     The BFGS Hessian approximation can either be based on the full history of gradients, in which case it is referred to as BFGS, or it can be based only on the most recent m gradients, in which case it is known as limited-memory BFGS, abbreviated L-BFGS. The advantage of L-BFGS is that it requires retaining only the most recent m gradients, where m is usually around 10 to 20, which is a much smaller storage requirement than the n*(n+1)/2 elements required to store the full (triangular part of a) Hessian estimate, as BFGS requires, where n is the problem dimension. Unlike (full) BFGS, the estimate of the Hessian is never explicitly formed or stored in L-BFGS (although some implementations of BFGS only form and update the Cholesky factor of the Hessian approximation, rather than the Hessian approximation itself); rather, the calculations that would be required with the estimate of the Hessian are accomplished without explicitly forming it.

     L-BFGS is used instead of BFGS for very large problems (when n is very large), but it might not perform as well as BFGS. Therefore, BFGS is preferred over L-BFGS when the memory requirements of BFGS can be met. On the other hand, L-BFGS may not be much worse in performance than BFGS. (A minimal sketch of the BFGS update follows below.)
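As promised above, here is a minimal sketch of full BFGS with the standard inverse-Hessian (secant) update; it is illustrative only, with a crude backtracking line search in place of the Wolfe-condition search a real implementation would use:

```python
import numpy as np

def bfgs(f, grad, x0, max_iter=200, tol=1e-8):
    """Quasi-Newton BFGS: builds an inverse-Hessian estimate H from gradient differences."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                              # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                             # quasi-Newton search direction
        # crude backtracking line search (real codes enforce the Wolfe conditions)
        t = 1.0
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5
        s = t * p                              # step actually taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                          # change in gradient over the step
        ys = y @ s
        if ys > 1e-10:                         # only update when the curvature condition holds
            rho = 1.0 / ys
            I = np.eye(n)
            # secant (BFGS) update of the inverse-Hessian approximation
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: the Rosenbrock function, whose minimum is at [1, 1]
f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                           200 * (x[1] - x[0] ** 2)])
print(bfgs(f, grad, [-1.2, 1.0]))              # should approach [1, 1]
```

In practice one would normally call an existing implementation instead, e.g. scipy.optimize.minimize(f, x0, jac=grad, method="BFGS"), or method="L-BFGS-B" for the limited-memory variant discussed above.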

Kindly revert for any queries

Thanks.
