Question

Can someone explain the LM algorithm, the GDA algorithm, and the BFGS algorithm, with examples and detailed steps, please? Thank you.

Answer #1

Hey,

Note: in case of any queries, just comment in the box and I will be happy to assist.

Levenberg-Marquardt is a popular alternative to the Gauss-Newton method of finding the minimum of a function $F(x)$ that is a sum of squares of nonlinear functions,

$$F(x) = \frac{1}{2} \sum_{i=1}^{m} \left[ f_i(x) \right]^{2}.$$

Let $J(x)$ denote the Jacobian of the residual vector $f(x) = (f_1(x), \dots, f_m(x))^{T}$, and write $J_k = J(x_k)$ and $f_k = f(x_k)$ at the $k$-th iterate. The Levenberg-Marquardt method then searches in the direction given by the solution $p_k$ of the equations

$$\left( J_k^{T} J_k + \lambda_k I \right) p_k = - J_k^{T} f_k ,$$

where $\lambda_k$ are nonnegative scalars and $I$ is the identity matrix. The method has the nice property that, for some scalar $\Delta$ related to $\lambda_k$, the vector $p_k$ is the solution of the constrained subproblem of minimizing $\tfrac{1}{2} \| J_k p + f_k \|_2^{2}$ subject to $\| p \|_2 \le \Delta$.

In Mathematica, the method is used by the command FindMinimum[f, {x, x0}] when given the Method -> LevenbergMarquardt option.
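To make the damped step above concrete, here is a minimal Python/NumPy sketch of one way a Levenberg-Marquardt loop could look (an illustrative sketch with made-up helper names, not a production routine; real implementations such as scipy.optimize.least_squares add more careful damping schedules, scaling, and stopping tests):

```python
import numpy as np

def levenberg_marquardt(f, jac, x0, lam=1e-3, max_iter=100, tol=1e-8):
    """Minimize F(x) = 0.5 * sum_i f_i(x)^2 given a residual vector f and its Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)                              # residual vector f_k
        J = jac(x)                             # Jacobian J_k
        g = J.T @ fx                           # gradient of F at x_k
        if np.linalg.norm(g) < tol:
            break
        # Solve (J_k^T J_k + lambda_k I) p_k = -J_k^T f_k for the step p_k
        A = J.T @ J + lam * np.eye(x.size)
        p = np.linalg.solve(A, -g)
        if 0.5 * (f(x + p) @ f(x + p)) < 0.5 * (fx @ fx):
            x = x + p                          # step reduced F: accept it, relax the damping
            lam *= 0.5
        else:
            lam *= 2.0                         # step rejected: increase the damping and retry
    return x

# Example: the Rosenbrock function written as a sum of squared residuals
f = lambda x: np.array([x[0] - 1.0, 10.0 * (x[1] - x[0] ** 2)])
jac = lambda x: np.array([[1.0, 0.0], [-20.0 * x[0], 10.0]])
print(levenberg_marquardt(f, jac, [-1.2, 1.0]))   # should approach [1, 1]
```

Note the two roles of $\lambda_k$: when it is small the step is essentially a Gauss-Newton step, and when it is large the step shrinks toward a scaled steepest-descent direction.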

Turning to GDA (Gaussian Discriminant Analysis): the class-conditional distributions are multivariate normals. The cursive N symbol is used to represent this distribution, whose somewhat nasty-looking density equation is parameterized by the following (the model is written out just after the list):

  • μ0 and μ1: the mean vectors of the two classes
  • Σ: the covariance matrix (shared by both classes)
  • φ: the Bernoulli parameter for the class label, i.e., the prior probability of class 1; varying these parameters gives different distributions
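For reference (the density equation itself was in a screenshot that has not survived), the generative model these parameters define in Gaussian Discriminant Analysis is conventionally written as

$$y \sim \mathrm{Bernoulli}(\varphi), \qquad x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma),$$

with the multivariate normal density

$$p(x \mid y) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (x - \mu_y)^{T} \Sigma^{-1} (x - \mu_y) \Big),$$

where $d$ is the dimension of $x$.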

Covariance

In a two-dimensional example this is a two-by-two matrix, and for a standard normal distribution with zero mean it is the identity matrix. As the covariance gets larger (e.g., if we multiply it by a factor greater than 1), the density spreads out and flattens; as it gets smaller (multiply by a factor less than 1), the distribution gets taller and thinner. If we increase the off-diagonal entries of the covariance matrix, we skew the distribution along the line y = x; if we decrease them (make them negative), we skew it in the opposite direction.
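As a quick numerical illustration of those three cases (the specific matrices below are made-up examples, not taken from the original figure):

```python
import numpy as np
from scipy.stats import multivariate_normal  # SciPy's multivariate normal density

mu = np.zeros(2)
sigma_identity = np.eye(2)                    # standard normal: round, unit-width contours
sigma_spread   = 2.0 * np.eye(2)              # scaled up: density spreads out and flattens
sigma_skewed   = np.array([[1.0, 0.8],
                           [0.8, 1.0]])       # positive off-diagonals: stretched along y = x

for name, sigma in [("identity", sigma_identity),
                    ("spread", sigma_spread),
                    ("skewed", sigma_skewed)]:
    # the peak density at the mean drops as the distribution spreads out
    print(name, multivariate_normal(mu, sigma).pdf(mu))
```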


Mean

Varying the mean vectors μ0 and μ1 simply shifts where each class's distribution is centered; it does not change the shape of the density.
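Putting the GDA pieces together, here is a minimal sketch of how the four parameters could be estimated from labelled data by maximum likelihood (the closed-form estimates are standard; the function and variable names are my own, for illustration only):

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates of the GDA parameters.

    X : (n, d) array of feature vectors
    y : (n,)   array of 0/1 class labels
    """
    phi = y.mean()                              # prior probability P(y = 1)
    mu0 = X[y == 0].mean(axis=0)                # mean vector of class 0
    mu1 = X[y == 1].mean(axis=0)                # mean vector of class 1
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / len(y)      # shared covariance matrix
    return phi, mu0, mu1, sigma

# Tiny synthetic example: two Gaussian blobs with different means
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-1.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([+1.5, 1.0], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
phi, mu0, mu1, sigma = fit_gda(X, y)
print(phi, mu0, mu1)                            # ~0.5, ~[-1, 0], ~[1.5, 1]
```

Classification then follows Bayes' rule: assign x to the class with the larger posterior, which with a shared Σ gives a linear decision boundary.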

Basically think of BFGS as a way of finding a minimum of an objective function, making use of objective function values and the gradient of the objective function.

A first-order method means gradients (first derivatives), and possibly objective-function values, are used, but not the Hessian (second derivatives). Think of, for instance, gradient descent and steepest descent, among many others.
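For contrast with the second-order methods described next, here is a minimal sketch of the simplest first-order method, plain gradient descent with a fixed step size (the step size and stopping rule are arbitrary choices for the example):

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, max_iter=1000, tol=1e-8):
    """First-order method: uses only gradients, never a Hessian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # stop when the gradient is (nearly) zero
            break
        x = x - step * g                       # move a fixed distance against the gradient
    return x

# Example: minimize f(x) = (x0 - 3)^2 + (x1 + 1)^2
grad = lambda x: np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])
print(gradient_descent(grad, [0.0, 0.0]))      # approaches [3, -1]
```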

A second-order method means gradients and the Hessian are used (and possibly objective-function values). Second-order methods can be based either on

  1. "Exact" Hessian matrix (or finite differences of gradients), in which case they are known as Newton methods or

  2. Quasi-Newton methods, which approximate the Hessian based on differences of gradients over several iterations, by imposing a "secant" (quasi-Newton) condition. There are many different quasi-Newton methods, which estimate the Hessian in different ways. One of the most popular is BFGS.

     The BFGS Hessian approximation can either be based on the full history of gradients, in which case it is referred to as BFGS, or it can be based only on the most recent m gradients, in which case it is known as limited-memory BFGS, abbreviated L-BFGS. The advantage of L-BFGS is that it requires retaining only the most recent m gradients, where m is usually around 10 to 20, which is a much smaller storage requirement than the n*(n+1)/2 elements required to store the full (triangular part of a) Hessian estimate, as BFGS requires, where n is the problem dimension. Unlike (full) BFGS, the estimate of the Hessian is never explicitly formed or stored in L-BFGS (although some implementations of BFGS only form and update the Cholesky factor of the Hessian approximation, rather than the Hessian approximation itself); rather, the calculations that would be required with the estimate of the Hessian are accomplished without explicitly forming it.

     L-BFGS is used instead of BFGS for very large problems (when n is very large), but it might not perform as well as BFGS. Therefore, BFGS is preferred over L-BFGS when the memory requirements of BFGS can be met. On the other hand, L-BFGS may not be much worse in performance than BFGS. (A minimal sketch of the BFGS update follows below.)
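As promised above, here is a minimal sketch of full BFGS with the standard inverse-Hessian (secant) update; it is illustrative only, with a crude backtracking line search in place of the Wolfe-condition search a real implementation would use:

```python
import numpy as np

def bfgs(f, grad, x0, max_iter=200, tol=1e-8):
    """Quasi-Newton BFGS: builds an inverse-Hessian estimate H from gradient differences."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                              # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                             # quasi-Newton search direction
        # crude backtracking line search (real codes enforce the Wolfe conditions)
        t = 1.0
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5
        s = t * p                              # step actually taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                          # change in gradient over the step
        ys = y @ s
        if ys > 1e-10:                         # only update when the curvature condition holds
            rho = 1.0 / ys
            I = np.eye(n)
            # secant (BFGS) update of the inverse-Hessian approximation
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: the Rosenbrock function, whose minimum is at [1, 1]
f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                           200 * (x[1] - x[0] ** 2)])
print(bfgs(f, grad, [-1.2, 1.0]))              # should approach [1, 1]
```

In practice one would normally call an existing implementation instead, e.g. scipy.optimize.minimize(f, x0, jac=grad, method="BFGS"), or method="L-BFGS-B" for the limited-memory variant discussed above.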

Kindly revert for any queries

Thanks.
