What does "Train it to fit the target concept -2 + x1 + x2 > 0" mean in machine learning (Delta training rule)?
The statement defines the target concept your classifier must learn: an input (x1, x2) belongs to the positive class exactly when -2 + x1 + x2 > 0, i.e. when x1 + x2 > 2, and to the negative class otherwise. "Train it" means adjusting the weights of a linear unit (for example towards w0 = -2, w1 = 1, w2 = 1, or any equivalent set) with the Delta training rule until the unit reproduces this classification on your training data.
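As a concrete illustration (my own minimal sketch, not part of the original exercise), the target concept simply labels each input point by whether -2 + x1 + x2 > 0 holds:

```python
# Sketch: labeling inputs with the target concept -2 + x1 + x2 > 0.
def target_concept(x1, x2):
    # Positive class (1) when -2 + x1 + x2 > 0, i.e. when x1 + x2 > 2.
    return 1 if -2 + x1 + x2 > 0 else 0

print(target_concept(2, 2))  # x1 + x2 = 4 > 2, so positive class: prints 1
print(target_concept(0, 1))  # x1 + x2 = 1 <= 2, so negative class: prints 0
```

Training succeeds when the learned unit agrees with this labeling on the training set.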
You can work out your results on the basis of the following concept:

The Delta Rule uses the difference between target activation and obtained activation to drive learning. The activation function here is a Linear Activation function, in which the output node's activation is simply equal to the sum of the network's respective input/weight products. The threshold activation function is dropped, and a linear sum of products is used instead to calculate the activation of the output neuron. The strength of the network connections (i.e., the values of the weights) is adjusted to reduce the difference between target and actual output activation (i.e., error).
During forward propagation through a network, the output (activation) of a given node is a function of its inputs. The inputs to a node, which are simply the products of the outputs of preceding nodes with their associated weights, are summed and then passed through an activation function before being sent out from the node. Thus, we have the following:

Sj = Σi (wij · ai)

and

aj = f(Sj)
where ‘Sj’ is the sum of all relevant products of weights and outputs from the previous layer i, ‘wij’ represents the relevant weights connecting layer i with layer j, ‘ai’ represents the activation of nodes in the previous layer i, ‘aj’ is the activation of the node at hand, and ‘f’ is the activation function.
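This forward step can be sketched as follows (the names Sj, aj, and the linear activation f follow the notation above; the specific weights and activations are invented for illustration):

```python
# Forward step for one output node j with a linear activation function:
# Sj = sum over i of wij * ai, then aj = f(Sj) with f(x) = x.
def forward(weights, activations):
    # Weighted sum of the previous layer's outputs.
    Sj = sum(w * a for w, a in zip(weights, activations))
    # Linear activation: the node's output equals the weighted sum.
    aj = Sj
    return aj

# Illustrative values (assumed, not from the original post):
ai = [1.0, 0.5, 2.0]     # activations of nodes in the previous layer i
wij = [0.2, -0.4, 0.1]   # weights connecting layer i to node j
print(forward(wij, ai))  # 0.2*1.0 + (-0.4)*0.5 + 0.1*2.0 = 0.2
```

With a threshold activation, f would instead be a step function; the linear choice matters for the derivation below.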
For any given set of input data and weights, there will be an associated magnitude of error, which is measured by an error function. The Delta Rule employs the error function for what is known as Gradient Descent learning, which involves the 'modification of weights along the most direct path in weight-space to minimize error'; the change applied to a given weight is proportional to the negative of the derivative of the error with respect to that weight.
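A minimal one-dimensional illustration of this idea (my own toy example, not tied to any network): to minimize E(w) = (w - 3)^2, repeatedly step against the derivative dE/dw = 2(w - 3).

```python
# Gradient descent on E(w) = (w - 3)^2, whose minimum is at w = 3.
w = 0.0
lr = 0.1                   # learning rate (arbitrary illustrative choice)
for _ in range(100):
    w -= lr * 2 * (w - 3)  # move proportionally to the negative derivative
print(round(w, 6))         # prints 3.0
```

The same principle, applied to each weight of the network, is what the derivation below makes precise.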
The Error/Cost function is commonly given as the sum of the squares of the differences between all target and actual node activations for the output layer. For a particular training pattern (i.e., training case), the error is thus given by:

Ep = (1/2) Σn (tjn - ajn)^2
where 'Ep' is the total error over the training pattern, the 1/2 is applied to simplify the function's derivative, 'n' ranges over all output nodes for a given training pattern, 'tjn' is the target value for node n in output layer j, and 'ajn' is the actual activation of the same node. This particular error measure is attractive because its derivative, whose value is needed by the Delta Rule, is easily calculated. The error over an entire set of training patterns is calculated by summing all 'Ep':

E = Σp Ep
where 'E' is the total error and 'p' ranges over all training patterns. An equivalent term for E in the earlier equation is the Sum-of-squares error. A normalized version of this equation is given by the Mean Squared Error (MSE) equation:

MSE = (1/(P·N)) Σp Σn (tjn - ajn)^2
where 'P' and 'N' are the total numbers of training patterns and output nodes, respectively. It is the error of both previous equations that gradient descent attempts to minimize (not strictly true if weights are changed after each input pattern is submitted to the network). The error over a given training pattern is commonly expressed as the Total Sum of Squares error, which is simply the sum of all squared errors over all output nodes and all training patterns. 'The negative of the derivative of the error function is required in order to perform Gradient Descent Learning.' The derivative of the equation above (which measures the error for a given pattern 'p') with respect to a particular weight 'wij' sub 'x' is given by the chain rule as:

∂Ep/∂wij,x = (∂Ep/∂aj,z) · (∂aj,z/∂wij,x)
where 'aj' sub 'z' is the activation of the node in the output layer that corresponds to the weight 'wij' sub 'x'. It follows that:

∂Ep/∂aj,z = -(tj,z - aj,z)
and, because the activation is linear (aj,z is just the weighted sum of its inputs),

∂aj,z/∂wij,x = ai
Thus, the derivative of the error over an individual training pattern is given by the product of the two derivatives above:

∂Ep/∂wij,x = -(tj,z - aj,z) · ai
Because Gradient Descent learning requires that any change in a particular weight be proportional to the negative of the derivative of the error, the change in a given weight must be proportional to the negative of the preceding equation. Replacing the difference between the target and actual activation of the relevant output node by δ (delta), and introducing a learning rate ε (epsilon), that equation can be re-written in the final form of the Delta Rule:

Δwij = ε · δj · ai, where δj = (tj - aj)
Delta Rule for Perceptrons
The reasoning behind the use of a Linear Activation function here instead of a Threshold Activation function can now be justified: the threshold activation function that characterizes both the McCulloch-Pitts network and the perceptron is not differentiable at the transition between the activations of 0 and 1 (slope = infinity), and its derivative is 0 over the remainder of the function. Hence, the threshold activation function cannot be used in Gradient Descent learning, whereas a Linear Activation function (or any other function that is differentiable) allows the derivative of the error to be calculated.
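Putting the pieces together, here is a minimal sketch (my own illustration, not from the original exercise) that trains a single linear node with the Delta Rule update, accumulated over all patterns so that each step is true gradient descent on E, to fit the target concept -2 + x1 + x2 > 0. The grid of training points, the learning rate ε, and the epoch count are arbitrary illustrative choices:

```python
# Delta Rule training of one linear output node on the target concept
# -2 + x1 + x2 > 0. The input is augmented with a constant 1 so that the
# weight w0 can play the role of the -2 threshold term.
def train_delta(data, epochs=2000, eps=0.01):
    w = [0.0, 0.0, 0.0]                          # weights [w0, w1, w2]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), t in data:
            a = [1.0, x1, x2]                    # bias input plus x1, x2
            out = sum(wi * ai for wi, ai in zip(w, a))  # linear activation
            d = t - out                          # delta = target - actual
            grad = [g + d * ai for g, ai in zip(grad, a)]
        # Delta Rule step, accumulated over all patterns before updating.
        w = [wi + eps * g for wi, g in zip(w, grad)]
    return w

# Training points: a small grid labeled 1.0 when -2 + x1 + x2 > 0, else 0.0.
data = [((x1, x2), 1.0 if -2 + x1 + x2 > 0 else 0.0)
        for x1 in range(4) for x2 in range(4)]
w = train_delta(data)
# Thresholding the trained linear output at 0.5 recovers the target concept:
correct = all((sum(wi * ai for wi, ai in zip(w, [1.0, x1, x2])) > 0.5) == (t == 1.0)
              for (x1, x2), t in data)
print(correct)  # prints True
```

Because the targets are 0/1 but the activation is linear, the node converges to the least-squares fit of the labels, and a 0.5 cutoff on its output then separates the two classes on this grid.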
[Figure: three-dimensional depiction of an actual error surface]
[Figure: two-dimensional depiction of the error surface]