Question

Q1) Which two of the following describe bias-variance trade-off between MC and TD?

A) The MC algorithm reduces variance by sampling until the terminal state, leading to higher bias.

B) The MC algorithm reduces bias by sampling until the terminal state, leading to higher variance.

C) The TD algorithm reduces variance by sampling a small number of time steps, leading to higher bias.

D) The TD algorithm reduces bias by sampling a small number of time steps, leading to higher variance.

Question 2) What is the difference between on-policy and off-policy learning?

A) On-policy learning learns by evaluating the results of a behavior policy to perform policy improvement on a target policy, whereas off-policy learning learns from experience by evaluating a target policy and performing policy improvement on the target policy.

B) On-policy learning learns from experience by evaluating a target policy and performing policy improvement on the target policy, whereas off-policy learning learns by evaluating the results of a behavior policy to perform policy improvement on a target policy.

C) On-policy learning learns from experience by evaluating a target policy and performing policy improvement on the target policy, whereas off-policy learning learns by evaluating the target policy to perform policy improvement on a behavior policy.

D) On-policy learning learns from experience by evaluating a behavior policy and performing policy improvement on the target policy, whereas off-policy learning learns by evaluating the results of a behavior policy to perform policy improvement on the behavior policy.

Question 3) Which two statements describe eligibility traces?

A) Eligibility traces down-weight the contribution of states that are rarely visited when computing the average V(s) or Q(s,a).

B) Eligibility traces encourage further exploration of the state space.

C) Eligibility traces assign credit to action.

D) Eligibility traces assign credit to both the most frequently visited and last visited states.

Answer #1

Q1) Which two of the following describe bias-variance trade-off between MC and TD?

B) The MC algorithm reduces bias by sampling until the terminal state, leading to higher variance.

AND

C) The TD algorithm reduces variance by sampling a small number of time steps, leading to higher bias.

Description:

TD can learn before knowing the final outcome: it learns online after every step, whereas MC must wait until the end of the episode, when the return is known.

TD can learn without the final outcome, from incomplete sequences, whereas MC can only learn from complete sequences. TD therefore works in continuing (non-terminating) environments, while MC only works in episodic (terminating) environments.

MC has high variance and zero bias. It has good convergence properties, is not very sensitive to the initial value, and is very simple to understand and use.

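TD has low variance but some bias, because its target bootstraps from the current value estimate. It is usually more efficient than MC, TD(0) converges to the true value function v_π(s) in the tabular case, and it is more sensitive to the initial value.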
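To make the trade-off concrete, here is a minimal sketch in Python. It assumes a 5-state random walk (exit left pays 0, exit right pays 1) that is not part of the original question; the function names and the all-zero initial estimate are hypothetical choices for illustration only. It compares the full-return MC target with the one-step TD(0) target for the middle state, whose true value is 0.5.

```python
import random
import statistics


def run_episode(rng, n_states=5, start=2):
    """Random walk on states 0..n_states-1; exiting right pays 1, exiting left pays 0."""
    s = start
    transitions = []  # (state, reward, next_state), next_state is None at termination
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:
            transitions.append((s, 0.0, None))
            return transitions
        if s_next >= n_states:
            transitions.append((s, 1.0, None))
            return transitions
        transitions.append((s, 0.0, s_next))
        s = s_next


def mc_target(transitions, gamma=1.0):
    """Return from the first step: samples all the way to the terminal state."""
    g = 0.0
    for _, r, _ in reversed(transitions):
        g = r + gamma * g
    return g


def td_target(transition, V, gamma=1.0):
    """One-step target: one sampled reward plus the current estimate of the next state."""
    _, r, s_next = transition
    return r + (gamma * V[s_next] if s_next is not None else 0.0)


rng = random.Random(0)
V = [0.0] * 5  # assumed (inaccurate) initial value estimate
mc_targets, td_targets = [], []
for _ in range(2000):
    episode = run_episode(rng)
    mc_targets.append(mc_target(episode))
    td_targets.append(td_target(episode[0], V))

mc_mean = statistics.mean(mc_targets)
mc_var = statistics.pvariance(mc_targets)
td_mean = statistics.mean(td_targets)
td_var = statistics.pvariance(td_targets)
print(f"MC target: mean={mc_mean:.3f} variance={mc_var:.3f}")
print(f"TD target: mean={td_mean:.3f} variance={td_var:.3f}")
```

Under these assumptions the MC targets average close to the true value but are spread widely, since each one is either 0 or 1. The TD targets do not vary at all here, but they sit at whatever the initial estimate says, which is the bias that option C refers to.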

----------------------------------------------------------------------------------------------------------------------------------------------------------------

Question 2) What is the difference between on-policy and off-policy learning?

B) On-policy learning learns from experience by evaluating a target policy and performing policy improvement on the target policy, whereas off-policy learning learns by evaluating the results of a behavior policy to perform policy improvement on a target policy.

Description:

1. On-policy learning: it learns on the job, meaning it evaluates or improves the same policy that is used to make decisions. In other words, it directly learns the policy that decides which action to take in each state.

2. Off-policy learning: it evaluates one policy (the target policy) while following another policy (the behavior policy), much as we learn to do something by watching others do it.
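The distinction can be sketched with two well-known tabular updates: SARSA as an on-policy example and Q-learning as an off-policy example. Neither algorithm is named in the question, so treat them as illustrative stand-ins; the state names, action names, and step sizes below are hypothetical.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy: bootstrap from the greedy (target-policy) action,
    regardless of what the behavior policy does next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


actions = ["left", "right"]
Q_on = defaultdict(float)
Q_off = defaultdict(float)
for Q in (Q_on, Q_off):
    Q[("s1", "left")] = 0.0
    Q[("s1", "right")] = 1.0

# Same experience for both learners: in s0 take "right", get reward 0, land in s1,
# where the exploring behavior policy then picks the worse action "left".
sarsa_update(Q_on, "s0", "right", 0.0, "s1", "left")
q_learning_update(Q_off, "s0", "right", 0.0, "s1", actions)

print("SARSA      Q(s0, right) =", Q_on[("s0", "right")])
print("Q-learning Q(s0, right) =", Q_off[("s0", "right")])
```

Given the same transition, the SARSA estimate reflects the exploratory action that was actually taken, while the Q-learning estimate reflects the greedy target policy even though the behavior policy did not follow it.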

----------------------------------------------------------------------------------------------------------------------------------------------------------

Question 3) Which two statements describe eligibility traces?

A) Eligibility traces down-weight the contribution of states that are rarely visited when computing the average V(s) or Q(s,a).

D) Eligibility traces assign credit to both the most frequently visited and last visited states.
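As a rough illustration of the frequency and recency ideas behind option D, the sketch below runs backward-view TD(λ) with accumulating traces over one short episode. The three-state episode, the function name, and the parameter values are assumptions made up for this example, not something given in the question.

```python
def td_lambda_episode(V, transitions, alpha=0.1, gamma=1.0, lam=0.5):
    """Backward-view TD(lambda) with accumulating traces over one episode.

    transitions is a list of (state, reward, next_state); next_state is None
    at termination. V is updated in place and the final traces are returned.
    """
    E = {s: 0.0 for s in V}
    for s, r, s_next in transitions:
        v_next = V[s_next] if s_next is not None else 0.0
        delta = r + gamma * v_next - V[s]  # one-step TD error
        for x in E:
            E[x] *= gamma * lam  # recency: older visits fade away
        E[s] += 1.0  # frequency: every visit adds to the trace
        for x in V:
            V[x] += alpha * delta * E[x]  # credit in proportion to the trace
    return E


V = {"A": 0.0, "B": 0.0, "C": 0.0}
# Hypothetical episode: A -> B -> A -> C -> terminal, reward 1 only at the end.
episode = [("A", 0.0, "B"), ("B", 0.0, "A"), ("A", 0.0, "C"), ("C", 1.0, None)]
traces = td_lambda_episode(V, episode)
print("values:", V)
print("traces:", traces)
```

In this run the final reward is shared among all three states: C gets the most credit because it was visited last, and A gets more than B because it was visited twice and more recently.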
