Exercise 12.6 At each stage, one can either pay 1 and receive a coupon that is equally likely to be any of n types, or one can stop and receive a final reward of jr if one's current collection of...
Exercise 12.6 At each stage, one can either pay 1 and receive a coupon that is equally likely to be any of n types, or one can stop and receive a final reward of jr if one's current collection of coupons contains exactly j distinct types. Thus, for instance, if one stops after having previously obtained six coupons whose successive types were 2, 4, 2, 5, 4, 3, then one would have earned a net return of 4r -6. The objective is to maxi- mize the expected net return. 246 Stochastic Dynamic Programming We want to solve this as a dynamic programming problem (a) What are the states and actions? (b) Define the optimal value function and give the optimality equation. (c) Give the one-stage lookahead policy