Problem 2. Basic statistics. (10 pts.)
A DNA sequence S was generated by a series of L independent throws of a four-faced die, whose faces ‘A’, ‘C’, ‘G’, ‘T’ have probabilities πA, πC, πG, and πT respectively. You do not know these probabilities, of course, and have only the sequence S to go by. How would you estimate the probabilities πA, πC, πG, and πT from the sequence S? (Show your reasoning as well as the formulas for the estimates.)
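To make the generative model concrete, here is a minimal Python sketch of the die-throwing process. The function name `throw_die_sequence` and the example probabilities are hypothetical, chosen only for illustration; they are not part of the problem. A simulated sequence of this kind can be used to check whatever estimates you derive.

```python
import random

def throw_die_sequence(length, probs, seed=None):
    """Generate a sequence by `length` independent throws of a four-faced die.

    `probs` maps each face ('A', 'C', 'G', 'T') to its probability.
    """
    rng = random.Random(seed)
    faces = list(probs.keys())
    weights = [probs[face] for face in faces]
    # Each throw is independent of all the others.
    return "".join(rng.choices(faces, weights=weights, k=length))

if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    example_probs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    print(throw_die_sequence(60, example_probs, seed=1))
```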
Problem 3. Basic statistics. (10 pts.)
Instead of considering the sequence S (from Problem 2) as being generated by random throws of a die, let’s say the sequence (of length L) evolved from an ancestral sequence that had the same length (L) but was composed of all ‘A’s. The sequence S has NA ‘A’s and L – NA other nucleotides (C or G or T).
Assume that:
● each position in the ancestral sequence evolved independently until today, which is 1000 generations later,
● in each generation, a nucleotide mutates with probability μ and stays the same with probability 1-μ,
● no nucleotide will have mutated twice in the 1000 generations (the chances are exceedingly low, and you may ignore the possibility).
a) Calculate the expected number of A’s in S as a function of μ. (5 points)
b) Then, by equating this expectation to the observed count NA, write down a formula for μ as a function of NA. (3 points)
c) Using this formula for μ, compute its value when L = 10000 and NA = 1000. (2 points)
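The evolutionary model in Problem 3 can also be simulated. The sketch below is one possible reading of the assumptions, not a solution: the function name `evolve_from_all_a` is hypothetical, and the choice of replacement nucleotide (uniform over C, G, T) is an assumption that the problem does not specify. A position stops changing after its first mutation, which mirrors the "no nucleotide mutates twice" assumption.

```python
import random

def evolve_from_all_a(length, mu, generations=1000, seed=None):
    """Evolve an all-'A' ancestral sequence for a number of generations.

    Each position evolves independently. In every generation a position
    that is still 'A' mutates with probability `mu`; once it has mutated
    it is left unchanged (at most one mutation per position).
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(length):
        base = "A"
        for _ in range(generations):
            if rng.random() < mu:
                # Assumption: the new nucleotide is drawn uniformly from C, G, T.
                base = rng.choice("CGT")
                break  # ignore any further mutation at this position
        sequence.append(base)
    return "".join(sequence)

if __name__ == "__main__":
    # Small hypothetical run; compare the count with your answer to part (a).
    s = evolve_from_all_a(length=200, mu=0.001, seed=1)
    print(s.count("A"), "of", len(s), "positions are still 'A'")
```

Running the simulation for several values of μ and comparing the count of A's with your formula from part (a) is a quick sanity check.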