Align the same two sequences in part one with the new scoring scheme:
This question relates to Bioinformatics --- Genome Sequence Analysis.
The example below does not match the question above, but it should give you an idea of what the answer should look like. The answer should be in this format:
Part 1:
Given
s(i,j) = 1 if vi = wj (match)
s(i,j) = 0 if vi != wj (mismatch)
d = 0 (gap, i.e. insertion or deletion)
Let us start with an empty table. For easier understanding, the cell coordinates of each cell have been marked here.
  |   | C | G | A | T | A | C | T |
  | (0,0) | (0,1) | (0,2) | (0,3) | (0,4) | (0,5) | (0,6) | (0,7) |
G | (1,0) | (1,1) | (1,2) | (1,3) | (1,4) | (1,5) | (1,6) | (1,7) |
A | (2,0) | (2,1) | (2,2) | (2,3) | (2,4) | (2,5) | (2,6) | (2,7) |
T | (3,0) | (3,1) | (3,2) | (3,3) | (3,4) | (3,5) | (3,6) | (3,7) |
T | (4,0) | (4,1) | (4,2) | (4,3) | (4,4) | (4,5) | (4,6) | (4,7) |
C | (5,0) | (5,1) | (5,2) | (5,3) | (5,4) | (5,5) | (5,6) | (5,7) |
G | (6,0) | (6,1) | (6,2) | (6,3) | (6,4) | (6,5) | (6,6) | (6,7) |
T | (7,0) | (7,1) | (7,2) | (7,3) | (7,4) | (7,5) | (7,6) | (7,7) |
The cell (0,0) is initialized by filling it with 0.
  |   | C | G | A | T | A | C | T |
  | 0 |   |   |   |   |   |   |   |
G |   |   |   |   |   |   |   |   |
A |   |   |   |   |   |   |   |   |
T |   |   |   |   |   |   |   |   |
T |   |   |   |   |   |   |   |   |
C |   |   |   |   |   |   |   |   |
G |   |   |   |   |   |   |   |   |
T |   |   |   |   |   |   |   |   |
The entries of each successive cell in the first row and first column are obtained by adding the gap penalty (which accounts for insertions and deletions, denoted by d here) to the cell before it. Since d = 0 here, all the entries of the first row and first column will be 0.
  |   | C | G | A | T | A | C | T |
  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G | 0 |   |   |   |   |   |   |   |
A | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
C | 0 |   |   |   |   |   |   |   |
G | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
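The initialization step can be sketched in Python as follows. This is only a minimal illustration: the function name `init_table` and the use of `None` for cells that are not yet filled are assumptions made here, not part of the original answer. The row sequence is the one on the Y axis (GATTCGT) and the column sequence is the one on the X axis (CGATACT).

```python
def init_table(row_seq, col_seq, d):
    """Build the table with only the first row and first column filled.

    Cells that are not yet computed are left as None (an illustrative choice).
    """
    n_rows, n_cols = len(row_seq) + 1, len(col_seq) + 1
    m = [[None] * n_cols for _ in range(n_rows)]
    m[0][0] = 0
    for j in range(1, n_cols):
        m[0][j] = m[0][j - 1] + d  # cell to the left plus the gap score
    for i in range(1, n_rows):
        m[i][0] = m[i - 1][0] + d  # cell above plus the gap score
    return m


table = init_table("GATTCGT", "CGATACT", d=0)
print(table[0])                      # [0, 0, 0, 0, 0, 0, 0, 0]
print([row[0] for row in table])     # [0, 0, 0, 0, 0, 0, 0, 0]
```

With d = 0 every border cell is 0, as in the table above; a non-zero d would make the border grow by d per cell.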
The entry in a given cell (i,j) is given by:
m(i,j) = max{ m(i,j-1) + d, m(i-1,j) + d, m(i-1,j-1) + s(i,j) }
where m(i,j-1), m(i-1,j) and m(i-1,j-1) are the entries of the cells (i,j-1), (i-1,j) and (i-1,j-1) respectively.
Let us consider cell (1,1) as our cell (i,j). We have G and C which is a mismatch since they are not identical. So, s(1,1) = 0.
Then m(i,j-1) + d = m(1,0) + 0 = 0
m(i-1,j) + d = m(0,1) + 0 = 0
m(i-1,j-1) + s(i,j) = m(0,0) + 0 = 0
Therefore m(1,1) which is the maximum of the above 3 values (0, 0 and 0) is 0.
  |   | C | G | A | T | A | C | T |
  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G | 0 | 0 |   |   |   |   |   |   |
A | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
C | 0 |   |   |   |   |   |   |   |
G | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
Let us now consider cell (1,2). We have G and G, which is a match. Therefore s(i,j) = 1.
We have:
m(i,j-1) + d = m(1,1) + 0 = 0
m(i-1,j) + d = m(0,2) + 0 = 0
m(i-1,j-1) + s(i,j) = m(0,1) + 1 = 1
Therefore m(1,2), which is the maximum of the above 3 values (0, 0 and 1), is 1.
  |   | C | G | A | T | A | C | T |
  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G | 0 | 0 | 1 |   |   |   |   |   |
A | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
C | 0 |   |   |   |   |   |   |   |
G | 0 |   |   |   |   |   |   |   |
T | 0 |   |   |   |   |   |   |   |
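The two cell updates worked through above, for cells (1,1) and (1,2), can be reproduced with a few lines of Python. The helper names `score` and `cell_candidates` are made up for this sketch; the three candidates are returned in the same order as in the text (left, above, diagonal).

```python
def score(a, b):
    """Part 1 scoring: 1 for a match, 0 for a mismatch."""
    return 1 if a == b else 0


def cell_candidates(left, up, diag, a, b, d=0):
    """Return the three values whose maximum becomes m(i,j)."""
    return (left + d, up + d, diag + score(a, b))


# Cell (1,1): row nucleotide G against column nucleotide C, all neighbours are 0.
c11 = cell_candidates(left=0, up=0, diag=0, a="G", b="C")
m11 = max(c11)

# Cell (1,2): G against G; the left neighbour is m(1,1), the others lie in row 0.
c12 = cell_candidates(left=m11, up=0, diag=0, a="G", b="G")
m12 = max(c12)

print(c11, m11)  # (0, 0, 0) 0
print(c12, m12)  # (0, 0, 1) 1
```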
Filling the rest of the table this way, we have:
  |   | C | G | A | T | A | C | T |
  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
A | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 2 |
T | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 |
T | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 4 |
C | 0 | 1 | 1 | 2 | 3 | 3 | 4 | 4 |
G | 0 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |
T | 0 | 1 | 2 | 2 | 3 | 3 | 4 | 5 |
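Filling the table by hand is error-prone, so it can help to check it with a short script. The following is a minimal sketch of the fill step under the part 1 scheme; `fill_table` and its parameter names are illustrative choices, not taken from the original answer.

```python
def fill_table(row_seq, col_seq, match, mismatch, d):
    """Fill the whole alignment table using the max-of-three recurrence."""
    n_rows, n_cols = len(row_seq) + 1, len(col_seq) + 1
    m = [[0] * n_cols for _ in range(n_rows)]
    for j in range(1, n_cols):
        m[0][j] = m[0][j - 1] + d
    for i in range(1, n_rows):
        m[i][0] = m[i - 1][0] + d
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            s = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            m[i][j] = max(m[i][j - 1] + d,      # from the left
                          m[i - 1][j] + d,      # from above
                          m[i - 1][j - 1] + s)  # from the diagonal
    return m


table = fill_table("GATTCGT", "CGATACT", match=1, mismatch=0, d=0)
for label, row in zip(" GATTCGT", table):
    print(label, row)
```

The last printed row should read `T [0, 1, 2, 2, 3, 3, 4, 5]`, matching the bottom row of the table above.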
The next step is to trace back the cells from which we obtained the highest alignment score, i.e. 5, which is the entry of cell (7,7).
Let us look at the 3 cells from which m(7,7) could have arisen. Since T and T is a match, s(7,7) = 1.
m(i,j-1) + d = m(7,6) + 0 = 4 + 0 = 4
m(i-1,j) + d = m(6,7) + 0 = 4 + 0 = 4
m(i-1,j-1) + s(i,j) = m(6,6) + 1 = 4 + 1 = 5
Therefore we obtained m(7,7) from m(6,6).
Next we trace back from cell (6,6), and so on, marking each step with an arrow that shows which cell the value was obtained from. The alignment can then be read off the arrows as follows:
An arrow going diagonally means no gap. An arrow going upwards means that a gap needs to be inserted in the sequence that is depicted on the X axis and an arrow going to the left means that a gap needs to be inserted in the sequence depicted on the Y axis.
We go backwards for the alignment. We start with cell (7,7) which has a diagonal arrow emerging from it. So, we align the nucleotides corresponding to (7,7) i.e. T and T in both sequences.
T
T
Next, trace-back gives us (6,6) which has an upwards arrow emerging from it. So, we insert a gap in the sequence on the X axis.
- T
G T
Next, we have (5,6) which has a diagonal arrow. Therefore we align the corresponding nucleotides.
C - T
C G T
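The trace-back steps described so far can be sketched in Python. This is an illustrative version, not the method used to produce the arrows in the original answer: when several moves are possible from a cell, it assumes a fixed preference (diagonal first, then up, then left), so it returns only one of the optimal alignments. The names `fill_table` and `trace_back` are hypothetical.

```python
def fill_table(row_seq, col_seq, match, mismatch, d):
    n_rows, n_cols = len(row_seq) + 1, len(col_seq) + 1
    m = [[0] * n_cols for _ in range(n_rows)]
    for j in range(1, n_cols):
        m[0][j] = m[0][j - 1] + d
    for i in range(1, n_rows):
        m[i][0] = m[i - 1][0] + d
        for j in range(1, n_cols):
            s = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            m[i][j] = max(m[i][j - 1] + d, m[i - 1][j] + d, m[i - 1][j - 1] + s)
    return m


def trace_back(m, row_seq, col_seq, match, mismatch, d):
    """Walk from the bottom-right cell to (0,0), building one alignment."""
    i, j = len(row_seq), len(col_seq)
    top, bottom = [], []  # top: X-axis sequence, bottom: Y-axis sequence
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            if m[i][j] == m[i - 1][j - 1] + s:  # diagonal arrow: no gap
                top.append(col_seq[j - 1])
                bottom.append(row_seq[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and m[i][j] == m[i - 1][j] + d:  # upward arrow: gap on X axis
            top.append("-")
            bottom.append(row_seq[i - 1])
            i -= 1
        else:                                     # left arrow: gap on Y axis
            top.append(col_seq[j - 1])
            bottom.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


rows, cols = "GATTCGT", "CGATACT"
table = fill_table(rows, cols, match=1, mismatch=0, d=0)
top, bottom = trace_back(table, rows, cols, match=1, mismatch=0, d=0)
print(top)     # CGATAC-T
print(bottom)  # -GATTCGT
```

A different tie-breaking order would follow different arrows and give another alignment with the same score.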
Applying this to the whole table, we get the following 3 possible alignments (with their respective scores, counting +1 for each match and 0 for each mismatch or gap):
C G A T A C - T
- G A T T C G T
0+1+1+1+0+1+0+1 = 5
OR
C G A - T A C - T
- G A T T - C G T
0+1+1+0+1+0+1+0+1 = 5
OR
C G A T A - C - T
- G A T - T C G T
0+1+1+1+0+0+1+0+1 = 5
Therefore the maximum alignment score is 5.
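The column-by-column sums above can be double-checked with a small scoring helper. The function name `alignment_score` and its default arguments (the part 1 scheme) are assumptions made for this sketch.

```python
def alignment_score(top, bottom, match=1, mismatch=0, gap=0):
    """Sum the per-column scores of two aligned strings of equal length."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # insertion or deletion
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


alignments = [
    ("CGATAC-T", "-GATTCGT"),
    ("CGA-TAC-T", "-GATT-CGT"),
    ("CGATA-C-T", "-GAT-TCGT"),
]
scores = [alignment_score(top, bottom) for top, bottom in alignments]
print(scores)  # [5, 5, 5]
```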
Notice that this is the same as the entry in the bottom-right cell (7,7) of our table, which is also the largest entry in it. So, if we are interested only in the maximum score and not in the alignments themselves, the trace-back and the steps after it are not necessary: the maximum alignment score can be read directly from the alignment matrix.
Part 2:
Given
s(i,j) = 1.5 if vi = wj (match)
s(i,j) = -1 if vi != wj (mismatch)
d = 0.25
This is similar to part 1 but uses a different scoring scheme. The initialization step is slightly different, since there is now a non-zero score for insertions/deletions, i.e. gaps.
  |   | C | G | A | T | A | C | T |
  | 0 | 0.25 | 0.5 | 0.75 | 1 | 1.25 | 1.5 | 1.75 |
G | 0.25 |   |   |   |   |   |   |   |
A | 0.5 |   |   |   |   |   |   |   |
T | 0.75 |   |   |   |   |   |   |   |
T | 1 |   |   |   |   |   |   |   |
C | 1.25 |   |   |   |   |   |   |   |
G | 1.5 |   |   |   |   |   |   |   |
T | 1.75 |   |   |   |   |   |   |   |
We see that in the first row and first column, the entry in a given cell = the entry of the cell to the left of it (first row) or above it (first column) + 0.25 (the gap penalty, i.e. d). Since d was 0 in part 1, all the cells in the first row and first column had a value of 0 there. Since d is non-zero here, the entries grow by 0.25 from one cell to the next.
Filling the rest of the alignment matrix is the same as in part 1:
m(i,j) = max{ m(i,j-1) + d, m(i-1,j) + d, m(i-1,j-1) + s(i,j) }
where s(i,j) = 1.5 for a match and -1 for a mismatch, and d = 0.25.
  |   | C | G | A | T | A | C | T |
  | 0 | 0.25 | 0.5 | 0.75 | 1 | 1.25 | 1.5 | 1.75 |
G | 0.25 | 0.5 | 1.75 | 2 | 2.25 | 2.5 | 2.75 | 3 |
A | 0.5 | 0.75 | 2 | 3.25 | 3.5 | 3.75 | 4 | 4.25 |
T | 0.75 | 1 | 2.25 | 3.5 | 4.75 | 5 | 5.25 | 5.5 |
T | 1 | 1.25 | 2.5 | 3.75 | 5 | 5.25 | 5.5 | 6.75 |
C | 1.25 | 2.5 | 2.75 | 4 | 5.25 | 5.5 | 6.75 | 7 |
G | 1.5 | 2.75 | 4 | 4.25 | 5.5 | 5.75 | 7 | 7.25 |
T | 1.75 | 3 | 4.25 | 4.5 | 5.75 | 6 | 7.25 | 8.5 |
As discussed above, the maximum alignment score can be read directly from the matrix: it is the entry in the bottom-right cell (7,7), which is again the largest entry. Therefore the maximum alignment score is 8.5.
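The part 2 table can be checked with a short script as well. The sketch below writes the same recurrence as a memoized recursive function instead of filling a table row by row; this is just one convenient formulation, and the names `m`, `ROW_SEQ`, `COL_SEQ`, `MATCH`, `MISMATCH` and `D` are chosen here for illustration. As in the worked solution, the gap value d is added, not subtracted.

```python
from functools import lru_cache

ROW_SEQ, COL_SEQ = "GATTCGT", "CGATACT"   # Y-axis and X-axis sequences
MATCH, MISMATCH, D = 1.5, -1.0, 0.25      # part 2 scoring scheme


@lru_cache(maxsize=None)
def m(i, j):
    """Entry of cell (i,j) under the part 2 scheme."""
    if i == 0:
        return j * D   # first row: j gaps
    if j == 0:
        return i * D   # first column: i gaps
    s = MATCH if ROW_SEQ[i - 1] == COL_SEQ[j - 1] else MISMATCH
    return max(m(i, j - 1) + D, m(i - 1, j) + D, m(i - 1, j - 1) + s)


last_row = [m(7, j) for j in range(8)]
print(last_row)  # [1.75, 3.0, 4.25, 4.5, 5.75, 6.0, 7.25, 8.5]
print(m(7, 7))   # 8.5
```

All values are multiples of 0.25, which floating-point numbers represent exactly, so comparing them with `==` is safe in this particular example.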
Trace-back gives us the following alignments:
C G A T A - C - T
- G A T - T C G T
0.25+1.5+1.5+1.5+0.25+0.25+1.5+0.25+1.5 = 8.5
OR
C G A - T A C - T
- G A T T - C G T
0.25+1.5+1.5+0.25+1.5+0.25+1.5+0.25+1.5 = 8.5
We see that the following alignment that we obtained from part 1 is missing:
C G A T A C - T
- G A T T C G T
This is because in part 2 the score for a mismatch was -1 while the score for a gap was 0.25, whereas in part 1 both a mismatch and a gap scored 0. Since a mismatch is penalized more heavily than a gap in part 2, the above alignment, which contains an A/T mismatch, scores only 0.25+1.5+1.5+1.5-1+1.5+0.25+1.5 = 7. It is therefore not produced by the alignment process, which keeps only the alignments with the maximum score.
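To make the comparison concrete, the columns of each alignment can be counted by type and weighted with either scoring scheme. This is a small illustrative sketch; the helper names `column_counts` and `total` and the dictionary layout of the two schemes are assumptions made here.

```python
from collections import Counter


def column_counts(top, bottom):
    """Count how many columns are matches, mismatches and gaps."""
    kinds = Counter()
    for a, b in zip(top, bottom):
        if "-" in (a, b):
            kinds["gap"] += 1
        elif a == b:
            kinds["match"] += 1
        else:
            kinds["mismatch"] += 1
    return kinds


def total(kinds, weights):
    """Weight the column counts with a scoring scheme."""
    return sum(kinds[k] * weights[k] for k in weights)


part1 = {"match": 1, "mismatch": 0, "gap": 0}
part2 = {"match": 1.5, "mismatch": -1.0, "gap": 0.25}

with_mismatch = column_counts("CGATAC-T", "-GATTCGT")   # has the A/T mismatch
gaps_only = column_counts("CGA-TAC-T", "-GATT-CGT")     # mismatch replaced by gaps

print(total(with_mismatch, part1), total(gaps_only, part1))  # 5 5
print(total(with_mismatch, part2), total(gaps_only, part2))  # 7.0 8.5
```

Under the part 1 scheme the two alignments tie, while under the part 2 scheme the one with the mismatch falls below the maximum.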