Align the same two sequences in part one with the new scoring scheme: This question relates...

Question

We would like to align two DNA sequences, (v) CGATACT and (w) GATTCGT, based on the following scoring scheme as discussed in ... Align the same two sequences in part one with the new scoring scheme:

i) s(i, j) = 1.5 if v_i = w_j (matches); ii) s(i, j) = -1.0 if v_i != w_j (mismatches); iii) d = 0.25 (indels: insertions or deletions). This question relates to Bioinformatics: Genome Sequence Analysis.

Below doesn't match the question above but should give you an idea what it should look like. Answer should be in this format:

[Image: example score table illustrating the expected answer format; cell values not recoverable]


Answer 1

Part 1:

Given
s(i,j) = 1 if v_i = w_j

s(i,j) = 0 if v_i != w_j

d = 0

Let us start with an empty table. For easier understanding, the cell coordinates of each cell have been marked here.

		C	G	A	T	A	C	T
	(0,0)	(0,1)	(0,2)	(0,3)	(0,4)	(0,5)	(0,6)	(0,7)
G	(1,0)	(1,1)	(1,2)	(1,3)	(1,4)	(1,5)	(1,6)	(1,7)
A	(2,0)	(2,1)	(2,2)	(2,3)	(2,4)	(2,5)	(2,6)	(2,7)
T	(3,0)	(3,1)	(3,2)	(3,3)	(3,4)	(3,5)	(3,6)	(3,7)
T	(4,0)	(4,1)	(4,2)	(4,3)	(4,4)	(4,5)	(4,6)	(4,7)
C	(5,0)	(5,1)	(5,2)	(5,3)	(5,4)	(5,5)	(5,6)	(5,7)
G	(6,0)	(6,1)	(6,2)	(6,3)	(6,4)	(6,5)	(6,6)	(6,7)
T	(7,0)	(7,1)	(7,2)	(7,3)	(7,4)	(7,5)	(7,6)	(7,7)

The cell (0,0) is initialized by filling it with 0.

		C	G	A	T	A	C	T
	0
G
A
T
T
C
G
T

The entries of each successive cell in the first row and first column are obtained by adding the gap penalty (which accounts for insertions and deletions, denoted by d here) to the cell before it. Since d = 0 here, all the entries of the first row and first column will be 0.

		C	G	A	T	A	C	T
	0	0	0	0	0	0	0	0
G	0
A	0
T	0
T	0
C	0
G	0
T	0
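The border initialization described above can be sketched in Python. This is only an illustration and not part of the original answer: the function name `init_borders` is hypothetical, and it assumes, as the text does, that d is added once per border cell.

```python
def init_borders(n_rows, n_cols, d):
    """Build an (n_rows+1) x (n_cols+1) grid with only row 0 and column 0 filled.

    Each border cell equals the previous border cell plus the gap score d,
    so the k-th cell along a border holds k * d. Interior cells stay None.
    """
    grid = [[None] * (n_cols + 1) for _ in range(n_rows + 1)]
    for j in range(n_cols + 1):
        grid[0][j] = j * d
    for i in range(n_rows + 1):
        grid[i][0] = i * d
    return grid


# Both sequences have 7 nucleotides, and d = 0 in part 1.
grid = init_borders(7, 7, 0)
print(grid[0])                   # first row
print([row[0] for row in grid])  # first column
```

With d = 0 every border cell is 0, which matches the table above.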

The entry in a given cell (i,j) is given by:

m(i,j) = max{ m(i,j-1) + d, m(i-1,j) + d, m(i-1,j-1) + s(i,j) }

where m(i,j-1), m(i-1,j) and m(i-1,j-1) are the entries of the cells (i,j-1), (i-1,j) and (i-1,j-1) respectively.

Let us consider cell (1,1) as our cell (i,j). We have G and C which is a mismatch since they are not identical. So, s(1,1) = 0.

Then m(i,j-1) +d = m(1,0) + 0 = 0

m(i-1,j) + d = m(0,1) + 0= 0

m(i-1,j-1) + s(i,j) = m(0,0) + 0 = 0

Therefore m(1,1) which is the maximum of the above 3 values (0, 0 and 0) is 0.

		C	G	A	T	A	C	T
	0	0	0	0	0	0	0	0
G	0	0
A	0
T	0
T	0
C	0
G	0
T	0

Let us now consider cell (1,2). We have G and G, which is a match. Therefore s(i,j) = 1.

We have:

m(i,j-1) +d = m(1,1) + 0 = 0

m(i-1,j) + d = m(0,2) + 0= 0

m(i-1,j-1) + s(i,j) = m(0,1) + 1 = 1

Therefore m(1,2), which is the maximum of the above 3 values (0, 0 and 1), is 1.

		C	G	A	T	A	C	T
	0	0	0	0	0	0	0	0
G	0	0	1
A	0
T	0
T	0
C	0
G	0
T	0
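The two worked cells above can be reproduced with a small helper that applies the recurrence to a single cell. This is a sketch: the name `cell_value` and its keyword arguments are hypothetical and chosen only for illustration.

```python
def cell_value(left, up, diag, s, d):
    """Apply the recurrence to one cell.

    left = m(i, j-1), up = m(i-1, j), diag = m(i-1, j-1),
    s = s(i, j) for the two nucleotides at this cell, d = gap score.
    """
    return max(left + d, up + d, diag + s)


# Cell (1,1): G vs C is a mismatch, so s = 0; all three neighbours are 0.
m_1_1 = cell_value(left=0, up=0, diag=0, s=0, d=0)

# Cell (1,2): G vs G is a match, so s = 1; the left neighbour is m(1,1).
m_1_2 = cell_value(left=m_1_1, up=0, diag=0, s=1, d=0)

print(m_1_1, m_1_2)
```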

Filling the rest of the table this way, we have:

		C	G	A	T	A	C	T
	0	0	0	0	0	0	0	0
G	0	0	1	1	1	1	1	1
A	0	0	1	2	2	2	2	2
T	0	0	1	2	3	3	3	3
T	0	0	1	2	3	3	3	4
C	0	1	1	2	3	3	4	4
G	0	1	2	2	3	3	4	4
T	0	1	2	2	3	3	4	5
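Filling every cell by hand is error-prone, so the table above can be checked with a short script. The following is a minimal sketch, assuming w = GATTCGT runs along the rows and v = CGATACT along the columns as in the tables; the function name `fill_matrix` is hypothetical.

```python
def fill_matrix(v, w, match, mismatch, d):
    """Fill the alignment matrix with w along the rows and v along the columns."""
    m = [[0] * (len(v) + 1) for _ in range(len(w) + 1)]
    # First row and first column: add d once per step away from (0,0).
    for j in range(1, len(v) + 1):
        m[0][j] = m[0][j - 1] + d
    for i in range(1, len(w) + 1):
        m[i][0] = m[i - 1][0] + d
    # Interior cells: maximum of left + d, up + d and diagonal + s(i,j).
    for i in range(1, len(w) + 1):
        for j in range(1, len(v) + 1):
            s = match if w[i - 1] == v[j - 1] else mismatch
            m[i][j] = max(m[i][j - 1] + d, m[i - 1][j] + d, m[i - 1][j - 1] + s)
    return m


table = fill_matrix("CGATACT", "GATTCGT", match=1, mismatch=0, d=0)
for label, row in zip(" GATTCGT", table):
    print(label, *row, sep="\t")
```

The printed rows should agree with the hand-filled table, including the value 5 in cell (7,7).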

The next step is to trace back the cells from which we obtained the highest alignment score i.e. 5 corresponding to the cell (7,7).

Let us look at the 3 cells from which m(7,7) could have arisen. Since T and T is a match, s(7,7) = 1.

m(i,j-1) + d = m(7,6) + 0 = 4 + 0 = 4

m(i-1,j) + d = m(6,7) + 0= 4 + 0 = 4

m(i-1,j-1) + s(i,j) = m(6,6) + 1 = 4 + 1 = 5

Therefore we obtained m(7,7) from m(6,6).

Next we trace back from cell (6,6), and so on, drawing an arrow from each cell to the cell its value was obtained from. The alignment is then read off from the arrows as follows:

An arrow going diagonally means no gap. An arrow going upwards means that a gap needs to be inserted in the sequence that is depicted on the X axis and an arrow going to the left means that a gap needs to be inserted in the sequence depicted on the Y axis.

We go backwards for the alignment. We start with cell (7,7) which has a diagonal arrow emerging from it. So, we align the nucleotides corresponding to (7,7) i.e. T and T in both sequences.

T

Next, trace-back gives us (6,6) which has an upwards arrow emerging from it. So, we insert a gap in the sequence on the X axis.

- T

G T

Next, we have (5,6) which has a diagonal arrow. Therefore we align the corresponding nucleotides.

C - T

C G T
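The trace-back steps above can be sketched in code. This is an illustration under stated assumptions, not the original author's procedure: the function names are hypothetical, and ties between arrows are broken in the arbitrary order diagonal, up, left, so only one optimal alignment is returned even when several exist.

```python
def fill_matrix(v, w, match, mismatch, d):
    """Fill the matrix with w on the rows (Y axis) and v on the columns (X axis)."""
    m = [[0] * (len(v) + 1) for _ in range(len(w) + 1)]
    for j in range(1, len(v) + 1):
        m[0][j] = m[0][j - 1] + d
    for i in range(1, len(w) + 1):
        m[i][0] = m[i - 1][0] + d
        for j in range(1, len(v) + 1):
            s = match if w[i - 1] == v[j - 1] else mismatch
            m[i][j] = max(m[i][j - 1] + d, m[i - 1][j] + d, m[i - 1][j - 1] + s)
    return m


def traceback(v, w, m, match, mismatch, d):
    """Walk back from the bottom-right cell, following one valid arrow per step."""
    i, j = len(w), len(v)
    top, bottom = [], []  # top = v (X axis), bottom = w (Y axis)
    while i > 0 or j > 0:
        s = None
        if i > 0 and j > 0:
            s = match if w[i - 1] == v[j - 1] else mismatch
        if s is not None and m[i][j] == m[i - 1][j - 1] + s:
            # Diagonal arrow: align the two nucleotides, no gap.
            top.append(v[j - 1])
            bottom.append(w[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and m[i][j] == m[i - 1][j] + d:
            # Upward arrow: gap in the sequence on the X axis.
            top.append("-")
            bottom.append(w[i - 1])
            i -= 1
        else:
            # Leftward arrow: gap in the sequence on the Y axis.
            top.append(v[j - 1])
            bottom.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


v, w = "CGATACT", "GATTCGT"
m = fill_matrix(v, w, match=1, mismatch=0, d=0)
top, bottom = traceback(v, w, m, match=1, mismatch=0, d=0)
print(" ".join(top))
print(" ".join(bottom))
```

With this particular tie-breaking order the walk takes the diagonal at cell (4,5), so it yields the alignment that contains the A/T mismatch; a different order would yield one of the gapped alignments instead.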

Applying this to the whole table, we get the following 3 possible alignments (with their respective scores, counting +1 for each match and 0 for each mismatch or gap):

C G A T A C - T

- G A T T C G T

0+1+1+1+0+1+0+1 = 5

OR

C G A - T A C - T

- G A T T - C G T

0+1+1+0+1+0+1+0+1 = 5

OR

C G A T A - C - T

- G A T - T C G T

0+1+1+1+0+0+1+0+1 = 5

Therefore the maximum alignment score is 5.
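The column-by-column sums above can be checked mechanically. The sketch below scores each column of a gapped alignment; the function name `column_scores` is hypothetical, and the alignments are written without spaces.

```python
def column_scores(top, bottom, match, mismatch, gap):
    """Score each column of a gapped alignment given as two equal-length strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    scores = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            scores.append(gap)
        elif a == b:
            scores.append(match)
        else:
            scores.append(mismatch)
    return scores


alignments = [
    ("CGATAC-T", "-GATTCGT"),
    ("CGA-TAC-T", "-GATT-CGT"),
    ("CGATA-C-T", "-GAT-TCGT"),
]
for top, bottom in alignments:
    cols = column_scores(top, bottom, match=1, mismatch=0, gap=0)
    print("+".join(str(c) for c in cols), "=", sum(cols))
```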

Notice that this is the same as the entry in the bottom-right cell (7,7) of our alignment matrix, which is also the largest entry in the table. So, if we are interested in only the maximum score and not the alignments themselves, the trace-back and further steps are not necessary: the maximum alignment score can be read directly from the alignment matrix.

Part 2:

Given
s(i,j) = 1.5 if v_i = w_j

s(i,j) = -1 if v_i != w_j

d = 0.25

This is similar to part 1 but with a different scoring scheme. The initialization step is slightly different, since insertions/deletions (gaps) now have a non-zero score: following the same recurrence as in part 1, d = 0.25 is added for every gap.

		C	G	A	T	A	C	T
	0	0.25	0.5	0.75	1	1.25	1.5	1.75
G	0.25
A	0.5
T	0.75
T	1
C	1.25
G	1.5
T	1.75

We see that in the first row and first column, the entry in a given cell = entry of the cell to the left of it (first row) or above it (first column) + 0.25 (the gap score d). Since d was 0 in part 1, all the cells in the first row and first column had a value of 0. Since d is non-zero here, the border values grow by 0.25 per cell.

Filling the rest of the alignment matrix is the same as part 1:

m(i,j) = max{ m(i,j-1) + d, m(i-1,j) + d, m(i-1,j-1) + s(i,j) }

where s(i,j) = 1.5 for a match and -1 for a mismatch, and d = 0.25.

		C	G	A	T	A	C	T
	0	0.25	0.5	0.75	1	1.25	1.5	1.75
G	0.25	0.5	1.75	2	2.25	2.5	2.75	3
A	0.5	0.75	2	3.25	3.5	3.75	4	4.25
T	0.75	1	2.25	3.5	4.75	5	5.25	5.5
T	1	1.25	2.5	3.75	5	5.25	5.5	6.75
C	1.25	2.5	2.75	4	5.25	5.5	6.75	7
G	1.5	2.75	4	4.25	5.5	5.75	7	7.25
T	1.75	3	4.25	4.5	5.75	6	7.25	8.5

The maximum alignment score is again read from the alignment matrix: the bottom-right cell (7,7) holds the largest entry. Therefore the maximum alignment score is 8.5.
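As a cross-check of the part 2 matrix, the same recurrence can be evaluated top-down with memoization instead of filling the table row by row. This is a different way of computing the same values, shown only as a sketch; the constant names and the function `m` are hypothetical, and d is added per gap exactly as in the table above.

```python
from functools import lru_cache

V = "CGATACT"  # columns (X axis)
W = "GATTCGT"  # rows (Y axis)
MATCH, MISMATCH, D = 1.5, -1.0, 0.25


@lru_cache(maxsize=None)
def m(i, j):
    """Return the matrix entry for row i, column j using the recurrence."""
    if i == 0 and j == 0:
        return 0.0
    candidates = []
    if j > 0:
        candidates.append(m(i, j - 1) + D)      # from the left (gap)
    if i > 0:
        candidates.append(m(i - 1, j) + D)      # from above (gap)
    if i > 0 and j > 0:
        s = MATCH if W[i - 1] == V[j - 1] else MISMATCH
        candidates.append(m(i - 1, j - 1) + s)  # diagonal (match/mismatch)
    return max(candidates)


print(m(len(W), len(V)))
```

All scores here are multiples of 0.25, so floating-point comparison against the hand-computed values is exact.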

Trace-back gives us the following alignments:

C G A T A - C - T

- G A T - T C G T

0.25+1.5+1.5+1.5+0.25+0.25+1.5+0.25+1.5 = 8.5

OR

C G A - T A C - T

- G A T T - C G T

0.25+1.5+1.5+0.25+1.5+0.25+1.5+0.25+1.5 = 8.5

We see that the following alignment that we obtained from part 1 is missing:

C G A T A C - T

- G A T T C G T

This is because in part 2 a mismatch scores -1 while a gap scores 0.25, whereas in part 1 both mismatches and gaps scored 0. Under the part 2 scheme the alignment above, which contains an A/T mismatch, scores only 0.25+1.5+1.5+1.5-1+1.5+0.25+1.5 = 7.0, less than the 8.5 obtained when the mismatch is replaced by two gaps. It is therefore not produced by the trace-back, which only follows paths with the maximum score.
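The effect of the two scoring schemes on these alignments can be compared directly. The sketch below totals the score of an alignment under each scheme; the function name `alignment_score` and the variable names are hypothetical.

```python
def alignment_score(top, bottom, match, mismatch, gap):
    """Total score of a gapped alignment given as two equal-length strings."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


with_mismatch = ("CGATAC-T", "-GATTCGT")  # contains the A/T mismatch
gaps_only = ("CGA-TAC-T", "-GATT-CGT")    # mismatch replaced by two gaps

for name, (top, bottom) in [("with mismatch", with_mismatch), ("gaps only", gaps_only)]:
    part1 = alignment_score(top, bottom, match=1, mismatch=0, gap=0)
    part2 = alignment_score(top, bottom, match=1.5, mismatch=-1, gap=0.25)
    print(name, part1, part2)
```

Both alignments tie under the part 1 scheme, while under the part 2 scheme the one containing the mismatch falls behind.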
