Question

We would like to align two DNA sequences, (v) CGATACT and (w) GATTCGT, based on the following scoring scheme as discussed i…

Align the same two sequences in part one with the new scoring scheme:

i) s(i, j) = 1.5 if vi = wj (matches); ii) s(i, j) = -1.0 if vi != wj (mismatches); iii) d = 0.25 (indels: insertions or deletions).

This question relates to Bioinformatics: Genome Sequence Analysis.

Below doesn't match the question above but should give you an idea what it should look like. Answer should be in this format:

[Image: example of a filled alignment table from a different problem; its contents did not survive transcription]

Answer #1

Part 1:

Given
s(i,j) = 1 if vi = wj

s(i,j) = 0 if vi != wj

d = 0

Let us start with an empty table, with v (CGATACT) along the columns and w (GATTCGT) along the rows. For easier understanding, the coordinates (i,j) of each cell have been marked here.

            C      G      A      T      A      C      T
   (0,0)  (0,1)  (0,2)  (0,3)  (0,4)  (0,5)  (0,6)  (0,7)
G  (1,0)  (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)  (1,7)
A  (2,0)  (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)  (2,7)
T  (3,0)  (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)  (3,7)
T  (4,0)  (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)  (4,7)
C  (5,0)  (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)  (5,7)
G  (6,0)  (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)  (6,7)
T  (7,0)  (7,1)  (7,2)  (7,3)  (7,4)  (7,5)  (7,6)  (7,7)

The cell (0,0) is initialized by filling it with 0.

      C  G  A  T  A  C  T
   0
G
A
T
T
C
G
T

The entries of each successive cell in the first row and first column are obtained by adding the gap penalty (which accounts for insertions and deletions, denoted by d here) to the cell before it. Since d = 0 here, all the entries of the first row and first column will be 0.

      C  G  A  T  A  C  T
   0  0  0  0  0  0  0  0
G  0
A  0
T  0
T  0
C  0
G  0
T  0

The entry in a given cell (i,j) is given by:

m(i,j) = max { m(i,j-1) + d, m(i-1,j) + d, m(i-1,j-1) + s(i,j) }

where m(i,j-1), m(i-1,j) and m(i-1,j-1) are the entries of the cells (i,j-1), (i-1,j) and (i-1,j-1) respectively.
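
The recurrence for a single cell can be sketched in Python as follows. This is only an illustrative sketch, not the answerer's own code: the function name `cell_value` and its parameter names are hypothetical, rows are assumed to index w and columns to index v as in the table above, and d is added for a gap, following this answer's convention.

```python
def cell_value(m, i, j, v, w, match, mismatch, d):
    """Value of cell (i, j) computed from its left, upper and diagonal neighbours."""
    # Row i corresponds to w[i-1]; column j corresponds to v[j-1].
    s = match if w[i - 1] == v[j - 1] else mismatch
    return max(m[i][j - 1] + d,       # coming from the left
               m[i - 1][j] + d,       # coming from above
               m[i - 1][j - 1] + s)   # coming from the diagonal


v, w = "CGATACT", "GATTCGT"
# Part 1 scheme: match = 1, mismatch = 0, d = 0, so the first row and column are all 0.
m = [[0] * (len(v) + 1) for _ in range(len(w) + 1)]
m[1][1] = cell_value(m, 1, 1, v, w, match=1, mismatch=0, d=0)
m[1][2] = cell_value(m, 1, 2, v, w, match=1, mismatch=0, d=0)
print(m[1][1], m[1][2])  # 0 1
```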

Let us consider cell (1,1) as our cell (i,j). We have G and C which is a mismatch since they are not identical. So, s(1,1) = 0.

Then m(i,j-1) +d = m(1,0) + 0 = 0

m(i-1,j) + d = m(0,1) + 0= 0

m(i-1,j-1) + s(i,j) = m(0,0) + 0 = 0

Therefore m(1,1) which is the maximum of the above 3 values (0, 0 and 0) is 0.

      C  G  A  T  A  C  T
   0  0  0  0  0  0  0  0
G  0  0
A  0
T  0
T  0
C  0
G  0
T  0

Let us now consider cell (1,2). We have G and G, which is a match. Therefore s(i,j) = 1.

We have:

m(i,j-1) +d = m(1,1) + 0 = 0

m(i-1,j) + d = m(0,2) + 0= 0

m(i-1,j-1) + s(i,j) = m(0,1) + 1 = 1

Therefore m(1,2), which is the maximum of the above 3 values (0, 0 and 1), is 1.

      C  G  A  T  A  C  T
   0  0  0  0  0  0  0  0
G  0  0  1
A  0
T  0
T  0
C  0
G  0
T  0

Filling the rest of the table this way, we have:

      C  G  A  T  A  C  T
   0  0  0  0  0  0  0  0
G  0  0  1  1  1  1  1  1
A  0  0  1  2  2  2  2  2
T  0  0  1  2  3  3  3  3
T  0  0  1  2  3  3  3  4
C  0  1  1  2  3  3  4  4
G  0  1  2  2  3  3  4  4
T  0  1  2  2  3  3  4  5
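
Filling the whole table by hand is error-prone, so it can be cross-checked with a short Python sketch. The function name `fill_matrix` is hypothetical, and the code assumes the convention used in this answer: rows follow w, columns follow v, and d is added for every gap.

```python
def fill_matrix(v, w, match, mismatch, d):
    """Fill the global alignment matrix row by row using the recurrence above."""
    rows, cols = len(w) + 1, len(v) + 1
    m = [[0] * cols for _ in range(rows)]
    # First row and first column: each cell is the previous one plus d.
    for j in range(1, cols):
        m[0][j] = m[0][j - 1] + d
    for i in range(1, rows):
        m[i][0] = m[i - 1][0] + d
    # Remaining cells: best of left, up and diagonal.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if w[i - 1] == v[j - 1] else mismatch
            m[i][j] = max(m[i][j - 1] + d, m[i - 1][j] + d, m[i - 1][j - 1] + s)
    return m


table = fill_matrix("CGATACT", "GATTCGT", match=1, mismatch=0, d=0)
for label, row in zip(" GATTCGT", table):
    print(label, *row)
```

With the part 1 scheme, the printed rows should agree with the table above, ending in the row `T 0 1 2 2 3 3 4 5`.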

The next step is to trace back the cells from which we obtained the highest alignment score i.e. 5 corresponding to the cell (7,7).

Let us look at the 3 cells from which m(7,7) could have arisen. Since T and T is a match, s(7,7) = 1.

m(i,j-1) + d = m(7,6) + 0 = 4 + 0 = 4

m(i-1,j) + d = m(6,7) + 0= 4 + 0 = 4

m(i-1,j-1) + s(i,j) = m(6,6) + 1 = 4 + 1 = 5

Therefore we obtained m(7,7) from m(6,6).

Next we trace back the cell (6,6) and so on and we obtain the following table:

[Image: the filled alignment matrix with trace-back arrows]

where the arrows show the trace-back of each cell. Therefore the alignment can be done as follows:

An arrow going diagonally means no gap. An arrow going upwards means that a gap needs to be inserted in the sequence that is depicted on the X axis and an arrow going to the left means that a gap needs to be inserted in the sequence depicted on the Y axis.

We go backwards for the alignment. We start with cell (7,7) which has a diagonal arrow emerging from it. So, we align the nucleotides corresponding to (7,7) i.e. T and T in both sequences.

T

T

Next, trace-back gives us (6,6) which has an upwards arrow emerging from it. So, we insert a gap in the sequence on the X axis.

- T

G T

Next, we have (5,6) which has a diagonal arrow. Therefore we align the corresponding nucleotides.

C - T

C G T
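
The trace-back steps described above can be sketched in Python. This is only a sketch under stated assumptions: the matrix `M` is copied from the filled part 1 table, the function name `traceback` is hypothetical, and ties are broken by preferring the diagonal, then the upward move, then the leftward move, so that exactly one of the equally good alignments is returned.

```python
V, W = "CGATACT", "GATTCGT"
# Part 1 matrix copied from the filled table (rows follow W, columns follow V).
M = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 2, 2, 2, 2, 2],
    [0, 0, 1, 2, 3, 3, 3, 3],
    [0, 0, 1, 2, 3, 3, 3, 4],
    [0, 1, 1, 2, 3, 3, 4, 4],
    [0, 1, 2, 2, 3, 3, 4, 4],
    [0, 1, 2, 2, 3, 3, 4, 5],
]


def traceback(m, v, w, match, mismatch, d):
    """Walk from the bottom-right cell to (0,0) and return one alignment."""
    i, j = len(w), len(v)
    top, bottom = [], []  # aligned v and aligned w, built back to front
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if w[i - 1] == v[j - 1] else mismatch
            if m[i][j] == m[i - 1][j - 1] + s:   # diagonal arrow: no gap
                top.append(v[j - 1])
                bottom.append(w[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and (j == 0 or m[i][j] == m[i - 1][j] + d):
            top.append("-")                      # upward arrow: gap in v
            bottom.append(w[i - 1])
            i -= 1
        else:
            top.append(v[j - 1])                 # leftward arrow: gap in w
            bottom.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


aligned_v, aligned_w = traceback(M, V, W, match=1, mismatch=0, d=0)
print(aligned_v)  # CGATAC-T
print(aligned_w)  # -GATTCGT
```

A different tie-breaking order would return a different alignment with the same score.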

Applying this to the whole table, we get the following three possible alignments (with their respective scores, counting +1 for each match and 0 for each mismatch or gap):

C G A T A C - T

- G A T T C G T

0+1+1+1+0+1+0+1 = 5

OR

C G A - T A C - T

- G A T T - C G T

0+1+1+0+1+0+1+0+1 = 5

OR

C G A T A - C - T

- G A T - T C G T

0+1+1+1+0+0+1+0+1 = 5

Therefore the maximum alignment score is 5.

Notice that this is the same as the bottom-right entry m(7,7) of the alignment matrix, which is also its largest entry. So, if we are interested only in the maximum score and not in the alignments themselves, the trace-back and the steps after it are not necessary: the maximum alignment score can be read directly from the matrix.
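
The column-by-column sums shown for the three alignments can be reproduced with a small Python sketch. The helper name `column_scores` is hypothetical; a column containing a gap is assumed to contribute d, as in this answer.

```python
def column_scores(top, bottom, match, mismatch, d):
    """Score every column of an alignment given as two equal-length strings."""
    scores = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            scores.append(d)         # gap (insertion or deletion)
        elif a == b:
            scores.append(match)     # identical nucleotides
        else:
            scores.append(mismatch)  # different nucleotides
    return scores


alignments = [
    ("CGATAC-T", "-GATTCGT"),
    ("CGA-TAC-T", "-GATT-CGT"),
    ("CGATA-C-T", "-GAT-TCGT"),
]
totals = []
for top, bottom in alignments:
    cols = column_scores(top, bottom, match=1, mismatch=0, d=0)
    totals.append(sum(cols))
    print("+".join(str(c) for c in cols), "=", sum(cols))
```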

Part 2:

Given
s(i,j) = 1.5 if vi = wj

s(i,j) = -1 if vi != wj

d = 0.25

This is similar to part 1 but it has a different scoring scheme. The initialization step is slightly different, since there is a non-zero score for insertion/deletions i.e. gaps.

         C     G     A     T     A     C     T
   0     0.25  0.5   0.75  1     1.25  1.5   1.75
G  0.25
A  0.5
T  0.75
T  1
C  1.25
G  1.5
T  1.75

We see that in the first row and first column, the entry in a given cell = the entry of the cell above/to the left of it + 0.25 (the gap score d). Since d was 0 in part 1, all the cells in the first row and first column had a value of 0. Since d is non-zero here, the entries increase by 0.25 from one cell to the next.
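
The initialization step can be sketched in Python as below. The function name `init_borders` is hypothetical, and the sketch follows this answer's convention of adding d for every gap; cells that are not yet filled are left as `None`.

```python
def init_borders(v, w, d):
    """Create the matrix and fill only its first row and first column."""
    rows, cols = len(w) + 1, len(v) + 1
    m = [[None] * cols for _ in range(rows)]
    m[0][0] = 0
    for j in range(1, cols):
        m[0][j] = m[0][j - 1] + d   # previous cell in the row plus d
    for i in range(1, rows):
        m[i][0] = m[i - 1][0] + d   # previous cell in the column plus d
    return m


m = init_borders("CGATACT", "GATTCGT", d=0.25)
first_row = m[0]
first_col = [row[0] for row in m]
print(first_row)
print(first_col)
```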

Filling the rest of the alignment matrix is the same as part 1:

m(i,j) = max { m(i,j-1) + d, m(i-1,j) + d, m(i-1,j-1) + s(i,j) }

where s(i,j) = 1.5 for a match and -1 for a mismatch, and d = 0.25.

         C     G     A     T     A     C     T
   0     0.25  0.5   0.75  1     1.25  1.5   1.75
G  0.25  0.5   1.75  2     2.25  2.5   2.75  3
A  0.5   0.75  2     3.25  3.5   3.75  4     4.25
T  0.75  1     2.25  3.5   4.75  5     5.25  5.5
T  1     1.25  2.5   3.75  5     5.25  5.5   6.75
C  1.25  2.5   2.75  4     5.25  5.5   6.75  7
G  1.5   2.75  4     4.25  5.5   5.75  7     7.25
T  1.75  3     4.25  4.5   5.75  6     7.25  8.5

As in part 1, the maximum alignment score is the entry in the bottom-right cell (7,7), which is also the largest entry in the matrix. Therefore the maximum alignment score is 8.5.
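
As an independent check on the score, the same recurrence can be evaluated top-down with memoization instead of filling a table. This is a different formulation from the one used in the answer, shown only as a sketch; the function name `global_score` is hypothetical, and d is again added per gap.

```python
from functools import lru_cache


def global_score(v, w, match, mismatch, d):
    """Bottom-right entry of the alignment matrix, computed recursively."""

    @lru_cache(maxsize=None)
    def m(i, j):
        if i == 0:
            return j * d   # first row: j gaps
        if j == 0:
            return i * d   # first column: i gaps
        s = match if w[i - 1] == v[j - 1] else mismatch
        return max(m(i, j - 1) + d, m(i - 1, j) + d, m(i - 1, j - 1) + s)

    return m(len(w), len(v))


part2 = global_score("CGATACT", "GATTCGT", match=1.5, mismatch=-1.0, d=0.25)
part1 = global_score("CGATACT", "GATTCGT", match=1, mismatch=0, d=0)
print(part2)  # 8.5
print(part1)  # 5
```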

Trace-back gives us:

[Image: the part 2 alignment matrix with trace-back arrows]

The following alignments are obtained:

C G A T A - C - T

- G A T - T C G T

0.25+1.5+1.5+1.5+0.25+0.25+1.5+0.25+1.5 = 8.5

OR

C G A - T A C - T

- G A T T - C G T

0.25+1.5+1.5+0.25+1.5+0.25+1.5+0.25+1.5 = 8.5

We see that the following alignment that we obtained from part 1 is missing:

C G A T A C - T

- G A T T C G T

This is because in part 2 a mismatch scores -1 whereas a gap scores +0.25, while in part 1 both scored 0. Under the part 2 scheme, the alignment above, which contains an A/T mismatch, scores only 7.0 (five matches, two gaps and one mismatch: 7.5 + 0.5 - 1). It is therefore not obtained from the trace-back, which only recovers alignments with the maximum score.
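
This can be checked by following every tied arrow during trace-back, which enumerates all maximum-scoring alignments. The sketch below is illustrative only: the function name `optimal_alignments` is hypothetical, d is added per gap as in this answer, and the recursion is only practical for short sequences such as these.

```python
def optimal_alignments(v, w, match, mismatch, d):
    """Return the best score and every alignment that reaches it."""
    rows, cols = len(w) + 1, len(v) + 1
    m = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        m[0][j] = m[0][j - 1] + d
    for i in range(1, rows):
        m[i][0] = m[i - 1][0] + d
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if w[i - 1] == v[j - 1] else mismatch
            m[i][j] = max(m[i][j - 1] + d, m[i - 1][j] + d, m[i - 1][j - 1] + s)

    found = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            found.append((top, bottom))
            return
        if i > 0 and j > 0:
            s = match if w[i - 1] == v[j - 1] else mismatch
            if m[i][j] == m[i - 1][j - 1] + s:       # diagonal arrow
                walk(i - 1, j - 1, v[j - 1] + top, w[i - 1] + bottom)
        if i > 0 and m[i][j] == m[i - 1][j] + d:     # upward arrow: gap in v
            walk(i - 1, j, "-" + top, w[i - 1] + bottom)
        if j > 0 and m[i][j] == m[i][j - 1] + d:     # leftward arrow: gap in w
            walk(i, j - 1, v[j - 1] + top, "-" + bottom)

    walk(len(w), len(v), "", "")
    return m[-1][-1], found


score, found = optimal_alignments("CGATACT", "GATTCGT", 1.5, -1.0, 0.25)
print(score)                                  # 8.5
print(("CGATAC-T", "-GATTCGT") in found)      # False: contains the A/T mismatch
print(("CGA-TAC-T", "-GATT-CGT") in found)    # True
```

The enumeration also returns other alignments that tie at 8.5, for example ones that differ only in the order of two adjacent gaps.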
