The solution is done with python code and using score_matrix for dynamic programming.
1. recurrence relation is :
score_matrix[i,j] = max
(score_matrix[i – 1, j – 1] + GET_SCORE(X[i – 1], Y[j – 1]),
score_matrix[i, j – 1] -0.2
score_matrix[i – 1, j] -0.2);
here score_matrix[i,j] is the ith rown and jth column of the matrix, the first parameter of max calculate the score in case of match or mismatch of elements, the second parameter calculate the score in case element in 1st string is replaced by _ , and the last parameter calculate the score in case element in 2nd string is replaced by _ .
2. first three rows of the table are:
[[ 0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0,
-7.0, -8.0, -9.00, -10.0, -11.0,-12.0, -13.0,
-14.0,-15.0, -16.0,-17.0,-1.8.0 ,-19.0, -20.0]
[-1.0, -10.1, 0.0, -0.2, -0.4, -0.6, -0.8, -1.0, -1.2, -1.4,-1.6,
-1.8, -2.0, -2.2, -2.4, -2.6, -2.8, -3.0,
-3.2, -3.4, -3.6]
[-2.0, -0.3, -0.2, -0.1, 0.8, 0.6, 0.4, 0.2, 0, -0.2,
-0.4, -0.6, -0.8, -1.0, -1.2, -1.4, -1.6,
-1.8, -2.0, -2.2, -2.4]]
overall similarity code: 8.3
3. alignments:
-
C
T-
TC
-T-
TTC
--T-
GTTC
C--T-
CGTTC
-C--T-
ACGTTC
--C--T-
TACGTTC
---C--T-
ATACGTTC
----C--T-
CATACGTTC
-----C--T-
CCATACGTTC
------C--T-
CCCATACGTTC
-------C--T-
CCCCATACGTTC
--------C--T-
ACCCCATACGTTC
---------C--T-
TACCCCATACGTTC
----------C--T-
ATACCCCATACGTTC
-----------C--T-
AATACCCCATACGTTC
------------C--T-
TAATACCCCATACGTTC
-------------C--T-
CTAATACCCCATACGTTC
G-------------C--T-
GCTAATACCCCATACGTTC
-G-------------C--T-
AGCTAATACCCCATACGTTC
T-G-------------C--T-
-AGCTAATACCCCATACGTTC
GT-G-------------C--T-
--AGCTAATACCCCATACGTTC
CGT-G-------------C--T-
---AGCTAATACCCCATACGTTC
GCGT-G-------------C--T-
----AGCTAATACCCCATACGTTC
GGCGT-G-------------C--T-
-----AGCTAATACCCCATACGTTC
CGGCGT-G-------------C--T-
------AGCTAATACCCCATACGTTC
GCGGCGT-G-------------C--T-
-------AGCTAATACCCCATACGTTC
GGCGGCGT-G-------------C--T-
--------AGCTAATACCCCATACGTTC
TGGCGGCGT-G-------------C--T-
---------AGCTAATACCCCATACGTTC
CTGGCGGCGT-G-------------C--T-
----------AGCTAATACCCCATACGTTC
GCTGGCGGCGT-G-------------C--T-
-----------AGCTAATACCCCATACGTTC
CGCTGGCGGCGT-G-------------C--T-
------------AGCTAATACCCCATACGTTC
ACGCTGGCGGCGT-G-------------C--T-
-------------AGCTAATACCCCATACGTTC
AACGCTGGCGGCGT-G-------------C--T-
--------------AGCTAATACCCCATACGTTC
GAACGCTGGCGGCGT-G-------------C--T-
---------------AGCTAATACCCCATACGTTC
TGAACGCTGGCGGCGT-G-------------C--T-
----------------AGCTAATACCCCATACGTTC
GTGAACGCTGGCGGCGT-G-------------C--T-
-----------------AGCTAATACCCCATACGTTC
best alignment: GTGAACGCTGGCGGCGT-G-------------C--T-
-----------------AGCTAATACCCCATACGTTC
4. code for this:
import numpy as np d={'A':{'A':1,'G':-0.1,'C':-0.1,'T':-0.15},'G':{'A':-0.1,'G':1,'C':-0.15,'T':-0.1},'C':{'A':-0.1,'G':-0.15,'C':1,'T':-0.1},'T':{'A':-0.15,'G':-0.1,'C':-0.1,'T':1},} def GET_SCORE(n1, n2, penalty=-1, reward=1): return d[n1][n2] def global_alignment(X, Y, penalty=-1, reward=1): # initialize score matrix score_matrix = np.ndarray((len(X) + 1, len(Y) + 1)) for i in range(len(X) + 1): score_matrix[i, 0] = penalty * i for j in range(len(Y) + 1): score_matrix[0, j] =penalty * j # define each cell in the matrix by as the max score possible in that stage for i in range(1, len(X) + 1): for j in range(1, len(Y) + 1): match = score_matrix[i - 1, j - 1] + GET_SCORE(X[i - 1], Y[j - 1], penalty, reward) delete = score_matrix[i - 1, j] - 0.2 insert = score_matrix[i, j - 1] - 0.2 score_matrix[i, j] = max([match, delete, insert]) print(score_matrix) print() print(np.amax(score_matrix)) i = len(X) j = len(Y) align_X = "" align_Y = "" while i > 0 or j > 0: current_score = score_matrix[i, j] left_score = score_matrix[i - 1, j] if i > 0 and j > 0 and X[i - 1] == Y[j - 1]: align_X = X[i - 1] + align_X align_Y = Y[j - 1] + align_Y i = i - 1 j = j - 1 elif i > 0 and current_score == left_score + penalty: align_X = X[i - 1] + align_X align_Y = "-" + align_Y i = i - 1 else: align_X = "-" + align_X align_Y = Y[j - 1] + align_Y j = j - 1 print(align_X) print(align_Y) print() return align_X, align_Y X='GTGAACGCTGGCGGCGTGCT' y='AGCTAATACCCCATACGTTC' print(global_alignment(X,y))
Problem 2: Sequence similarity measure. Let 3 and y be two given DNA sequences, represented...