Problem

The mapping of genomes involves a large array of difficult computational problems. At the...

The mapping of genomes involves a large array of difficult computational problems. At the most basic level, each of an organism s chromosomes can be viewed as an extremely long string (generally containing millions of symbols) over the four-letter alphabet {A, C, G, T}. One family of approaches to genome mapping is to generate a large number of short, overlapping snippets from a chromosome, and then to infer the full long string representing the chromosome from this set of overlapping substrings.

While we won't go into these string assembly problems in full detail, here's a simplified problem that suggests some of the computational difficulty one encounters in this area. Suppose we have a set S = {s1, s2,…,sn} of short DNA strings over a q-letter alphabet; and each string st has length 21, for some number I > 1. We also have a library of additional strings T = {t1, t2,..., tm} over the same alphabet; each of these also has length 21. In trying to assess whether the string sb might come directly after the string sa in the chromosome, we will look to see whether the library T contains a string tk so that the first I symbols in tk are equal to the last I symbols in sa, and the last I symbols in tk are equal to the first I symbols in sb. If this is possible, we will say that tk corroborates the pair (sa, sb). (In other words, tk could be a snippet of DNA that straddled the region in which sb directly followed sa.)

Now we d like to concatenate all the strings in S in some order, one after the other with no overlaps, so that each consecutive pair is corroborated by some string in the library T. That is, we d like to order the strings in S as sq, si2,…, sin, where i1, i2,..., in is a permutation of {1,2,..., n}, so that for each j = 1,2,..., n - 1, there is a string tk that corroborates the pair (sij, sij+1). (The same string tk can be used for more than one consecutive pair in the concatenation.) If this is possible, we will say that the set S has a perfect assembly.

Given sets S and T, the Perfect Assembly Problem asks: Does S have a perfect assembly with respect to T? Prove that Perfect Assembly is NP-complete.

Example. Suppose the alphabet is {A, C, G, T}, the set S = {AG, TC, TA}, and the set T = {AC, CA, GC, GT} (so each string has length 21 = 2). Then the answer to this instance of Perfect Assembly is yes: We can concatenate the three strings in S in the order TCAGTA (so si1 = s2, si2 = s1, and si3 = s3). In this order, the pair (si1, si2) is corroborated by the string CA in the library T, and the pair (si2, si3) is corroborated by the string GT in the library T.

Step-by-Step Solution

Request Professional Solution

Request Solution!

We need at least 10 more requests to produce the solution.

0 / 10 have requested this problem solution

The more requests, the faster the answer.

Request! (Login Required)


All students who have requested the solution will be notified once they are available.
Add your Solution
Textbook Solutions and Answers Search
Solutions For Problems in Chapter 8