Question


The deoxyribonucleic acid (DNA) is a molecule that contains the genetic instructions required for the development and functioning of all known living organisms. The basic double-helix structure of DNA was co-discovered by Prof. Francis Crick, a long-time faculty member at UCSD. The DNA molecule consists of a long sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T). Since this molecule contains all the genetic information of a living organism, geneticists are interested in understanding the roles of the various DNA sequence patterns that are continuously being discovered worldwide. One of the most common methods to identify the role of a DNA sequence is to compare it with other DNA sequences whose functionality is already known. The more similar such DNA sequences are, the more likely it is that they will function similarly.

Your task is to write a C program, called dna.c, that reads three DNA sequences from a file called dna_input.dat and prints the results of a comparison between each pair of sequences to the file dna_output.dat. The input file dna_input.dat consists of three lines. Each line is a single sequence of characters from the set {A, C, G, T}, which appear without spaces in some order and are terminated by the end-of-line character \n. You can assume that the three lines contain the same number of characters, and that this number is at most 241 (including the character \n). Here is a sample input file:

ACGTTTTAAGGGCTGAGCTAGTCAGTTCATCGCGCGCGTATATCCTCGATCGATCATTCT CTCTAGACGTTTT AGTCAGTTC ACGTTTTAAGGGCTTAGAGCTTATGCTAATCGCGCGCGTATATCCTCGATCGATCATTCT AGTTAGTTAGTTCATCGGCGGCGTATATCCTCGATCGATCATTCT CTCTAGACGTTTTAAGGGCTGAGCCGGTCAGTTA

Each of the three lines (shown with wrap-around above) consists of 95 characters: the 94 letters from {A, C, G, T} and the character \n (not shown). The output file dna_output.dat must be structured as follows.
For each pair of sequences #i and #j, with i, j ∈ {1, 2, 3} and i > j, you should print:

• A single line, saying "Comparison between sequence #i and sequence #j".
• The entire sequence #i in the first row, and the entire sequence #j in the third row.
• The comparison between the two sequences in the second (middle) row. This should be printed as follows: for each position, if the two bases are the same in both sequences, then the corresponding base letter (one of A, C, G, T) should be printed; otherwise a blank should be printed.
• A single line, saying "The overlap percentage is x", where x is a floating-point number which measures the percentage of letters that match in the two sequences. This number should be printed with a single digit of precision after the decimal point.

Each line in the output file dna_output.dat should contain at most 61 characters, including the end-of-line character \n. If the DNA sequences are longer than that, then each of the three rows mentioned
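The input and output rules above can be sketched in C as a few helper functions. This is a minimal sketch under stated assumptions, not a reference solution: the names read_sequences, build_match_row, overlap_percentage and print_comparison, and the constants NUM_SEQS, MAX_LINE and ROW_WIDTH, are hypothetical. The overlap percentage is computed as 100 × (matching positions) / (sequence length) and printed with "%.1f". Because the statement's rule for wrapping long rows is cut off, splitting each row into chunks of at most 60 letters, printed as groups of three lines, is an assumption.

```c
/* Sketch only: all function and constant names here are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_SEQS 3
#define MAX_LINE 242  /* at most 241 characters including '\n', plus '\0' */
#define ROW_WIDTH 60  /* at most 61 characters per output line, including '\n' */

/* Reads NUM_SEQS lines from `path` into `seqs`, stripping the line ending.
   Returns 0 on success, -1 if the file cannot be opened or is too short. */
int read_sequences(const char *path, char seqs[NUM_SEQS][MAX_LINE])
{
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    for (int i = 0; i < NUM_SEQS; i++) {
        if (fgets(seqs[i], MAX_LINE, in) == NULL) {
            fclose(in);
            return -1;
        }
        /* Drop the trailing '\n' (and a '\r' from Windows-style files). */
        seqs[i][strcspn(seqs[i], "\r\n")] = '\0';
    }
    fclose(in);
    return 0;
}

/* Middle row: the base letter where `a` and `b` agree, a blank where they
   differ. `row` must have room for strlen(a) + 1 bytes. */
void build_match_row(const char *a, const char *b, char *row)
{
    size_t n = strlen(a);
    assert(strlen(b) == n);  /* the assignment guarantees equal lengths */
    for (size_t k = 0; k < n; k++)
        row[k] = (a[k] == b[k]) ? a[k] : ' ';
    row[n] = '\0';
}

/* Percentage of positions at which `a` and `b` hold the same base. */
double overlap_percentage(const char *a, const char *b)
{
    size_t n = strlen(a), matches = 0;
    if (n == 0)
        return 0.0;
    for (size_t k = 0; k < n; k++)
        if (a[k] == b[k])
            matches++;
    return 100.0 * (double)matches / (double)n;
}

/* Writes one pairwise comparison to `out`.
   ASSUMPTION: the exact wrapping layout is a guess; long rows are split into
   chunks of ROW_WIDTH letters, each chunk printed as three lines
   (sequence #i, match row, sequence #j). */
void print_comparison(FILE *out, int i, int j, const char *a, const char *b)
{
    char row[MAX_LINE];
    size_t n = strlen(a);
    assert(n < MAX_LINE);
    build_match_row(a, b, row);

    fprintf(out, "Comparison between sequence #%d and sequence #%d\n", i, j);
    for (size_t start = 0; start < n; start += ROW_WIDTH) {
        int len = (int)(n - start < ROW_WIDTH ? n - start : ROW_WIDTH);
        fprintf(out, "%.*s\n", len, a + start);    /* sequence #i */
        fprintf(out, "%.*s\n", len, row + start);  /* matching bases */
        fprintf(out, "%.*s\n", len, b + start);    /* sequence #j */
    }
    fprintf(out, "The overlap percentage is %.1f\n", overlap_percentage(a, b));
}
```

A main function could call read_sequences on the input file, open the output file with fopen(..., "w"), and call print_comparison once per pair of sequences, in the pair order the assignment specifies. Passing a FILE * to print_comparison keeps the formatting logic independent of where the output goes, which also makes it easy to check against a temporary file.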


Similar Homework Help Questions
  • Question #2 A DNA molecule can be specified using a string of the characters 'c', 'g'...

    Question #2 A DNA molecule can be specified using a string of the characters 'c', 'g', 'a', 't'. Each of these characters represents one of the four nucleobases Cytosine, Guanine, Adenine, and Thymine. Consult the Wikipedia page on DNA for more details. A codon consists of a sequence of three DNA nucleobases and can be represented by a string of length 3 consisting of characters from the set {'c', 'g', 'a', 't'}. So "cga" and "ttg" are examples of...

  • ***** PSEUDOCODE PLEASE****** ***** PSEUDOCODE PLEASE****** Program 0 (Warm-up): Deoxyribonucleic acid, or DNA, is comprised of...

    ***** PSEUDOCODE PLEASE****** ***** PSEUDOCODE PLEASE****** Program 0 (Warm-up): Deoxyribonucleic acid, or DNA, is comprised of four bases: (G)uanine, (C)ytosine, (A)denine and (T)hymine. Ribonucleic acid, or RNA, is different from DNA in that it contains no Thymine; thymine is replaced with something called (U)racil. For this assignment, you will create an array of 255 characters. You must start by filling the array with random characters of G, C, A and T. You must then print out the array. Next, replace all the instances of Thymine...

  • Requirements: For this exercise you are to implement a small bioinformatics library for operations with DNA...

    Requirements: For this exercise you are to implement a small bioinformatics library for operations with DNA sequences, which, in this exercise, are represented as strings of only the characters A, C, G, and T. Three methods are required for this exercise and they are each detailed below. Kmers The term k-mer refers to all the possible substrings of length k that are contained in a DNA sequence. For example, given the DNA sequence AGATCGAGTG the 3-mers are: AGA GAT ATC...

  • C Program In this assignment you'll write a program that encrypts the alphabetic letters in a...

    C Program In this assignment you'll write a program that encrypts the alphabetic letters in a file using the Vigenère cipher. Your program will take two command line parameters containing the names of the file storing the encryption key and the file to be encrypted. The program must generate output to the console (terminal) screen as specified below. Command Line Parameters Your program must compile and run from the command line. The program executable must be named “vigenere” (all lower...

  • Genes are unique segments of A, T, C, and G sequences in our DNA. Each piece...

    Genes are unique segments of A, T, C, and G sequences in our DNA. Each piece of our DNA contains many genes. The 46 human chromosomes contain around 20,000 genes. Linus Pauling's research was centered on the function of one particular gene, the HBB gene. His data indicated that the cell uses the information in the HBB gene as a "recipe" to construct hemoglobin. The hemoglobin recipe in normal individuals was accurate so their cells...

  • What’s the C++ code to this? So that my output is: CCTAGAATG | | X |...

    What’s the C++ code for this? So that my output is: CCTAGAATG | | X | | X | | GGACCTAAC Validity: 77.7778% Stability: 57.1429% Part #02 The goal is to write a complete C++ program that inputs 2 strings from the keyboard, where each string denotes a DNA strand such as CCTAGAATG. Assume the 2 strings are the same length. The program will then line up the two strands to see...

  • ASSIGNMENT For the DNA sequence given below, write the complementary DNA sequence that would complete the...

    ASSIGNMENT For the DNA sequence given below, write the complementary DNA sequence that would complete the double-strand.   DNA    3’—A   T   T  G   C   T   T   A  C   T   T  G   C   A   T -- 5’ DNA    5’-- Does it matter which strand is the ‘code strand’? The following two sequences look identical, except one runs 3’-5’ and the other 5’-3’. For each DNA sequence given below, write the mRNA sequence that would be coded from it. Make sure you indicate the direction of each mRNA strand (i.e. 3’ and 5’ ends).  Use the Universal triplet code to...

  • 1. (1 points) A deletion mutation in the leader sequence of the trp operon removes the...

    1. (1 points) A deletion mutation in the leader sequence of the trp operon removes the two tryptophan codons that are involved in attenuation. Predict the effect of this mutation on the expression of the trp structural genes in E. coli cells grown in media that lacks tryptophan. 2. (2 points) What protein family members are the main protein components of the RISC complex? How does the RISC complex target specific mRNAs for silencing? 3. (3 points) In bacteria, the...

  • Array, functions and loop for a C++ program

    DNA analysis is an important part of medical exploration and discovery. In this assignment*, we examine both hemoglobin genes for each of four people, with each gene consisting of 444 DNA bases (A, C, T, G). These are the four bases (letters) that make up DNA. Sickle-cell anemia is a disorder that is caused by a single mutation in the hemoglobin gene. A sickle hemoglobin gene has a T in the 20th position. A person is anemic if he/she has two sickle hemoglobin genes. A person is a...

  • 1 Overview and Background Many of the assignments in this course will introduce you to topics in ...

    1 Overview and Background Many of the assignments in this course will introduce you to topics in computational biology. You do not need to know anything about biology to do these assignments other than what is contained in the description itself. The objective of each assignment is for you to acquire certain particular skills or knowledge, and the choice of topic is independent of that objective. Sometimes the topics will be related to computational problems in biology, chemistry, or physics,...
