The file genomic_dna.txt contains a section of genomic DNA, and
the file exons.txt contains a list of
start/stop positions of exons. Each exon is on a separate line and
the start and stop positions are
separated by a comma. The start and stop positions follow Python
conventions; they start from zero
and are inclusive at the start and exclusive at the end. Write a
program that will extract the exon
segments, concatenate them, and write them to a new file.
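To make the coordinate convention concrete, here is a small sketch on a made-up ten-base string. The string `toy` and the coordinates are illustrative assumptions, not data from the exercise files.

```python
# Toy sequence, made up for illustration only.
toy = "AAACCCGGGT"

# A half-open range [3, 6) covers indices 3, 4 and 5:
# the start is included, the stop is not.
start, stop = 3, 6
segment = toy[start:stop]

print(segment)       # CCC
print(len(segment))  # 3, which is stop - start
```

Because the stop is exclusive, the length of a segment is simply stop minus start.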
This is a tricky exercise with several parts. Before starting,
think about how to break the problem down into
simpler tasks.
Your program will have to:
read the exon file line by line
split each exon line into two numbers
turn those numbers into integers
extract the matching part of the genomic DNA sequence
concatenate all the exon sequences together
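The steps listed above can be sketched as two small helpers that work on in-memory strings, leaving file handling aside. The names `parse_exon` and `join_exons`, and the toy data, are hypothetical and chosen only for this illustration.

```python
def parse_exon(line):
    """Turn a string such as '5,58' into a (start, stop) tuple of ints."""
    start_text, stop_text = line.strip().split(",")
    return int(start_text), int(stop_text)


def join_exons(dna, exon_lines):
    """Concatenate dna[start:stop] for every non-empty exon line."""
    pieces = []
    for line in exon_lines:
        if not line.strip():
            continue  # skip blank lines, e.g. from a trailing newline
        start, stop = parse_exon(line)
        pieces.append(dna[start:stop])
    return "".join(pieces)


# Made-up data, not the exercise files.
toy_dna = "AAACCCGGGTTT"
toy_exons = ["0,3", "6,9"]
print(join_exons(toy_dna, toy_exons))  # AAAGGG
```

Collecting the pieces in a list and joining once at the end is a common alternative to repeated string concatenation; either approach gives the same result here.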
FILES:
exons.txt
5,58
72,133
190,276
340,398
genomic_dna.txt
TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGACTGATCGATCGATCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGATCGATCATATGTCAGTCGATGCATCGTAGCATCGTATAGTAGCTACGTAGCTACGATCGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTAGCTAGTACGATCGCGTAGCTAGCATGCTACGTAGATCGATCGATGCATGCTAGCTAGCTAGCTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCGATGCTACGTAGATCGATCGCTAGTAGATCGATCGCTAGCTAGCTGACTAGTACGCTGCTAGTAGTCAGCTAGATCGATGCTAGTCA
# dna.py
# Open genomic_dna.txt and store the file object in fin
fin = open("genomic_dna.txt", "r")
# Read the contents of the file into dna_sequence,
# dropping any trailing newline
dna_sequence = fin.read().strip()
# Close the file object
fin.close()
# Open exons.txt and store the file object in fin
fin = open("exons.txt", "r")
# Read the contents of the file and split it on whitespace into a list
# called exons (one exon per line; blank lines are ignored)
exons = fin.read().split()
# Close the file object
fin.close()
# String to store the final sequence
final_sequence = ""
# Iterate through each range in the exons list
# (every value is in the form 'number,number')
for exon in exons:
    # Split it at the comma and store the two values in
    # starting_position and ending_position
    starting_position, ending_position = exon.split(',')
    # At this point starting_position and ending_position are
    # strings that represent numbers; convert them with int()
    starting_position = int(starting_position)
    ending_position = int(ending_position)
    # Extract the matching part of dna_sequence and append it
    final_sequence = final_sequence + dna_sequence[starting_position:ending_position]
# Open a file object for multiple.txt in write mode
fout = open("multiple.txt", "w")
# Write the final_sequence
fout.write(final_sequence)
# Close the file object
fout.close()
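As a variant, the same logic can be wrapped in a function that uses `with` blocks, so each file is closed automatically even if an error occurs. This is a sketch under stated assumptions: the function name `extract_exons`, the toy file names and the toy data are hypothetical, and the demonstration writes to a temporary directory rather than touching the exercise files.

```python
import os
import tempfile


def extract_exons(dna_path, exons_path, out_path):
    """Write the concatenated exon segments to out_path and return them."""
    with open(dna_path) as dna_file:
        dna = dna_file.read().strip()  # drop a trailing newline, if any
    with open(exons_path) as exons_file:
        # split() with no argument splits on any whitespace, so one exon
        # per line and space-separated exons are both accepted
        ranges = exons_file.read().split()
    pieces = []
    for item in ranges:
        start, stop = item.split(",")
        pieces.append(dna[int(start):int(stop)])
    result = "".join(pieces)
    with open(out_path, "w") as out_file:
        out_file.write(result)
    return result


# Demonstration on made-up data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dna_path = os.path.join(tmp, "toy_dna.txt")
    exons_path = os.path.join(tmp, "toy_exons.txt")
    out_path = os.path.join(tmp, "toy_out.txt")
    with open(dna_path, "w") as f:
        f.write("AACCGGTTAA\n")
    with open(exons_path, "w") as f:
        f.write("2,4\n6,8\n")
    demo_result = extract_exons(dna_path, exons_path, out_path)
    with open(out_path) as f:
        written = f.read()

print(demo_result)  # CCTT
```

For the exercise itself, the call would presumably be `extract_exons("genomic_dna.txt", "exons.txt", "multiple.txt")`.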
Sample Input/Output
Suppose our genomic_dna.txt is as follows:
TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGACTGATCGATCGATCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGATCGATCATATGTCAGTCGATGCATCGTAGCATCGTATAGTAGCTACGTAGCTACGATCGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTAGCTAGTACGATCGCGTAGCTAGCATGCTACGTAGATCGATCGATGCATGCTAGCTAGCTAGCTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCGATGCTACGTAGATCGATCGCTAGTAGATCGATCGCTAGCTAGCTGACTAGTACGCTGCTAGTAGTCAGCTAGATCGATGCTAGTCA
And exons.txt as follows:
5,58
72,133
190,276
340,398
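Then running the Python program produces a new file, multiple.txt, containing the four exon segments joined end to end. For the files above, the result begins with CGTACCGTCG and is 258 bases long (53 + 61 + 86 + 58).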
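One quick sanity check on the output is its length. Because the stop positions are exclusive, each exon contributes stop minus start bases. The sketch below hard-codes the four ranges from exons.txt and assumes the genomic sequence is at least 398 bases long, so that no slice is cut short.

```python
# Exon ranges copied from exons.txt.
exon_ranges = [(5, 58), (72, 133), (190, 276), (340, 398)]

# Each half-open range [start, stop) contributes stop - start bases.
lengths = [stop - start for start, stop in exon_ranges]
expected_length = sum(lengths)

print(lengths)          # [53, 61, 86, 58]
print(expected_length)  # 258
```

If the content of multiple.txt has a different length, the coordinates or the input file are probably not being read as intended.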