Question

Roadmap: To start, use the provided template file (on Blackboard): project_01_template.py. Replace the pass statements with yo…

4. The next step is to write a function named forward_frames that takes one argument, seq. This function will identify all the…

Identify ORFs in the provided genomic segment:
1. Run gene_finder on the human_chr9_segment.fasta file with these arguments: m…

Template excerpt (partial):

def read_one_seq_fasta(fasta_file):
    '''Read a FASTA file that contains one sequence.'''
    seq = …
    with open(fasta_file, 'r') as …

# Tests for one_frame function. Should print True in all cases.
print('\none_frame Tests')
print(one_frame('ATGTGAA…
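
The forward_frames step refers to the three forward reading frames: the same strand read in codons starting at offset 0, 1 or 2. As a small illustration of what each frame looks like, here is a sketch using a hypothetical helper named frame_codons (it is not part of the template):

```python
def frame_codons(seq, frame):
    '''Split seq into complete codons, starting at offset frame (0, 1 or 2).

    A trailing partial codon (1 or 2 leftover bases) is dropped.
    '''
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


seq = 'ATGATGAGATGA'
for frame in range(3):
    print(frame, frame_codons(seq, frame))
# 0 ['ATG', 'ATG', 'AGA', 'TGA']
# 1 ['TGA', 'TGA', 'GAT']
# 2 ['GAT', 'GAG', 'ATG']
```

Bases left over at the end of a frame (fewer than three) are not codons, so they are dropped.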

Answer #1

Code implemented in Python:

Note: The code is commented, and a few minimal tests at the end check that it works.

Code:

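def read_one_seq_fasta(fasta_file):
    '''Read a FASTA file that contains one sequence and return it as a single string.'''
    parts = []
    with open(fasta_file, 'r') as f:
        f.readline()  # skip the '>' header line
        for line in f:
            # strip() removes the newline, and also works when the last line has none
            parts.append(line.strip())
    return ''.join(parts)
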
def get_orf(seq):
    '''Return the ORF at the start of seq (seq is assumed to begin with ATG).

    The ORF runs up to, but does not include, the first in-frame stop codon.
    If there is no in-frame stop codon, the whole sequence is returned.
    '''
    cod = -3  # start at -3 because the counter is incremented before it is used
    while cod < len(seq):
        cod += 3
        codon = seq[cod:cod+3]
        if codon in ['TGA', 'TAG', 'TAA']:
            return seq[:cod]  # cut off the stop codon
    return seq

def one_frame(seq):
    '''Return a list of the ORFs in the reading frame that starts at index 0 of seq.'''
    nuc = -3  # start at -3 because the counter is incremented before it is used
    orf_list = []
    while nuc < len(seq):
        nuc += 3
        if seq[nuc:nuc+3] == 'ATG':
            orf = get_orf(seq[nuc:])  # start codon found: read the ORF from here
            orf_list.append(orf)
            nuc = nuc + len(orf)  # jump to the end of the ORF; the next += 3 also skips its stop codon
    return orf_list


def forward_frames(seq):
    '''Return the ORFs from all three forward reading frames in a single list.'''
    total_list = []
    for frame in range(3):
        total_list.extend(one_frame(seq[frame:]))  # extend keeps one flat list of ORFs
    return total_list

# Copied from Lab 5.
def gc_content(seq):
    '''Return the fraction of G and C bases in a DNA sequence (0.0 for an empty one).'''
    if not seq:
        return 0.0  # avoid dividing by zero
    num_g = seq.count('G')
    num_c = seq.count('C')
    return (num_g + num_c) / len(seq)

def gene_finder(file_name, min_len, minGC):
    '''Return [sequence, length, GC fraction] for every ORF in the FASTA file
    that is at least min_len long and has a GC fraction of at least minGC.'''
    final_list = []
    # read_one_seq_fasta skips the header line and removes newlines
    contents = read_one_seq_fasta(file_name)
    # Searches the three forward frames only; reverse-strand ORFs are not included.
    orfs = forward_frames(contents)
    for seq in orfs:
        gc = gc_content(seq)
        if len(seq) >= min_len and gc >= minGC:  # length and GC requirements
            final_list.append([seq, len(seq), gc])
    return final_list
if __name__ == '__main__':
    # Minimal tests
    print(gc_content('ATGTGAA'))
    print(get_orf('ATGTGAA'))
    print(forward_frames('ATGATGAGATGAACCATGGGGTAA'))
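
The original gene_finder calls a function named find_all_orfs, but the answer never defines it. If the template intends it to cover both strands (an assumption, since the template is not reproduced here), one possible sketch is below. reverse_complement, STOP_CODONS and COMPLEMENT are hypothetical names, and one_frame and forward_frames are compact re-implementations with the same behaviour as the answer's versions, included only so the block runs on its own:

```python
STOP_CODONS = {'TAA', 'TAG', 'TGA'}
# Translation table mapping each base to its complement (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(seq):
    '''Return the reverse complement of an uppercase DNA string.'''
    return seq.translate(COMPLEMENT)[::-1]


def one_frame(seq):
    '''Compact stand-in for the answer's one_frame.

    Scans codons from index 0. At each ATG it records the ORF up to, but not
    including, the first in-frame stop codon (or the rest of seq if there is
    none), then resumes scanning after that stop codon.
    '''
    orfs = []
    i = 0
    while i + 3 <= len(seq):
        if seq[i:i + 3] != 'ATG':
            i += 3
            continue
        j = i
        while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
            j += 3
        if j + 3 <= len(seq):  # inner loop stopped on an in-frame stop codon
            orfs.append(seq[i:j])
            i = j + 3
        else:  # no stop codon: keep the rest of seq, as get_orf does
            orfs.append(seq[i:])
            break
    return orfs


def forward_frames(seq):
    '''ORFs from the three forward reading frames, in frame order.'''
    orfs = []
    for frame in range(3):
        orfs.extend(one_frame(seq[frame:]))
    return orfs


def find_all_orfs(seq):
    '''Hypothetical: ORFs from the three forward frames, then from the
    three frames of the reverse-complement strand.'''
    return forward_frames(seq) + forward_frames(reverse_complement(seq))


print(find_all_orfs('TTACAT'))  # ['ATG'] (found only on the reverse strand)
```

Whether the assignment wants reverse-strand ORFs at all is not recoverable from the truncated question, so treat this as one option rather than the required behaviour.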

Code output (a few tests):

0.2857142857142857
ATG
['ATGATGAGA', 'ATGGGG', 'ATGAACCATGGGGTAA']
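
The length and GC filtering inside gene_finder does not depend on the FASTA file, so it can be checked directly on the ORFs printed above. A sketch with a hypothetical helper named filter_orfs; the thresholds 6 and 0.4 are arbitrary example values, not the assignment's arguments (which are cut off in the question):

```python
def gc_content(seq):
    '''Fraction of G and C bases in seq (0.0 for an empty string).'''
    if not seq:
        return 0.0
    return (seq.count('G') + seq.count('C')) / len(seq)


def filter_orfs(orfs, min_len, min_gc):
    '''Keep ORFs with length >= min_len and GC fraction >= min_gc,
    returned as [sequence, length, gc_fraction] rows like gene_finder's.'''
    rows = []
    for orf in orfs:
        gc = gc_content(orf)  # compute once per ORF
        if len(orf) >= min_len and gc >= min_gc:
            rows.append([orf, len(orf), gc])
    return rows


orfs = ['ATGATGAGA', 'ATGGGG', 'ATGAACCATGGGGTAA']
for row in filter_orfs(orfs, 6, 0.4):
    print(row)
# ['ATGGGG', 6, 0.6666666666666666]
# ['ATGAACCATGGGGTAA', 16, 0.4375]
```

'ATGATGAGA' is dropped because its GC fraction is 3/9, below the 0.4 threshold.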
