Question

The following should be done in a python code editor and should run on your computer's terminal.

1. Your task for this exercise is to generate an amino acid usage report with counts only, and in no particular order. Do this by opening the file below, stripping the lines without amino acid text, and using a dictionary to store each of the 21 amino acids used along with their count as the value. Your output should look like this:

T: 69645 G: 95475 V: 91683 Y: 36836 H: 29255 .....

2. Modify your script from #1 to display only the top 5 most frequently used amino acids and add their percentage use. The output should be like this:

L: 139002 (10.7%) A: 123885 (9.6%) G: 95475 (7.4%) V: 91683 (7.1%) I: 77836 (6.0%)

A small version of the text we were given is shown below:

>gi|170079664|ref|YP_001728984.1| thr operon leader peptide [Escherichia coli str. K-12 substr. DH10B]

MKRISTTITTTITITTGNGAG

>gi|170079665|ref|YP_001728985.1| bifunctional aspartokinase I/homeserine dehydrogenase I [Escherichia coli str. K-12 substr. DH10B]

MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERI

FAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEA

RGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYS

AAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPC

LIKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLIT

QSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGISAKFFAAL

ARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSW

LKNKHIDLRVCGVANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAV

ADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELM

KFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIE

IEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFK

VKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLRTLSWKLGV

Answer #1

1) Developing a program for this task involves the following steps:

a) Declare a list of the one-letter amino acid codes to count.

b) Declare a dictionary to store the usage count of each amino acid.

c) Open the given file and read it line by line.

d) For each line read, update the usage count of each amino acid. Lines starting with ">gi" are skipped because they are headers and do not contain amino acid text.

e) Finally, output the amino acid usage report for task #1.

A Python program that generates the amino acid usage report is given below.

File Name: generateAminoAcidReport-1.py

import sys

def main(args):
    if len(args) < 2:
        print('Usage: generateAminoAcidReport-1.py <file>')
        return

    # The 20 standard one-letter codes, each listed once. The exercise
    # mentions 21 residues; append the extra code your file uses
    # (for example 'U' for selenocysteine) if it applies.
    aminoAcids = ['A', 'C', 'D', 'E', 'F',
                  'G', 'H', 'I', 'K', 'L',
                  'M', 'N', 'P', 'Q', 'R',
                  'S', 'T', 'V', 'W', 'Y']
    aminoAcidUsageCounts = {}

    # Initialize amino acid usage counts to zero
    for acid in aminoAcids:
        aminoAcidUsageCounts[acid] = 0

    # Compute amino acid usage counts
    with open(args[1]) as inputFile:
        for line in inputFile:
            # Skip header lines starting with ">gi"
            if line.startswith(">gi"):
                continue

            for acid in aminoAcids:
                aminoAcidUsageCounts[acid] += line.count(acid)

    # Generate amino acid usage report
    report = ""
    for acid in aminoAcids:
        report = report + acid + ":" + str(aminoAcidUsageCounts[acid]) + " "
    print(report)

if __name__ == '__main__':
    main(sys.argv)

The script can be run as follows:

python generateAminoAcidReport-1.py inputFile.txt

Sample output is shown below:

A:92 C:24 D:46 E:54 F:30 G:66 H:16 I:51 K:37 L:89 M:24 N:39 P:29 Q:30 R:47 S:52 T:42 V:69 W:4 Y:20
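
As an alternative sketch, the counting step could lean on collections.Counter from the standard library instead of a fixed list of codes. This version tallies every letter found on a non-header line, so whichever residues the file actually uses are picked up without listing them in advance. The function name count_residues and the two-line sample are illustrative assumptions, not part of the exercise.

```python
from collections import Counter

def count_residues(lines):
    """Tally one-letter residue codes on non-header FASTA lines."""
    counts = Counter()
    for line in lines:
        # Header lines start with ">" and carry no sequence text.
        if line.startswith(">"):
            continue
        # Keep letters only, which drops newlines and stray whitespace.
        counts.update(ch for ch in line.strip() if ch.isalpha())
    return counts

# Hypothetical in-memory sample; a real run would pass an open file object.
sample = [
    ">gi|170079664|ref|YP_001728984.1| thr operon leader peptide",
    "MKRISTTITTTITITTGNGAG",
]
counts = count_residues(sample)
print(" ".join("%s: %d" % (acid, n) for acid, n in counts.items()))
```

Because the dictionary is built from the data, the report order is simply the order in which residues were first seen, which fits the "no particular order" requirement.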

2) Developing a program for this task involves the following additional steps:

Steps a) to d) are the same as above.

e) Compute the total usage count.

f) Generate the report by iterating over only the five amino acids with the highest counts. Print the count along with the percentage for each of them.

A Python program that generates this top-five usage report is given below.

File Name: generateAminoAcidReport-2.py

import sys
import operator

def main(args):
    if len(args) < 2:
        print('Usage: generateAminoAcidReport-2.py <file>')
        return

    # The 20 standard one-letter codes, each listed once. The exercise
    # mentions 21 residues; append the extra code your file uses
    # (for example 'U' for selenocysteine) if it applies.
    aminoAcids = ['A', 'C', 'D', 'E', 'F',
                  'G', 'H', 'I', 'K', 'L',
                  'M', 'N', 'P', 'Q', 'R',
                  'S', 'T', 'V', 'W', 'Y']
    aminoAcidUsageCounts = {}

    # Initialize amino acid usage counts to zero
    for acid in aminoAcids:
        aminoAcidUsageCounts[acid] = 0

    # Compute amino acid usage counts
    with open(args[1]) as inputFile:
        for line in inputFile:
            # Skip header lines starting with ">gi"
            if line.startswith(">gi"):
                continue

            for acid in aminoAcids:
                aminoAcidUsageCounts[acid] += line.count(acid)

    # Compute total usage count; summing the dictionary values counts
    # each amino acid exactly once
    totalUsageCount = sum(aminoAcidUsageCounts.values())
    if totalUsageCount == 0:
        print('No amino acid text found.')
        return

    # Generate top five amino acid usage counts with percentages
    report = ""
    for acid, count in sorted(aminoAcidUsageCounts.items(), key=operator.itemgetter(1), reverse=True)[:5]:
        percent = count * 100.0 / totalUsageCount
        report = report + acid + ": " + str(count) + " (" + "%.1f" % percent + "%) "
    print(report)

if __name__ == '__main__':
    main(sys.argv)

The script can be run as follows:

python generateAminoAcidReport-2.py inputFile.txt

Sample output is shown below:

A: 92 (10.7%) L: 89 (10.3%) V: 69 (8.0%) G: 66 (7.7%) E: 54 (6.3%)
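
The ranking and percentage step can also be sketched on its own with Counter.most_common, which returns the highest counts first. The helper name top_residues and the toy counts below are made up for illustration; they are not taken from the exercise file.

```python
from collections import Counter

def top_residues(counts, n=5):
    """Format the n most frequent residues as 'X: count (pct%)' strings."""
    total = sum(counts.values())
    if total == 0:
        return []
    return ["%s: %d (%.1f%%)" % (acid, count, 100.0 * count / total)
            for acid, count in Counter(counts).most_common(n)]

# Toy counts chosen so the percentages are easy to check by hand.
toy_counts = {"L": 50, "A": 30, "G": 15, "V": 5}
print(" ".join(top_residues(toy_counts, 2)))
```

Dividing by the sum of the dictionary values keeps the percentages tied to the residues that were actually counted, with each one contributing once.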
