Question

Implement a deterministic finite automata (DFA) using C++ programming language to extract matchin...

Implement a deterministic finite automata (DFA) using C++ programming language to extract matching patterns from a given input DNA sequence string.

  1. Design a deterministic finite automata to recognize the regular expression A(A+T+G+C)*A + T(A+T+G+C)*T over the alphaber {A,T,G,C}. This regular expression recognize any string that starts and ends with ‘A’ or starts and ends with ‘T’.

  2. Write a program which asks the user to input a DNA sequence. The program should be able to extract all the patterns (substrings present in the DNA sequence) that match the regular expression given in 1. You MUST implement DFA from (1) to check if all possible substrings in the DNA sequence is a part of the regular expression or not. Below are two sample input/output. Only the bolded are user input. Use of external package or library for regular expression matching is not allowed.

Input a DNA sequence: CATTTGCAGGTG

Matching patterns are:
TT
TT
TTT
TTTGCAGGT
TTGCAGGT
TGCAGGT
ATTTGCA

Input a DNA sequence: TTTTAAAA

Matching patterns are:

AAAA AAA AA

TTTT TTT TT

(1) Show a digital copy of the diagram of the DFA.

(2) Show source code C++


0 0
Add a comment Improve this question Transcribed image text
Answer #1

#include <bits/stdc++.h>

using namespace std;

int checkEquality(string s)

{

    return (s[0] == s[s.size() - 1]);

}

  

int countSubstringWithEqualEnds(string s)

{

    int result = 0;

    int n = s.length();

  

    // Starting point of substring

    for (int i = 0; i < n; i++)

  

       // Length of substring

       for (int len = 1; len <= n-i; len++)

  

          // Check if current substring has same

          // starting and ending characters.

          if (checkEquality(s.substr(i, len)))

            result++;

  

    return result;

}

  

int main()

{

    string s("abcab");

    cout << countSubstringWithEqualEnds(s);

    return 0;

}

Add a comment
Know the answer?
Add Answer to:
Implement a deterministic finite automata (DFA) using C++ programming language to extract matchin...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • In this assignment, you wil implement a deterministic finite automata (DFA) using C++ programming...

    In this assignment, you wil implement a deterministic finite automata (DFA) using C++ programming language to extract matching patterns from a given input DNA sequence string. 1. Design a deterministic finite automata to recognize the regular expression A(A+T+G+C)*A + T(A+T+G+C)*T over the alphaber (A,T,G,C). This regular expression recognize any string that starts and ends with 'A' or starts and ends with 'T. or starts and ends with T In this assignment, you wil implement a deterministic finite automata (DFA) using...

  • In this assignment, you will implement a deterministic finite automata (DFA) using C++ programming language to...

    In this assignment, you will implement a deterministic finite automata (DFA) using C++ programming language to extract all matching patterns (substrings) from a given input DNA sequence string. The alphabet for generating DNA sequences is {A, T, G, C}. Write a regular expression that represents all DNA strings that contains at least two ‘A’s. Note: assume empty string is not a valid string. Design a deterministic finite automaton to recognize the regular expression. Write a program which asks the user...

  • I need to construct a deterministic finite automata, DFA M, such that language of M, L(M),...

    I need to construct a deterministic finite automata, DFA M, such that language of M, L(M), is the set of all strings over the alphabet {a,b} in which every substring of length four has at least one b. Note: every substring with length less than four is in this language. For example, aba is in L(M) because there are no substrings of at least 4 so every substring of at least 4 contains at least one b. abaaab is in...

  • Write a class for DFA type objects. Deterministic Finite Automata are commonly defined as a quintuple...

    Write a class for DFA type objects. Deterministic Finite Automata are commonly defined as a quintuple consisting of a set of states, a set of symbals, a transition function, a start state and a set of accept states For this implementation let the alphabet be given as a string of symbols, the transition function as list of lists which represent an n by m matrix where the n rows represent the states and the m columns represent the alphabet symbols,...

  • Given the following non-deterministic finite state machine: (c) a σ0 o1 σ2 b Find the input...

    Given the following non-deterministic finite state machine: (c) a σ0 o1 σ2 b Find the input set V, the accepting states set T, the states set S, and initial (i) state for the machine. (10/100) Write the transition table for the machine (ii) (10/100) (iii) Write the simplest phrase structure grammar, G=(V,T,S,P), for the machine (10/100) Rewrite the grammar you found in question 4(c)(iii) in BNF notation (iv) (10/100) (v) Is the string aabaaba an accepted string by the finite-state...

  • I need to learn this in c# Requirements: For this exercise you are to implement a small bioinform...

    I need to learn this in c# Requirements: For this exercise you are to implement a small bioinformatics library for operations with DNA sequences, which, in this exercise, are represented as strings of only the characters A, C, G, and T. Three methods are required for this exercise and they are each detailed below. Kmers The term k-mer refers to all the possible substrings of length k that are contained in a DNA sequence. For example, given the DNA sequence...

  • In python! Objective: students will be able to 1.access string elements by index operator 2.search for...

    In python! Objective: students will be able to 1.access string elements by index operator 2.search for simple pattern in strings Specification: Biologists use a sequence of letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a trilet TAG, TAA, or TGA. The length of a gene string is a multiple of 3 and the gene does not contain any of the triplets...

  • In this assignment, you will implement a Memory Management System(MMS). Using C Programming Language..... MAKE SURE...

    In this assignment, you will implement a Memory Management System(MMS). Using C Programming Language..... MAKE SURE YOU USE C PROGRAMMING Your MMS will handle all requests of allocation of memory space by different users (one thread per user) …. HINT(You will use Pthreads and Semaphores). Your MMS will provide the user with an interface for making memory requests and also for freeing up memory that is no longer needed by the user. One of the jobs of your memory management...

  • Programming language C The third picture is the code that is being refactored Student 1:10 PM...

    Programming language C The third picture is the code that is being refactored Student 1:10 PM 3296.- Function Specifications [15 points 5 bonus points Use appropriate loops to ensure that if the user enters on incorrect value (out-of-range) mare hece-for an input, that the user is given an accurate messoge and provided an opportunity te enter the value egain (fer every input) Use an appropriate loop to allow the user to execute the program over and over again up to...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT