Question

Overview: Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P...

Overview:

Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P inside a (generally much larger) text T. The goal is to implement and compare four classical string-matching algorithms.

Input:

Your code should work for any text either inputted directly or read in from a file.

However, for testing - input file has been provided:

  • The Gettysburg Address (by President Abraham Lincoln, 1863)

You should minimally search for these three patterns in each text: FREE, BRAVE, NATION.

You should convert the entire string to upper case and conduct your searches accordingly.

Methodology and Output:

  1. Implement the Brute Force Algorithm

  2. Implement the Knuth-Morris-Pratt Algorithm

  3. For each algorithm and text/pattern combination

    a)Track and tabulate the number of character-comparison operations   b)Track and tabulate the clock time for each algorithm

Input file for test:

The Gettysburg Address.txt

Fourscore and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting-place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we cannot dedicate — we cannot consecrate — we cannot hallow — this ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us — that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion — that we here highly resolve that these dead shall not have died in vain — that this nation shall have a new birth of freedom and that government of the people, by the people, for the people, shall not perish from the earth.

JAVA/C++ thanks.

0 0
Add a comment Improve this question Transcribed image text
Answer #1
// C++ program for implementation of KMP pattern searching 
// algorithm 
#include <bits/stdc++.h> 

void computeLPSArray(char* pat, int M, int* lps); 

// Prints occurrences of txt[] in pat[] 
void KMPSearch(char* pat, char* txt) 
{ 
        int M = strlen(pat); 
        int N = strlen(txt); 

        // create lps[] that will hold the longest prefix suffix 
        // values for pattern 
        int lps[M]; 

        // Preprocess the pattern (calculate lps[] array) 
        computeLPSArray(pat, M, lps); 

        int i = 0; // index for txt[] 
        int j = 0; // index for pat[] 
        while (i < N) { 
                if (pat[j] == txt[i]) { 
                        j++; 
                        i++; 
                } 

                if (j == M) { 
                        printf("Found pattern at index %d ", i - j); 
                        j = lps[j - 1]; 
                } 

                // mismatch after j matches 
                else if (i < N && pat[j] != txt[i]) { 
                        // Do not match lps[0..lps[j-1]] characters, 
                        // they will match anyway 
                        if (j != 0) 
                                j = lps[j - 1]; 
                        else
                                i = i + 1; 
                } 
        } 
} 

// Fills lps[] for given patttern pat[0..M-1] 
void computeLPSArray(char* pat, int M, int* lps) 
{ 
        // length of the previous longest prefix suffix 
        int len = 0; 

        lps[0] = 0; // lps[0] is always 0 

        // the loop calculates lps[i] for i = 1 to M-1 
        int i = 1; 
        while (i < M) { 
                if (pat[i] == pat[len]) { 
                        len++; 
                        lps[i] = len; 
                        i++; 
                } 
                else // (pat[i] != pat[len]) 
                { 
                        // This is tricky. Consider the example. 
                        // AAACAAAA and i = 7. The idea is similar 
                        // to search step. 
                        if (len != 0) { 
                                len = lps[len - 1]; 

                                // Also, note that we do not increment 
                                // i here 
                        } 
                        else // if (len == 0) 
                        { 
                                lps[i] = 0; 
                                i++; 
                        } 
                } 
        } 
} 

// Driver program to test above function 
int main() 
{ 
        char txt[] = "ABABDABACDABABCABAB"; 
        char pat[] = "ABABCABAB"; 
        KMPSearch(pat, txt); 
        return 0; 
} 
  • Output:
    Found pattern at index 10

Searching Algorithm: KMP (Knuth Morris Pratt)
Unlike Naive algorithm, where we slide the pattern by one and compare all characters at each shift, we use a value from lps[] to decide the next characters to be matched. The idea is to not match a character that we know will anyway match.

How to use lps[] to decide next positions (or to know a number of characters to be skipped)?

  • We start comparison of pat[j] with j = 0 with characters of current window of text.
  • We keep matching characters txt[i] and pat[j] and keep incrementing i and j while pat[j] and txt[i] keep matching.
  • When we see a mismatch
    • We know that characters pat[0..j-1] match with txt[i-j…i-1] (Note that j starts with 0 and increment it only when there is a match).
    • We also know (from above definition) that lps[j-1] is count of characters of pat[0…j-1] that are both proper prefix and suffix.
    • From above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider above example to understand this.
 
txt[] = "AAAAABAAABA" 
pat[] = "AAAA"
lps[] = {0, 1, 2, 3} 

i = 0, j = 0
txt[] = "AAAAABAAABA" 
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 1, j = 1
txt[] = "AAAAABAAABA" 
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 2, j = 2
txt[] = "AAAAABAAABA" 
pat[] = "AAAA"
pat[i] and pat[j] match, do i++, j++

i = 3, j = 3
txt[] = "AAAAABAAABA" 
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 4, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3

Here unlike Naive algorithm, we do not match first three 
characters of this window. Value of lps[j-1] (in above 
step) gave us index of next character to match.
i = 4, j = 3
txt[] = "AAAAABAAABA" 
pat[] =  "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 5, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3

Again unlike Naive algorithm, we do not match first three 
characters of this window. Value of lps[j-1] (in above 
step) gave us index of next character to match.
i = 5, j = 3
txt[] = "AAAAABAAABA" 
pat[] =   "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[2] = 2

i = 5, j = 2
txt[] = "AAAAABAAABA" 
pat[] =    "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[1] = 1 

i = 5, j = 1
txt[] = "AAAAABAAABA" 
pat[] =     "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[0] = 0

i = 5, j = 0
txt[] = "AAAAABAAABA" 
pat[] =      "AAAA"
txt[i] and pat[j] do NOT match and j is 0, we do i++.

i = 6, j = 0
txt[] = "AAAAABAAABA" 
pat[] =       "AAAA"
txt[i] and pat[j] match, do i++ and j++

i = 7, j = 1
txt[] = "AAAAABAAABA" 
pat[] =       "AAAA"
txt[i] and pat[j] match, do i++ and j++

We continue this way...
Add a comment
Know the answer?
Add Answer to:
Overview: Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Overview: Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P...

    Overview: Pattern-Matching (aka String Search) is the process of algorithmically finding copies of a pattern P inside a (generally much larger) text T. The goal is to implement and compare four classical string-matching algorithms. Input: Your code should work for any text either inputted directly or read in from a file. However, for testing - input file has been provided: The Gettysburg Address (by President Abraham Lincoln, 1863) You should minimally search for these three patterns in each text: FREE,...

  • Four score and seven years ago our fathers brought forth on this continent, a new nation,...

    Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for...

  • SCREENSHOTS OF CODE ONLY!! PLEASE DON'T POST TEXT!! C++ CODE! PLEASE DON'T REPOST OLD POSTS! Objective...

    SCREENSHOTS OF CODE ONLY!! PLEASE DON'T POST TEXT!! C++ CODE! PLEASE DON'T REPOST OLD POSTS! Objective To gain experience with the operations involving binary search trees. This data structure as linked list uses dynamic memory allocation to grow as the size of the data set grows. Unlike linked lists, a binary search tree is very fast to insert, delete and search. Project Description When an author produce an index for his or her book, the first step in this process...

  • Delivered August 28, 1963 by Dr. Martin Luther King Jr. I am happy to join with...

    Delivered August 28, 1963 by Dr. Martin Luther King Jr. I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation. Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering...

  • Coding for Python - The pattern detection problem – part 3: def pattern_search_max(data_series, pattern, threshold) Plea...

    Coding for Python - The pattern detection problem – part 3: def pattern_search_max(data_series, pattern, threshold) Please do not use 'print' or 'input' statements. Context of the assignment is: In this assignment, your goal is to write a Python program to determine whether a given pattern appears in a data series, and if so, where it is located in the data series. Please see attachments below: We need to consider the following cases: Case 1 - It is possible that the...

  • 1 Overview The goal of this assignment is to help you understand caches better. You are...

    1 Overview The goal of this assignment is to help you understand caches better. You are required to write a cache simulator using the C programming language. The programs have to run on iLab machines. We are providing real program memory traces as input to your cache simulator. The format and structure of the memory traces are described below. We will not give you improperly formatted files. You can assume all your input files will be in proper format as...

  • For this c++ assignment, Overview write a program that will process two sets of numeric information....

    For this c++ assignment, Overview write a program that will process two sets of numeric information. The information will be needed for later processing, so it will be stored in two arrays that will be displayed, sorted, and displayed (again). One set of numeric information will be read from a file while the other will be randomly generated. The arrays that will be used in the assignment should be declared to hold a maximum of 50 double or float elements....

  • I would like some assistance correcting an issue I am having with this assignment. Once a...

    I would like some assistance correcting an issue I am having with this assignment. Once a finite state automaton (FSA) is designed, its transition diagram can be translated in a straightforward manner into program code. However, this translation process is considerably tedious if the FSA is large and troublesome if the design is modified. The reason is that the transition information and mechanism are combined in the translation. To do it differently, we can design a general data structure such...

  • Read “Instituionalizing our Demise: America vs Multiculturalism” by Roger Kimball on pg 268 and “Reinventing America”...

    Read “Instituionalizing our Demise: America vs Multiculturalism” by Roger Kimball on pg 268 and “Reinventing America” Call for a new national indentity” by Elizabeth Martinez on pg 275. Create a double entry notebook for each reading selection It should be atleast five observation and responses. wric 268 PART 2 essay pro. exactly how and why their authors disagree. Instead of with parties in conflict as mediators do, you will nt of view designed to appeal to both sides, mediatn posing...

  • FISCAL POLICY IN THEORY: March, 2020: we are on the verge of Congress and the President...

    FISCAL POLICY IN THEORY: March, 2020: we are on the verge of Congress and the President passing legislation that will empower the federal government to spend an unprecedented amount of EXTRA money not seen since World War 2 ---- in order to address the pandemic but also to help cushion the blow financially of perhaps ten or twenty million Americans --- or more --- losing their jobs, and thus suffering a drop in income. The scale of the 2020 recession...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT