Detecting Substrings (C++ Version)
Introduction
A common task for programs that work with text files is locating a
specific substring within the file. I am sure we’ve all done this
many times when working with Word, Notepad, or other editors.
Since we don’t have a GUI or other means of displaying the contents
of a file all at once, let’s modify the problem slightly. Rather
than locating a specific substring within a file and then
highlighting the results, as most modern programs would do, let’s
write a C++ program that locates the occurrences of a specific
substring within a file and then displays the occurrence number as
well as a portion of the text around the found substring. It should
also count the number of occurrences and indicate that number in a
brief report at the end.
The input file that we will use for this exercise is a text copy of
the Declaration of Independence. You will find this in the input
file “DeclOfIndep.txt” on eLearning. It is okay to hard code this
file name for this project, but in the future, you might have your
program ask the user for the filename. That would generalize your
program to work on any input file.
Even though this is a C++ program, let’s use C-strings in this
exercise for all string operations. Thus, your program will read
each line of input from the file into a C-string, not into a C++
string. The functions you use to locate the substrings should be
C-string functions and not members of the string class. All
printing and other manipulations on strings should be done with
C-strings. In short, there should be no C++ strings used in this
program. (Therefore, do not #include <string>.)
Overview
Here’s a high-level overview of what your program should do:
1) Ask the user for the substring to search for (and store it in a
C-string).
2) Calculate and display each of the found occurrences of the
substring in the file. For each found occurrence, your program
should display (a) the location number starting at 1 and going up
to the total number of locations found, and (b) the portion of the
string containing the found substring. This portion should consist
of the substring itself and up to 8 characters before and 8
characters after the found substring. (Note that if the substring
is within 8 characters of the beginning or end of a line, the full
8 characters won’t be available on that side.)
Sample Runs
Here is a sample run looking for the string “people” in the file
“DeclOfIndep.txt”.
Looking for the substring "people" in file "DeclOfIndep.txt":
Location 1: String: "for one people to"
Location 2: String: "icts of people, unless"
Location 3: String: "s those people would r"
Location 4: String: " of the people."
Location 5: String: " of our people."
There were 5 occurrences of the string "people" within the file
"DeclOfIndep.txt".
Or consider another run looking for the substring “oo” in the file
“DeclOfIndep.txt.”
Looking for the substring "oo" in file "DeclOfIndep.txt":
Location 1: String: "public goooood."
Location 2: String: "ublic goooood."
Location 3: String: "blic goooood."
Location 4: String: "lic goooood."
Location 5: String: "armed troops among"
Location 6: String: ". They too have"
Location 7: String: "of the good People"
Location 8: String: "illiam Hooper"
Location 9: String: "s Lightfoot Lee"
Location 10: String: "Witherspoon"
There were 10 occurrences of the string "oo" within the file
"DeclOfIndep.txt".
Programming Notes:
There are several points to be made about this problem in general,
and about both sample runs.
1) Both of the sample runs given above are actual data, so you can
use them to test your program.
2) Note that we deliberately modified a single word “good” in the
Declaration file to “goooood”. In other words, we added some extra
“o’s” to the word.
This makes the point that some substrings can overlap. For example,
if we search for “oo” in the word “goooood,” the first “oo” will
certainly be a hit. But the second hit occurs with the second “o”
of the first hit, i.e., the two instances of “oo” overlap. Because
of this overlap, there are really four hits of “oo” within that
word, as illustrated above, not two, and your program should find
all four of them.
3) Do not use “inFile >>” to read the data from the input
file. As you know, “inFile >>” tokenizes around white space
and would therefore extract each word from the input file
separately. Of course, this has both advantages and disadvantages
depending on the circumstances. It would, for example, be a good
function to use if we wanted to process within individual words
only. But since our program should be able to detect substrings
consisting of more than one word, “inFile >>” will not serve
our purposes.
Therefore, use “inFile.getline()” (i.e., the member function
version of getline() – see Chapter 10, Slide 17+.) as your primary
input function. As we discussed in class, this will read a single
line of input from the file at a time and place it in the target
buffer, which should be a character array of adequate size.
4) Note that the file will be processed one line at a time. It is
not necessary to look for substrings that span more than one line.
5) Since we are using C-strings for this assignment, we’ll have to
use the C string processing functions. A very useful function for
this assignment would be the strstr(const char *, const char *)
function, which locates the first instance of the right-hand string
inside the left-hand string and returns a pointer to the found
instance. (It returns NULL if no instance is found.) This function
is described on slide 27 of the Chapter 10 slide set. The
strchr(const char *, int ch) function, not listed in the slide set,
locates the first occurrence of the character “ch” in the string
and returns a pointer to it.
6) Since all C-strings are based on character arrays, be careful
about running off either end of the array. Since you are required
to print not only the substring but also 8 characters to either
side of it, this overrun can occur if the substring you are looking
for is within 8 characters of either the beginning or the end of
that line. (Often an array overrun will be detected if your program
starts printing out gibberish or default characters instead of text
from the Declaration.)
For an example, consider the first sample run (i.e., looking for
the word “people”). Note that in locations 4 and 5, the word
“people” appears not only at the end of a sentence but also at the
end of a line of input. It is, therefore, impossible to display 8
characters after the found substring in those cases, since we are
processing on a line-by-line basis. This is perfectly okay. If
there are not 8 characters either before or after the found
substring, just terminate the output report at that point.
7) Build up your solution in a modular fashion, debugging as you
go. Do not attempt to write the whole program at once. If you feel
lost at some point, simplify your problem down to something
manageable. You might, for example, create a sample input file with
a single sentence in it and see if your program can detect
substrings within it. In any case, unless it is necessary to solve
a problem, you should never have more than one function under
development at once. Debug that function before moving on to the
next. If you code this way, your overall development time will be
much shorter.
8) Be alert to array overruns on either side as you look for
substrings.
Deliverables
Please submit your C++ source code file. There is no output file on
this problem.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Searches the named file one line at a time and prints every line
   that contains str. Returns 0 on success, or -1 if the file cannot
   be opened. */
int File_to_Search(const char *fileName, const char *str)
{
    FILE *fp;
    int line_num = 1;
    int find_res_string = 0;
    char temp[512];

    fp = fopen(fileName, "r");
    if (fp == NULL) {
        return -1;
    }
    while (fgets(temp, sizeof temp, fp) != NULL) {
        if (strstr(temp, str) != NULL) {
            printf("line : %d\t", line_num);
            printf("string: %s\n", temp);
            find_res_string++;
        }
        line_num++;
    }
    if (find_res_string == 0) {
        printf("\nSorry, couldn't find a match.\n");
    }
    fclose(fp);
    return 0;
}

int main(void)
{
    int res_string;

    res_string = File_to_Search("Index.txt", "for one people to");
    if (res_string == -1) {
        int err = errno;  /* save errno before perror() can change it */
        perror("Error");
        printf("Error number = %d\n", err);
        exit(1);
    }
    return 0;
}
==
[Screenshot: main.cpp and Index.txt open in an online compiler; Index.txt contains the lines "for one people to" and "You got it".]
You can change the search string; I have set it to "for one people to".
Keep the file name Index.txt.
Thanks, let me know if there is any concern, I will be happy to help
=====
EDIT: Pass the search string on the command line.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Searches the named file one line at a time and prints every line
   that contains str. Returns 0 on success, or -1 if the file cannot
   be opened. */
int File_to_Search(const char *fileName, const char *str)
{
    FILE *fp;
    int line_num = 1;
    int find_res_string = 0;
    char temp[512];

    fp = fopen(fileName, "r");
    if (fp == NULL) {
        return -1;
    }
    while (fgets(temp, sizeof temp, fp) != NULL) {
        if (strstr(temp, str) != NULL) {
            printf("line : %d\t", line_num);
            printf("string: %s\n", temp);
            find_res_string++;
        }
        line_num++;
    }
    if (find_res_string == 0) {
        printf("\nSorry, couldn't find a match.\n");
    }
    fclose(fp);
    return 0;
}

int main(int argc, char *argv[])
{
    int res_string;

    /* The search string is expected as the first command-line argument. */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <search string>\n", argv[0]);
        return 1;
    }
    res_string = File_to_Search("Index.txt", argv[1]);
    if (res_string == -1) {
        int err = errno;  /* save errno before perror() can change it */
        perror("Error");
        printf("Error number = %d\n", err);
        exit(1);
    }
    return 0;
}