Let’s build a dynamic string tokenizer!
Start with the existing template and work on the areas marked with TODO in the comments: Homework 8 Template.c
Note: If you turn the template back in to me without adding any original work, you will receive a 0.
By itself the template does nothing. You need to fill in the code to dynamically allocate an array of strings that are returned to the user.
Remember: A string is an array of characters. A tokenizer goes through this array looking for special characters called delimiters. The result is a series of strings (tokens) that were taken out of the original array.
Example: Assume the input array is "1.1, 2.2, 3.3". The tokens would be "1.1" then "2.2" then "3.3".
Your objective: Finish up the template. We will use the old tokenizer system (strtok) to parse the string for comma characters ',' and space characters ' '.
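To make the strtok behavior concrete, here is a minimal sketch that is separate from the homework template. The helper name `split_in_place` and its parameters are hypothetical; the sketch only illustrates that strtok treats every character in the delimiter string as a separator and returns pointers into the original buffer, which it modifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the template): splits buf in place
   and stores up to max_out token pointers in out. Returns the number of
   tokens stored. strtok overwrites each delimiter it stops at with '\0',
   so buf must be writable and the pointers in out point into buf. */
int split_in_place(char *buf, const char *delims, char *out[], int max_out)
{
    int count = 0;
    char *tok;

    assert(buf != NULL && delims != NULL && out != NULL);

    for (tok = strtok(buf, delims);
         tok != NULL && count < max_out;
         tok = strtok(NULL, delims))
    {
        out[count++] = tok;   /* points into buf, no copy is made */
    }
    return count;
}
```

With `char buf[] = "1.1, 2.2, 3.3";` and the delimiters `", "`, the helper stores three pointers: "1.1", "2.2" and "3.3". Because these pointers refer to `buf` itself, the template copies each token into its own allocation before the buffer is reused.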
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#define MAX_INPUT_SIZE 10000
/* I used typedef to create a "string" data type. All strings are just
arrays of characters. I really did this to get rid of the scary
double pointers. */
typedef char* string;
/*This is the primary structure
It contains an array of strings (tokens) in tokenizedArray
And a count of how many tokens were found in tokenCnt.*/
typedef struct
{
string* tokenizedArray;
int tokenCnt;
} BetterStrTok;
//Declare all functions up front
string GetNextToken(string sStr, string delimiter, bool* pTokenizedOnce, int*
pLength);
BetterStrTok BetterTokenize(string sInput, string delimiter, int maxStringLength);
void PrintTokens(const BetterStrTok* pInput);
void CleanUpTokens(BetterStrTok* pInput);
int GetDelimiterCnt(string sStr, int maxStringLength, string delimiter);
void main()
{
/*Don't worry about this. The main function has an array of characters
(which is the same as the string data type). It uses gets() to read the
string in.*/
char sInput[MAX_INPUT_SIZE];
BetterStrTok tokens;
printf("Enter a series of numbers (or words) separated by commas: ");
gets(sInput);
/* BetterTokenize parses out the tokens in between the ',' and ' '
characters.
It will return a data structure that contains an array of strings
(dynamically allocated).
We are mimicking the Java string.Split() function. It breaks a string apart
into its component parts.
*/
tokens = BetterTokenize(sInput, ", ", MAX_INPUT_SIZE);
/* Once the tokens have been made, we print each one and finally clean up the
dynamically allocated memory. Ultimately this is just a test.
We read in a string, break it apart, print out each piece, then clean up.
In the final homework we will do more with this.
*/
PrintTokens(&tokens);
CleanUpTokens(&tokens);
}
/*HOMEWORK 8:
Find the TODO comments and fill them in with the appropriate lines of code.
This function takes the original input string and fills out a BetterStrTok
structure.
This structure will contain a count of tokens and a dynamically allocated
array
of strings (tokens) that were extracted from the original string.
*/
BetterStrTok BetterTokenize(string sInput, string delimiter, int maxStringLength)
{
int i;
bool tokenizedOnce = false;
int newStringLength = 0;
int expectedCnt = 0;
BetterStrTok strTokStruct;
memset(&strTokStruct, 0, sizeof(BetterStrTok));
expectedCnt = GetDelimiterCnt(sInput, maxStringLength,delimiter);
/*TODO 1:
Use calloc or malloc to dynamically create the array of strings
This is an array of size expectedCnt times sizeof(string).
strTokStruct.tokenizedArray will be set to the pointer returned by
the calloc (or malloc) call.
Explanation: We are constructing an array of strings dynamically.
GetDelimiterCnt returned a total number of tokens. We cannot know
this at compile time, so an array of strings is dynamically
allocated.
*/
strTokStruct.tokenCnt = 0;
//This loop goes through every token that is expected
//The tokens are extracted, and then copied into dynamic memory.
for (i = 0; i < expectedCnt; i++)
{
/*Don't worry about this GetNextToken function, it returns the next
token string.
and sets newStringLength to the length of that string.
Explanation: This extracts the next token from the original string.
A token is every character between the space characters and ,
characters.
So given the string "1.1, 2.2, 3.3" the first token would be
1.1,
the next token would be 2.2, and the final token would be 3.3.
*/
string token = GetNextToken(sInput, delimiter, &tokenizedOnce,
&newStringLength);
if (newStringLength != 0)
{
int arrayIndex = strTokStruct.tokenCnt++;
/*TODO 2:
Use malloc or calloc to create a string of size
newStringLength times
the sizeof char. This pointer would be set to next index of
the array:
strTokStruct.tokenizedArray[arrayIndex]
Explanation: Remember that the string data type is an array
of chars.
We do not know, in advance (at compile-time), the length
of the token.
So a new array of chars is allocated. Keep in mind that
strTokStruct.tokenizedArray is an array of strings. That
is, it is an
array of array of chars.
*/
/*TODO 3:
Copy the string, token, to the newly allocated string which
is pointed at
by strTokStruct.tokenizedArray[arrayIndex] (assuming you
did the above step
correctly). You may use a for loop, the strcpy function or
the memcpy function.
Explanation: The token was found and is in the token
string, but that will be
overwritten. Before that occurs, we want to copy it to
the newly
allocated string (In the TODO 2 step).
*/
}
}
return strTokStruct;
}
/*HOMEWORK 8:
Assuming the above function works, this one will print every token in the
array.
*/
void PrintTokens(const BetterStrTok* pInput)
{
/* TODO 4:
Create a loop that goes from 0 to pInput->tokenCnt.
Inside the loop print the token. If you want output like mine:
printf "%d: %s\n"
where the %d variable is the counter variable for the loop
and %s is the string in pInput's tokenizedArray array at
index i (or whatever you call your counter variable).
*/
}
/*HOMEWORK 8:
You always have to clean up dynamic memory, otherwise the system does not
know
that you are done with it. This function goes through the array of strings
and
releases each one. Then it releases the array itself.
*/
void CleanUpTokens(BetterStrTok* pInput)
{
/* TODO 5:
Create a loop that goes from 0 to pInput->tokenCnt.
Inside the loop call the free function on pInput->tokenizedArray[]
at index i, or whatever you call your counter variable.
Explanation: pInput->tokenizedArray is an array of dynamically
allocated
strings. At the next line we will free the array itself, but first
you want
to go through the loop and free the dynamic memory one string at a
time.
Failure to do this creates a memory leak.
Always remember, for every malloc or calloc you need a corresponding
free
somewhere in your code.
Similarly in C++ for every new call you would want a corresponding
delete,
but that isn't a part of this homework assignment.
*/
/*After freeing every string inside pInput->tokenizedArray the array itself
must be freed.*/
if(pInput->tokenizedArray != NULL)
free(pInput->tokenizedArray);
}
/*You don't have to worry about this function. It walks through the original string
And compares each character with the list of delimiters. It returns an accurate
Count of how many tokens exist in the original string.*/
int GetDelimiterCnt(string sStr, int maxStringLength, string delimiter)
{
int i;
int j;
int cnt = 0;
for (i = 0; i < maxStringLength && sStr[i] != '\0'; i++)
{
for (j = 0; j < strlen(delimiter); j++)
{
if (sStr[i] == delimiter[j])
cnt++;
}
}
//There will also be a concluding string
return cnt + 1;
}
/*You don't have to worry about this function. It uses the original strtok
functions to
extract the next token. strtok is not thread safe, so as good practice I actually
use strtok_s or strtok_r depending on if you are using OSX or Windows.
If you want to know what thread safe means, please feel free to ask.*/
string gContext;
string GetNextToken(string sStr, string delimiter, bool* pTokenizedOnce, int*
pLength)
{
string rv = NULL;
if (*pTokenizedOnce == false)
{
*pTokenizedOnce = true;
#ifdef _WIN32
rv = strtok_s(sStr, delimiter, &gContext);
#else
rv = strtok_r(sStr, delimiter, &gContext);
#endif
}
else
{
#ifdef _WIN32
rv = strtok_s(NULL, delimiter, &gContext);
#else
rv = strtok_r(NULL, delimiter, &gContext);
#endif
}
if (rv != NULL)
*pLength = strlen(rv)+1;
else
*pLength = 0;
return rv;
}
Annotations
Summary:
The C code is pasted below, with the output towards the end.
I had to modify the GetDelimiterCnt logic, as it was not behaving properly in the original code.
I added a few print statements to show the input and the number of tokens received.
Output is shown for tokens of varied length.
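One way to obtain a count that agrees with what strtok will later return is to count runs of non-delimiter characters. The sketch below is an illustration under that assumption, not the logic used in the code that follows; `count_tokens` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok would produce for s,
   treating every character in delims as a separator. A token starts
   whenever a non-delimiter character follows a delimiter (or the start
   of the string). */
int count_tokens(const char *s, const char *delims)
{
    int count = 0;
    int in_token = 0;   /* 1 while we are inside a run of token characters */

    assert(s != NULL && delims != NULL);

    for (; *s != '\0'; s++)
    {
        if (strchr(delims, *s) != NULL)
        {
            in_token = 0;          /* delimiter ends the current token */
        }
        else if (!in_token)
        {
            in_token = 1;          /* first character of a new token */
            count++;
        }
    }
    return count;
}
```

For "1.1, 2.2, 3.3" with the delimiters ", " this returns 3, and for "a,b" it returns 2. Leading, trailing, or repeated delimiters do not produce empty tokens, which mirrors how strtok skips them.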
######################## C Code ########################################
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#define MAX_INPUT_SIZE 10000
/* I used typedef to create a "string" data type. All strings are just
arrays of characters. I really did this to get rid of the scary
double pointers. */
typedef char* string;
/*This is the primary structure
It contains an array of strings (tokens) in tokenizedArray
And a count of how many tokens were found in tokenCnt.*/
typedef struct
{
string* tokenizedArray;
int tokenCnt;
} BetterStrTok;
//Declare all functions up front
string GetNextToken(string sStr, string delimiter, bool* pTokenizedOnce, int* pLength);
BetterStrTok BetterTokenize(string sInput, string delimiter, int maxStringLength);
void PrintTokens(const BetterStrTok* pInput);
void CleanUpTokens(BetterStrTok* pInput);
int GetDelimiterCnt(string sStr, int maxStringLength, string delimiter);
int main(void)
{
/*Don't worry about this. The main function has an array of characters
(which is the same as the string data type). It uses fgets() to read the
string in, because gets() cannot limit the input size and was removed in C11.*/
char sInput[MAX_INPUT_SIZE];
BetterStrTok tokens;
printf("Enter a series of numbers (or words) separated by commas: ");
if (fgets(sInput, sizeof(sInput), stdin) == NULL)
return 1;
sInput[strcspn(sInput, "\n")] = '\0'; //Strip the trailing newline left by fgets
//strcpy(sInput, "1.2, 1.3, 1.43456, 01.5, 13.456, 12.7, 1");
printf(" Given Str : %s \n" , sInput);
/* BetterTokenize parses out the tokens in between the ',' and ' '
characters.
It will return a data structure that contains an array of strings
(dynamically allocated).
We are mimicking the Java string.Split() function. It breaks a string apart
into its component parts.
*/
tokens = BetterTokenize(sInput, ", ", MAX_INPUT_SIZE);
/* Once the tokens have been made, we print each one and finally
clean up the
dynamically allocated memory. Ultimately this is just a test.
We read in a string, break it apart, print out each piece, then
clean up.
In the final homework we will do more with this.
*/
PrintTokens(&tokens);
CleanUpTokens(&tokens);
}
/*HOMEWORK 8:
Find the TODO comments and fill them in with the appropriate lines of code.
This function takes the original input string and fills out a BetterStrTok
structure.
This structure will contain a count of tokens and a dynamically allocated
array
of strings (tokens) that were extracted from the original string.
*/
BetterStrTok BetterTokenize(string sInput, string delimiter, int maxStringLength)
{
int i;
bool tokenizedOnce = false;
int newStringLength = 0;
int expectedCnt = 0;
BetterStrTok strTokStruct;
memset(&strTokStruct, 0, sizeof(BetterStrTok));
expectedCnt = GetDelimiterCnt(sInput, maxStringLength, delimiter);
printf(" Num of Tokens %d \n", expectedCnt);
/*TODO 1:
Use calloc or malloc to dynamically create the array of
strings
This is an array of size expectedCnt times sizeof(string).
strTokStruct.tokenizedArray will be set to the pointer returned
by
the calloc (or malloc) call.
Explanation: We are constructing an array of strings
dynamically.
GetDelimiterCnt returned a total number of tokens. We cannot
know
this at compile time, so an array of strings is dynamically
allocated.
*/
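strTokStruct.tokenizedArray = malloc(sizeof(string) * (expectedCnt + 1));
if (strTokStruct.tokenizedArray == NULL)
return strTokStruct; //Allocation failed: return the empty structure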
strTokStruct.tokenCnt = 0;
//This loop goes through every token that is expected
//The tokens are extracted, and then copied into dynamic memory.
for (i = 0; i < expectedCnt; i++)
{
/*Don't worry about this GetNextToken function, it returns the
next
token string.
and sets newStringLength to the length of that string.
Explanation: This extracts the next token from the original
string.
A token is every character between the space characters and ,
characters.
So given the string "1.1, 2.2, 3.3" the first token would be
1.1,
the next token would be 2.2, and the final token would be
3.3.
*/
string token = GetNextToken(sInput, delimiter, &tokenizedOnce, &newStringLength);
if (newStringLength != 0)
{
int arrayIndex = strTokStruct.tokenCnt++;
//printf(" Got token %s , of length %zu , arrayIndx : %d , tokenCnt : %d \n" , token , strlen(token), arrayIndex, strTokStruct.tokenCnt);
/*TODO 2:
Use malloc or calloc to create a string of size
newStringLength times
the sizeof char. This pointer would be set to next index of
the array:
strTokStruct.tokenizedArray[arrayIndex]
Explanation: Remember that the string data type is an array
of chars.
We do not know, in advance (at compile-time), the length
of the token.
So a new array of chars is allocated. Keep in mind that
strTokStruct.tokenizedArray is an array of strings. That
is, it is an
array of array of chars.
*/
strTokStruct.tokenizedArray[arrayIndex] = malloc(sizeof( char) *
(strlen(token) + 1));
/*TODO 3:
Copy the string, token, to the newly allocated string which
is pointed at
by strTokStruct.tokenizedArray[arrayIndex] (assuming you
did the above step
correctly). You may use a for loop, the strcpy function or
the memcpy function.
Explanation: The token was found and is in the token
string, but that will be
overwritten. Before that occurs, we want to copy it to
the newly
allocated string (In the TODO 2 step).
*/
strcpy(strTokStruct.tokenizedArray[arrayIndex], token);
}
}
return strTokStruct;
}
/*HOMEWORK 8:
Assuming the above function works, this one will print every token in the
array.
*/
void PrintTokens(const BetterStrTok* pInput)
{
/* TODO 4:
Create a loop that goes from 0 to pInput->tokenCnt.
Inside the loop print the token. If you want output like mine:
printf "%d: %s\n"
where the %d variable is the counter variable for the loop
and %s is the string in pInput's tokenizedArray array at
index i (or whatever you call your counter variable).
*/
for(int i = 0 ; i < pInput->tokenCnt ; i++ )
{
printf(" Token num : %d is -> %s \n" , i , pInput->tokenizedArray[i]);
}
}
/*HOMEWORK 8:
You always have to clean up dynamic memory, otherwise the system does not
know
that you are done with it. This function goes through the array of strings
and
releases each one. Then it releases the array itself.
*/
void CleanUpTokens(BetterStrTok* pInput)
{
/* TODO 5:
Create a loop that goes from 0 to pInput->tokenCnt.
Inside the loop call the free function on pInput->tokenizedArray[]
at index i, or whatever you call your counter variable.
Explanation: pInput->tokenizedArray is an array of dynamically
allocated
strings. At the next line we will free the array itself, but first
you want
to go through the loop and free the dynamic memory one string at a
time.
Failure to do this creates a memory leak.
Always remember, for every malloc or calloc you need a corresponding
free
somewhere in your code.
Similarly in C++ for every new call you would want a corresponding
delete,
but that isn't a part of this homework assignment.
*/
/*After freeing every string inside pInput->tokenizedArray the array itself
must be freed.*/
if(pInput->tokenizedArray != NULL)
{
for(int i = 0 ; i < pInput->tokenCnt ; i++ )
{
free(pInput->tokenizedArray[i]);
}
free(pInput->tokenizedArray);
}
}
/*You don't have to worry about this function. It walks through the original string
and counts how many times the full delimiter sequence (for example ", ") occurs.
Note: strtok treats every character of the delimiter as a separator, so input
separated by a lone ',' or ' ' is undercounted here.*/
int GetDelimiterCnt(string sStr, int maxStringLength, string delimiter)
{
int i;
int j;
int cnt = 0;
int delimLen = (int)strlen(delimiter);
//Stop at the terminator or at maxStringLength, whichever comes first
for (i = 0; i < maxStringLength && sStr[i] != '\0'; i++)
{
if (sStr[i] == delimiter[0])
{
//Compare the rest of the delimiter, stopping at the first mismatch
for (j = 1; j < delimLen && sStr[i + j] == delimiter[j]; j++)
;
if (j == delimLen)
cnt = cnt + 1;
}
}
//There will also be a concluding string
return cnt + 1;
}
/*You don't have to worry about this function. It uses the original strtok
functions to extract the next token. strtok is not thread safe, so as good
practice I actually use strtok_s or strtok_r depending on if you are using
OSX or Windows.
If you want to know what thread safe means, please feel free to ask.*/
string gContext;
string GetNextToken(string sStr, string delimiter, bool* pTokenizedOnce, int* pLength)
{
string rv = NULL;
if (*pTokenizedOnce == false)
{
*pTokenizedOnce = true;
#ifdef _WIN32
rv = strtok_s(sStr, delimiter, &gContext);
#else
rv = strtok_r(sStr, delimiter, &gContext);
#endif
}
else
{
#ifdef _WIN32
rv = strtok_s(NULL, delimiter, &gContext);
#else
rv = strtok_r(NULL, delimiter, &gContext);
#endif
}
if (rv != NULL)
*pLength = (int)strlen(rv) + 1;
else
*pLength = 0;
return rv;
}
########################## End Code ######################################
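For comparison, the allocate, copy, and free steps (TODO 1 through TODO 5) can also be sketched without counting tokens up front, by growing the array with realloc as tokens are found. This is an alternative arrangement, not the template's design; `copy_token`, `split_copy` and `free_tokens` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of token, or NULL on failure. */
static char *copy_token(const char *token)
{
    size_t len = strlen(token) + 1;      /* +1 for the '\0' terminator */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, token, len);
    return copy;
}

/* Frees each string first, then the array itself. */
void free_tokens(char **tokens, int count)
{
    int i;
    if (tokens == NULL)
        return;
    for (i = 0; i < count; i++)
        free(tokens[i]);
    free(tokens);
}

/* Splits input on any character in delims and returns a heap array of
   heap-allocated token copies. *out_count receives the token count.
   Returns NULL (with *out_count == 0) if there are no tokens or an
   allocation fails. The input string itself is left unmodified. */
char **split_copy(const char *input, const char *delims, int *out_count)
{
    char *work;
    char **tokens = NULL;
    char *tok;
    int count = 0;

    assert(input != NULL && delims != NULL && out_count != NULL);
    *out_count = 0;

    work = copy_token(input);            /* strtok needs a writable buffer */
    if (work == NULL)
        return NULL;

    for (tok = strtok(work, delims); tok != NULL; tok = strtok(NULL, delims))
    {
        /* Grow the pointer array by one slot for the new token. */
        char **grown = realloc(tokens, (size_t)(count + 1) * sizeof *tokens);
        if (grown == NULL)
        {
            free_tokens(tokens, count);
            free(work);
            return NULL;
        }
        tokens = grown;
        tokens[count] = copy_token(tok);
        if (tokens[count] == NULL)
        {
            free_tokens(tokens, count);
            free(work);
            return NULL;
        }
        count++;
    }

    free(work);
    *out_count = count;
    return tokens;
}
```

Every allocation made in `split_copy` is released by a single `free_tokens(tokens, count)` call, which keeps the rule of one free per malloc easy to check. Growing the array one slot at a time is simple but not the most efficient choice; the template's approach of sizing the array once from a count avoids the repeated realloc calls.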
############################# Output ###################################
########################################################################