Question

Tokeniser in C++

Tokenisers

Background

The primary task of any language translator is to work out the structure and meaning of an input in a given language so that an appropriate translation can be output in another language. Think of this in terms of a natural language such as English. When you read a sentence, you do not spend your time worrying about what characters there are, how much space is between the letters or where lines are broken. Instead, you consider the words and attempt to derive structure and meaning from their order and arrangement into sentences, paragraphs, sections, chapters and so on. In the same way, when we write translators from assembly language, virtual machine language or a programming language into another form, we focus on things like keywords, identifiers, operators and logical structures rather than individual characters.

The role of a tokeniser is to take the input text and break it up into tokens (words in natural language) so that the assembler or compiler using it only needs to concern itself with higher level structure and meaning. This division of labor is reflected in most programming language definitions in that they usually have a separate syntax definition for tokens and another for structures formed from the tokens.
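As a rough illustration of this division of labour, the sketch below splits a string into identifier, integer and symbol tokens and discards whitespace. The names Kind, Token and tokenise are hypothetical and chosen only for this example; the token kinds actually required by the assignment are those in includes/tokeniser.h.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds, for illustration only.
enum class Kind { identifier, integer, symbol };

struct Token {
    Kind kind;
    std::string spelling;
};

// Split the input into tokens, discarding whitespace between them.
std::vector<Token> tokenise(const std::string& input)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char ch = static_cast<unsigned char>(input[i]);
        if (std::isspace(ch)) { ++i; continue; }

        std::size_t start = i;
        if (std::isalpha(ch) || ch == '_') {
            // identifier: a letter or underscore followed by letters, digits or underscores
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_')) ++i;
            tokens.push_back({Kind::identifier, input.substr(start, i - start)});
        } else if (std::isdigit(ch)) {
            // integer: one or more digits
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back({Kind::integer, input.substr(start, i - start)});
        } else {
            // anything else is treated as a single-character symbol
            ++i;
            tokens.push_back({Kind::symbol, input.substr(start, 1)});
        }
    }
    return tokens;
}
```

With this sketch, tokenise("x1 = 42;") yields four tokens: the identifier x1, the symbol =, the integer 42 and the symbol ;. A parser built on top of it never has to look at spaces or individual characters.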

The focus of this assignment is writing a tokeniser to recognise tokens that conform to a specific set of rules. The set of tokens may or may not correspond to a particular language because a tokeniser is a fairly generic tool. After completing this assignment we will assume that you know how to write a tokeniser and we will provide you with a working tokeniser to use in each of the remaining programming assignments. This will permit you to take the later assignments much further than would otherwise be possible in the limited time available.

Writing Your Program

You are required to complete the implementation of the C++ files tokeniser.cpp and tokeniser-basics.cpp, which are used to compile the programs tokens and tokens-context. You will complete the implementation of a function, next_token(), that reads text character by character using the function nextch() and returns the next recognised token in the input. The tokens that must be recognised in the milestone and final submissions are specified in the file includes/tokeniser.h. Additional helper functions described in the EBNF, Languages and Parsing page are also provided as part of the precompiled library; their interfaces are shown in the includes/tokeniser-extras.h file.
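The usual shape of such a function is a one-character lookahead: nextch() loads the next character into a variable, and next_token() decides what kind of token it is reading from that single character. The following is only a sketch under assumptions. The class SketchTokeniser, the TokenKind values and the Token struct are hypothetical stand-ins, and the real signatures of next_token() and nextch() are the ones declared in includes/tokeniser.h.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the kinds and token type in includes/tokeniser.h.
enum class TokenKind { identifier, integer, symbol, eoi };

struct Token {
    TokenKind kind;
    std::string spelling;
};

class SketchTokeniser {
public:
    // Prime the lookahead so ch holds the first character of the input.
    explicit SketchTokeniser(const std::string& text) : input(text) { nextch(); }

    // Return the next token, leaving ch on the first character after it.
    Token next_token()
    {
        while (ch != EOF && std::isspace(ch)) nextch();   // skip whitespace
        if (ch == EOF) return {TokenKind::eoi, ""};       // end of input

        std::string spelling;
        if (std::isalpha(ch) || ch == '_') {
            while (ch != EOF && (std::isalnum(ch) || ch == '_')) {
                spelling += static_cast<char>(ch);
                nextch();
            }
            return {TokenKind::identifier, spelling};
        }
        if (std::isdigit(ch)) {
            while (ch != EOF && std::isdigit(ch)) {
                spelling += static_cast<char>(ch);
                nextch();
            }
            return {TokenKind::integer, spelling};
        }
        spelling += static_cast<char>(ch);                // single-character symbol
        nextch();
        return {TokenKind::symbol, spelling};
    }

private:
    // Advance the lookahead by one character; EOF marks the end of the input.
    void nextch()
    {
        ch = pos < input.size() ? static_cast<unsigned char>(input[pos++]) : EOF;
    }

    std::string input;
    std::size_t pos = 0;
    int ch = EOF;
};
```

Calling next_token() repeatedly on the text "sum1 + 27" returns the identifier sum1, the symbol +, the integer 27 and then an end-of-input token. Keeping ch as an int lets EOF be distinguished from every valid character.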

The tokeniser-basics.cpp file is where you will implement the nextch(), token_context(), new_token() and initialise_tokeniser() functions. These are separated out so that it is possible to test your next_token() function without needing to complete all of the messy parts of these other functions.
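One source of that messiness is bookkeeping: to report where a token came from, the character reader has to track line and column numbers and remember the text of each line. The sketch below assumes that a token's context is its source line with a caret under one column, which may not match the format expected in the .context files. CharReader and its member names are hypothetical and are not the interface declared in includes/tokeniser.h.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

class CharReader {
public:
    explicit CharReader(const std::string& text) : input(text) {}

    // Return the next character or EOF, recording the line and column it came from.
    int nextch()
    {
        if (pos >= input.size()) return EOF;
        if (last_was_newline) {            // the previous character ended a line
            ++line;
            column = 0;
            lines.emplace_back();
            last_was_newline = false;
        }
        char c = input[pos++];
        ++column;
        if (c == '\n') last_was_newline = true;
        else lines.back() += c;            // remember the text of the current line
        return static_cast<unsigned char>(c);
    }

    int current_line() const { return line; }      // 1-based line of the last character read
    int current_column() const { return column; }  // 1-based column of the last character read

    // Show a 1-based line with a caret under the given 1-based column.
    // Only characters that have already been read are included.
    std::string context(int ln, int col) const
    {
        const std::string& text = lines.at(static_cast<std::size_t>(ln - 1));
        return text + "\n" + std::string(static_cast<std::size_t>(col - 1), ' ') + "^\n";
    }

private:
    std::string input;
    std::size_t pos = 0;
    int line = 1;
    int column = 0;
    bool last_was_newline = false;
    std::vector<std::string> lines = std::vector<std::string>(1);
};
```

Recording the position inside nextch() means next_token() can note the line and column when it starts a token without doing any counting of its own.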

Your tokens and tokens-context programs will be compiled using the Makefile in the zip file attached below using the command:

% make

Note: The only files you are allowed to edit are tokeniser.cpp and tokeniser-basics.cpp. All other files are automatically regenerated every time you run make and are not used by the web submission system's test scripts.

Testing Your Program

For each file in the tests directory, the output of the tokens and tokens-context programs must match the corresponding .tokens and .context output files respectively. You must not produce any output of your own. You can both compile and test your programs against all of the supplied tests using the command:

% make

The testing will not show you any program output, just whether or not a test was passed or failed. If you want to see the actual output, the commands used to run the tests are shown in string quotes ("). Simply copy the commands between the string quotes (") and paste them into your shell.

The web submission system will test your program in exactly this way. The key difference between your testing and the web submission testing is that the web submission system has some secret tests that it will use.

If you want to try additional tests, just create some new files in the tests sub-directory and generate the correct outputs using the command:

% make test-add

This will increase the number of tests that will be run in the future. You may add these new test inputs and outputs to svn.

Your final submission will be awarded marks for tests that require the correct recognition of all tokens and the correct implementation of the tokeniser interface functions described in the includes/tokeniser.h file.

Tests

In addition to the test files in the zip file(s) attached below, we will use a number of secret tests that may contain illegal characters or character combinations that may defeat your tokenisers. The secret tests may also check whether or not you have followed the rules for keyword recognition. Note: these tests are secret. If your programs fail any of them, you will not receive any feedback about them, even if you ask!
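Two checks are commonly involved here. A word is classified as a keyword only after its whole spelling has been read, and every character is checked for legality before it is used. The sketch below shows both under stated assumptions: the keyword list and the set of legal characters are hypothetical, and the real rules are those in includes/tokeniser.h.

```cpp
#include <string>
#include <unordered_set>

enum class WordKind { keyword, identifier };

// Classify a complete word. The keyword list here is hypothetical.
WordKind classify_word(const std::string& spelling)
{
    static const std::unordered_set<std::string> keywords = {"if", "else", "while"};
    return keywords.count(spelling) ? WordKind::keyword : WordKind::identifier;
}

// Hypothetical legality rule: printable ASCII plus tab, newline and carriage return.
bool is_legal_char(int ch)
{
    return ch == '\t' || ch == '\n' || ch == '\r' || (ch >= 32 && ch <= 126);
}
```

Because the lookup uses the full spelling, a word such as iffy is classified as an identifier rather than as the keyword if followed by fy, and the case-sensitive comparison means If is not treated as a keyword in this sketch.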



The Assignment1 directory should now contain the following files and directories:

  • tokens - executable script that will run your compiled tokens program.

  • tokens.cpp - C++ source file containing the main() function for tokens.

  • tokens-context - executable script that will run your compiled tokens-context program.

  • tokens-context.cpp - C++ source file containing the main() function for tokens-context.

  • tokeniser.cpp - C++ source file containing the next_token() function.

  • tokeniser-basics.cpp - C++ source file containing input functions.

  • bin - this directory contains precompiled programs and scripts.

  • includes - this directory contains .h files for the library.

  • lib - this directory contains precompiled library components.

  • originals - this directory contains the original version of the tokeniser.cpp.

  • tests - this directory contains test data, you can add your own tests here.

Note: you need to edit the tokeniser.cpp and tokeniser-basics.cpp files to complete this assignment. All the other files are automatically regenerated every time you run make; they must not be changed or added to svn. You can add extra test inputs to the tests directory, but those are the only additional files that you may add to svn.

Note: if a newer version of the startup files is made available, it must be placed in the updates sub-directory and added to svn. The next time make is run, all of the files will be updated except for tokeniser.cpp.

assignment1-20200901-142352 (4).zip

