0. Introduction.
This involves designing a perfect hash function for a small set of strings. It demonstrates that if the set of possible keys is small, then a perfect hash function need not be hard to design, or hard to understand.
1. Theory.
A hash table is an array that associates keys
with values. A hash function takes a key as its
argument, and returns an index in the array. The object that
appears at the index is the key’s value. A hash function may return
the same index for two different keys. This is called a
collision. A collision is undesirable if two keys with the
same index have different values. A hash function that has no
collisions for a set of keys is said to be perfect for
that set. It may be perfect for some sets of keys, but not for
others.
Most modern programming languages
use a small set of reserved names as operators,
punctuation, and syntactic markers. Reserved names are also
sometimes called reserved words or keywords. For
example, Java is said to use 55 keywords, like if, private, while,
etc. A compiler for a programming language must be able to test if
each name in a program is reserved. The program may be hundreds or
thousands of pages long, and may include thousands or millions of
names, so the test must be done efficiently. It might be
implemented using a hash table and a perfect hash function.
Here’s how the test works. Suppose
that the hash table T is an array of strings. Each time
the compiler reads a name N, it calls a perfect hash
function h to compute an index h(N). If
h(N) is a legal index for T, and
T[h(N)] = N, then the name is
reserved, otherwise it is not. If there are unused elements in
T, then they are empty strings "", because an empty string
can’t be a keyword. If we measure the efficiency of a test by the
number of string comparisons it performs, then the test requires
only O(1) comparisons. Of course this works only if
h is perfect for the set of reserved names.
2. Example.
Imagine that there is a very simple programming language which uses the following twelve reserved names.
and |
else |
or |
begin |
end |
return |
define |
if |
then |
do |
not |
while |
We might define a perfect hash function for these reserved names like this. We get one or more characters from each name. Then we convert each character to an integer—this is easy, because characters are already represented as small nonnegative integers. In the ASCII and Unicode character sets, the characters 'a' through 'z' are represented as the integers 97 through 122, without gaps. Finally, we do arithmetic on the integers to obtain an index into the hash table. We can choose the characters, and the arithmetic operations, so that no two reserved names result in the same index. For example, if we define the hash function h to add the first and second characters of each name, then we get the following indexes.
h("and") |
= |
207 |
h("begin") |
= |
199 |
h("define") |
= |
201 |
h("do") |
= |
211 |
h("else") |
= |
209 |
h("end") |
= |
211 |
h("if") |
= |
207 |
h("not") |
= |
221 |
h("or") |
= |
225 |
h("return") |
= |
215 |
h("then") |
= |
220 |
h("while") |
= |
223 |
This definition for h does not result in a perfect hash function, because it has collisions. For example, the strings "and" and "if" result in the index 207. Similarly, the strings "do" and "end" result in the index 211. We either didn’t choose the right characters from each string, or we didn’t choose the right operations, or both. Unfortunately, there is no good theory about how to define h. The best we can do is try various definitions, by trial and error, until we find one that is perfect.
3. Implementation.
For this project, you must design a perfect hash function for the reserved names shown in the previous section. To do that, write a small test class Hasher, and run it with various definitions for the function hash. The class’s main method calls hash for each reserved name, and writes indexes to standard output. For full credit, no two reserved names can have the same index.
class Hasher
{
private static final String [] reserved
=
{ "and",
"begin",
"define",
"do",
"else",
"end",
"if",
"not",
"or",
"return",
"then",
"while" };
private static int hash(String name)
{
// Your code goes
here.
}
public static void main(String []
args)
{
for (int index = 0; index <
reserved.length ; index += 1)
{
System.out.print("h(\"" +
reserved[index] + "\") = ");
System.out.print(hash(reserved[index]));
System.out.println();
}
}
}
Your method hash must work in O(1) time, without loops
or recursions. It must not use if’s, switch’es, or extra arrays. It
must not call the Java method hashCode, because that uses a loop.
It must not return negative integers, because they can’t be array
indexes.
Here are some some things to try
when defining hash. Try using characters at the same indexes from
each name. Try linear combinations of the characters: that is,
multiplying them by small constants, then adding or subtracting the
results. Try the operator ‘%’. Try a mixture of these. Reject any
definition of hash that is not perfect: one that returns the same
index for two different names.
Here are some hints about how to
write the code for hash. Characters in Java String’s are indexed
starting from 0, and ending with the length of the string minus 1.
The character at index k in name is obtained by
name.charAt(k). For example, the first character from name
is returned by name.charAt(0), the second character by
name.charAt(1), and the last character byname.charAt(name.length()
- 1). If you use a character where an integer is expected, then the
character turns into its integer code. For example, name.charAt(0)
+ 1 adds one to the code for the first character in name, and
returns an integer.
Don’t worry if there are gaps
between the indexes: your hash function need not be
minimal. Try to keep the returned indexes small: they
shouldn’t exceed 2000. For example, there is a perfect hash
function for the reserved words in this assignment whose indexes
range from 1177 to 1413. It was found after about ten minutes of
trial-and-error search.
public class Program
{
static int index[]=new int[2000];
private static final String [] reserved =
{ "and",
"begin",
"define",
"do",
"else",
"end",
"if",
"not",
"or",
"return",
"then",
"while" };
private static int hash(String name)
{ int num=0;
int len=name.length();
for(int i=0;i<len;i++)
{
num+=name.charAt(i)+0; //hash logic:add all the ascii values of
each char in string...Its perfect for a given set...
}
if(index[num-1]==0)
{
index[num-1]=num;
return index[num-1];
}
System.out.println("hash function failed");
return -1;
}
public static void main(String[] args)
{
for (int index = 0; index < reserved.length ; index += 1)
{
System.out.print("h(\"" + reserved[index] + "\") = ");
System.out.print(hash(reserved[index]));
System.out.println();
}
}
}
//output:
h("and") = 307 h("begin") = 517 h("define") = 619 h("do") = 211 h("else") = 425 h("end") = 311 h("if") = 207 h("not") = 337 h("or") = 225 h("return") = 672 h("then") = 431 h("while") = 537
0. Introduction. This involves designing a perfect hash function for a small set of strings. It...
I really need assistance with creating a C++ hash table program for generating a hash from a string. Create a hash function then test and evaluate the results. The set of strings are from a "words.txt" file (there are 45,402 words in the file, no repeating words. Map the words to 45,402 buckets using a hash function). Requirements: Create an array of 45,402 integers and initialize each to zero. These will be used to store how many times each index...
I really need assistance with creating a C++ hash table program for generating a hash from a string. Create a hash function then test and evaluate the results. The set of strings are from a "words.txt" file (there are 45,402 words in the file, no repeating words. Map the words to 45,402 buckets using a hash function). Requirements: Create an array of 45,402 integers and initialize each to zero. These will be used to store how many times each index...
Please complete the following task: Create a C++ hash table program for generating a hash from a string. Create a hash function then test and evaluate the results. The set of strings are from a "words.txt" file (there are 45,402 words in the file, no repeating words. Map the words to 45,402 buckets using a hash function). Requirements: Create an array of 45,402 integers and initialize each to zero. These will be used to store how many times each index...
Write a spell checker that stores a set of words, W, in a hash table and implements a function, spellCheck(s), which performs a spell check on the string s with respect to the set of words, W. If s is in W, then the call to spellCheck(s) returns an iterable collection that contains only s, because it is assumed to be spelled correctly in this case. Otherwise, if s is not in W, then the call to spellCheck(s) returns a...
2. 9 marks] Strings. Consider the following definitions on strings Let U be the set of all strings Let s be a string. The length or size of a string, denoted Is, is the number of characters in s Let s be a string, and i e N such that 0 < ί < sl. We write s[i] to represent the character of s at index i, where indexing starts at 0 (so s 0] is the first character, and...
2. 9 marks] Strings. Consider the following definitions on strings Let U be the set of all strings. Let s be a string. The length or size of a string, denoted Is, is the number of characters in s Let s be a string, and ie N such that 0 Si< Is. We write si] to represent the character of s at index i, where indexing starts at 0 (so s(0 is the first character, and s|s -1 is the...
Insert elements into a hash table implemented using chain hashing, as an array of linked list in which each entry slot is as a linked list of key/value pairings that have the same hash (outcome value computed using certain hash function). You are allowed to use “Hash Function”, h(k) = k % x where, x is the value you will need to decide to use, that you find appropriate for this implementation. The main requirements are the following: 1. Input:...
Anagram Difference We define an anagram to be a word whose characters can be rearranged to create another word. Given two strings, we want to know the minimum number of characters in either string that we must modify to make the two strings anagrams. If it's not possible to make the two strings anagrams, we consider this number to be -t For example: tea and ate are anagrams, so we would need to modity 0 characters tea and toe are...
Lab 2: (one task for Program 5): Declare an array of C-strings to hold course names, and read in course names from an input file. Then do the output to show each course name that was read in. See Chapter 8 section on "C-Strings", and the section "Arrays of Strings and C-strings", which gives an example related to this lab. Declare the array for course names as a 2-D char array: char courseNames[10] [ 50]; -each row will be a...
C++ Demonstrate an understanding of array processing. Write a function that accepts the name of a file (by asking the user), an array of strings, and the size of the array. Display each character of each string vertically to the file. For example, if the array is called with an array of 3 strings “hello”, “abc”, “bye”, the output will be: h e l l o …. y e Test the function with a string declaration of: string name[] =...