Question

0. Introduction. This involves designing a perfect hash function for a small set of strings. It...

0. Introduction.

This involves designing a perfect hash function for a small set of strings. It demonstrates that if the set of possible keys is small, then a perfect hash function need not be hard to design, or hard to understand.

1. Theory.

A hash table is an array that associates keys with values. A hash function takes a key as its argument, and returns an index in the array. The object that appears at the index is the key’s value. A hash function may return the same index for two different keys. This is called a collision. A collision is undesirable if two keys with the same index have different values. A hash function that has no collisions for a set of keys is said to be perfect for that set. It may be perfect for some sets of keys, but not for others.
      Most modern programming languages use a small set of reserved names as operators, punctuation, and syntactic markers. Reserved names are also sometimes called reserved words or keywords. For example, Java is said to use 55 keywords, like if, private, while, etc. A compiler for a programming language must be able to test if each name in a program is reserved. The program may be hundreds or thousands of pages long, and may include thousands or millions of names, so the test must be done efficiently. It might be implemented using a hash table and a perfect hash function.
      Here’s how the test works. Suppose that the hash table T is an array of strings. Each time the compiler reads a name N, it calls a perfect hash function h to compute an index h(N). If h(N) is a legal index for T, and T[h(N)] = N, then the name is reserved, otherwise it is not. If there are unused elements in T, then they are empty strings "", because an empty string can’t be a keyword. If we measure the efficiency of a test by the number of string comparisons it performs, then the test requires only O(1) comparisons. Of course this works only if h is perfect for the set of reserved names.

2. Example.

Imagine that there is a very simple programming language which uses the following twelve reserved names.

and

else    

or

begin

end

return

define  

if

then

do

not

while

We might define a perfect hash function for these reserved names like this. We get one or more characters from each name. Then we convert each character to an integer—this is easy, because characters are already represented as small nonnegative integers. In the ASCII and Unicode character sets, the characters 'a' through 'z' are represented as the integers 97 through 122, without gaps. Finally, we do arithmetic on the integers to obtain an index into the hash table. We can choose the characters, and the arithmetic operations, so that no two reserved names result in the same index. For example, if we define the hash function h to add the first and second characters of each name, then we get the following indexes.

h("and")

  =  

207

h("begin")

  =  

199

h("define")

  =  

201

h("do")

  =  

211

h("else")

  =  

209

h("end")

  =  

211

h("if")

  =  

207

h("not")

  =  

221

h("or")

  =  

225

h("return")

  =  

215

h("then")

  =  

220

h("while")

  =  

223

This definition for h does not result in a perfect hash function, because it has collisions. For example, the strings "and" and "if" result in the index 207. Similarly, the strings "do" and "end" result in the index 211. We either didn’t choose the right characters from each string, or we didn’t choose the right operations, or both. Unfortunately, there is no good theory about how to define h. The best we can do is try various definitions, by trial and error, until we find one that is perfect.

3. Implementation.

For this project, you must design a perfect hash function for the reserved names shown in the previous section. To do that, write a small test class Hasher, and run it with various definitions for the function hash. The class’s main method calls hash for each reserved name, and writes indexes to standard output. For full credit, no two reserved names can have the same index.

class Hasher  
{  
  private static final String [] reserved =  
   { "and",  
     "begin",  
     "define",  
     "do",  
     "else",  
     "end",  
     "if",  
     "not",  
     "or",  
     "return",  
     "then",  
     "while" };  
  
  private static int hash(String name)  
  {  
    //  Your code goes here.  
  }  
  
  public static void main(String [] args)  
  {  
    for (int index = 0; index < reserved.length ; index += 1)  
    {  
      System.out.print("h(\"" + reserved[index] + "\") = ");  
      System.out.print(hash(reserved[index]));  
      System.out.println();  
    }  
  }  
}

Your method hash must work in O(1) time, without loops or recursions. It must not use if’s, switch’es, or extra arrays. It must not call the Java method hashCode, because that uses a loop. It must not return negative integers, because they can’t be array indexes.
      Here are some some things to try when defining hash. Try using characters at the same indexes from each name. Try linear combinations of the characters: that is, multiplying them by small constants, then adding or subtracting the results. Try the operator ‘%’. Try a mixture of these. Reject any definition of hash that is not perfect: one that returns the same index for two different names.
      Here are some hints about how to write the code for hash. Characters in Java String’s are indexed starting from 0, and ending with the length of the string minus 1. The character at index k in name is obtained by name.charAt(k). For example, the first character from name is returned by name.charAt(0), the second character by name.charAt(1), and the last character byname.charAt(name.length() - 1). If you use a character where an integer is expected, then the character turns into its integer code. For example, name.charAt(0) + 1 adds one to the code for the first character in name, and returns an integer.
      Don’t worry if there are gaps between the indexes: your hash function need not be minimal. Try to keep the returned indexes small: they shouldn’t exceed 2000. For example, there is a perfect hash function for the reserved words in this assignment whose indexes range from 1177 to 1413. It was found after about ten minutes of trial-and-error search.

0 0
Add a comment Improve this question Transcribed image text
Answer #1

public class Program
{
static int index[]=new int[2000];
private static final String [] reserved =
{ "and",
"begin",
"define",
"do",
"else",
"end",
"if",
"not",
"or",
"return",
"then",
"while" };
  
private static int hash(String name)
{ int num=0;
int len=name.length();
for(int i=0;i<len;i++)
{
num+=name.charAt(i)+0; //hash logic:add all the ascii values of each char in string...Its perfect for a given set...

}
if(index[num-1]==0)
{
index[num-1]=num;
return index[num-1];
}
System.out.println("hash function failed");
return -1;
}
  
public static void main(String[] args)
{
for (int index = 0; index < reserved.length ; index += 1)
{
System.out.print("h(\"" + reserved[index] + "\") = ");
System.out.print(hash(reserved[index]));
System.out.println();
}
}
}

//output:

h("and") = 307 h("begin") = 517 h("define") = 619 h("do") = 211 h("else") = 425 h("end") = 311 h("if") = 207 h("not") = 337 h("or") = 225 h("return") = 672 h("then") = 431 h("while") = 537

Add a comment
Know the answer?
Add Answer to:
0. Introduction. This involves designing a perfect hash function for a small set of strings. It...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • I really need assistance with creating a C++ hash table program for generating a hash from a string. Create a hash funct...

    I really need assistance with creating a C++ hash table program for generating a hash from a string. Create a hash function then test and evaluate the results. The set of strings are from a "words.txt" file (there are 45,402 words in the file, no repeating words. Map the words to 45,402 buckets using a hash function). Requirements: Create an array of 45,402 integers and initialize each to zero. These will be used to store how many times each index...

  • I really need assistance with creating a C++ hash table program for generating a hash from a string. Create a hash funct...

    I really need assistance with creating a C++ hash table program for generating a hash from a string. Create a hash function then test and evaluate the results. The set of strings are from a "words.txt" file (there are 45,402 words in the file, no repeating words. Map the words to 45,402 buckets using a hash function). Requirements: Create an array of 45,402 integers and initialize each to zero. These will be used to store how many times each index...

  • Please complete the following task: Create a C++ hash table program for generating a hash from a string. Create a hash f...

    Please complete the following task: Create a C++ hash table program for generating a hash from a string. Create a hash function then test and evaluate the results. The set of strings are from a "words.txt" file (there are 45,402 words in the file, no repeating words. Map the words to 45,402 buckets using a hash function). Requirements: Create an array of 45,402 integers and initialize each to zero. These will be used to store how many times each index...

  • Write a spell checker that stores a set of words, W, in a hash table and...

    Write a spell checker that stores a set of words, W, in a hash table and implements a function, spellCheck(s), which performs a spell check on the string s with respect to the set of words, W. If s is in W, then the call to spellCheck(s) returns an iterable collection that contains only s, because it is assumed to be spelled correctly in this case. Otherwise, if s is not in W, then the call to spellCheck(s) returns a...

  • 2. 9 marks] Strings. Consider the following definitions on strings Let U be the set of all strings Let s be a str...

    2. 9 marks] Strings. Consider the following definitions on strings Let U be the set of all strings Let s be a string. The length or size of a string, denoted Is, is the number of characters in s Let s be a string, and i e N such that 0 < ί < sl. We write s[i] to represent the character of s at index i, where indexing starts at 0 (so s 0] is the first character, and...

  • 2. 9 marks] Strings. Consider the following definitions on strings Let U be the set of all strings. Let s be a stri...

    2. 9 marks] Strings. Consider the following definitions on strings Let U be the set of all strings. Let s be a string. The length or size of a string, denoted Is, is the number of characters in s Let s be a string, and ie N such that 0 Si< Is. We write si] to represent the character of s at index i, where indexing starts at 0 (so s(0 is the first character, and s|s -1 is the...

  • Insert elements into a hash table implemented using chain hashing, as an array of linked list...

    Insert elements into a hash table implemented using chain hashing, as an array of linked list in which each entry slot is as a linked list of key/value pairings that have the same hash (outcome value computed using certain hash function). You are allowed to use “Hash Function”, h(k) = k % x where, x is the value you will need to decide to use, that you find appropriate for this implementation. The main requirements are the following: 1. Input:...

  • Anagram Difference We define an anagram to be a word whose characters can be rearranged to...

    Anagram Difference We define an anagram to be a word whose characters can be rearranged to create another word. Given two strings, we want to know the minimum number of characters in either string that we must modify to make the two strings anagrams. If it's not possible to make the two strings anagrams, we consider this number to be -t For example: tea and ate are anagrams, so we would need to modity 0 characters tea and toe are...

  • Lab 2: (one task for Program 5): Declare an array of C-strings to hold course names,...

    Lab 2: (one task for Program 5): Declare an array of C-strings to hold course names, and read in course names from an input file. Then do the output to show each course name that was read in. See Chapter 8 section on "C-Strings", and the section "Arrays of Strings and C-strings", which gives an example related to this lab. Declare the array for course names as a 2-D char array: char courseNames[10] [ 50]; -each row will be a...

  • C++ Demonstrate an understanding of array processing. Write a function that accepts the name of a...

    C++ Demonstrate an understanding of array processing. Write a function that accepts the name of a file (by asking the user), an array of strings, and the size of the array. Display each character of each string vertically to the file. For example, if the array is called with an array of 3 strings “hello”, “abc”, “bye”, the output will be: h e l l o …. y e Test the function with a string declaration of: string name[] =...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT