Question


Implementation of a MapReduce-style distributed word count application

For this assignment, you can use any programming language you want and you can use either RMI or any version of RPC for client/server communication.

For this assignment, you will focus on only a single type of application: Word Count. In a single Word Count job, the programmer provides a set of text files to be processed, and the frequency of each word across all the documents is counted and stored in a single output file. In your first assignment, you implemented the same application using multiple threads on a single machine. Then, we learnt about MapReduce and how it was designed as a hybrid model to optimize these types of distributed applications. In this assignment, we’ll emulate the distributed computation that would occur in MapReduce for the Word Count application. Note: the performance of this application will not be as good as that of Google’s or Hadoop’s MapReduce, because we are working at the application level, but it should still be faster than the single-machine implementation.
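For comparison with the distributed version, the single-machine, multi-threaded baseline mentioned above might look roughly like the following. This is only a sketch under stated assumptions: the class name `ThreadedWordCount`, one task per line, and treating any whitespace-separated token as a word are illustrative choices, not requirements of the assignment.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical single-machine baseline: one task per line, all tasks
// merging into one shared, thread-safe map.
final class ThreadedWordCount {

    static Map<String, Integer> count(List<String> lines, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : lines) {
            pool.submit(() -> {
                // Assumption: a "word" is any run of non-whitespace characters.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        // merge() is atomic on ConcurrentHashMap, so no increment is lost.
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("word count timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "the end");
        System.out.println(count(lines, 4).get("the")); // prints 3
    }
}
```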

Suggestion: Hadoop MapReduce-style Distributed Computation

In Hadoop, when a job is submitted by the programmer, a dedicated machine is chosen to act as the master and manage the job. The master oversees all the map and reduce tasks in the job. Every other node (machine) in the cluster runs a “mapManager” and a “reduceManager” remote object.

When a programmer starts a new job, s/he starts the master object directly and provides the IP addresses of the other machines in the cluster and the path of the input file (in this assignment, we’ll work with a single text file for simplicity).
Once the master has all the information, it starts the job:

1. It opens the file, and reads it line by line.
2. For each line, it starts a mapper task on one of the mapper nodes.
3. A mapper task counts the frequency of occurrence of words in the line sent to it.
4. Then, it contacts the master to get the addresses of the reducers in charge of each key it generated.
5. When the master receives a request from a mapper task for the addresses of the reducer tasks corresponding to its keys, it goes through the mapper task keys and,

a. If the key has not yet been assigned a reducer task, a reducer task is started and its remote object reference is sent to the mapper task.
b. If the key has already been assigned a reducer task, the reference to that existing reducer task is simply sent to the mapper task.

6. When a reducer task is started, it’s in charge of counting the frequency of occurrence of a specific key word.
7. The mapper task directly contacts each corresponding reducer task, sends it the locally stored count for that key, and terminates when done.
8. The reducer task keeps adding to the frequency count of its key word until all mapper tasks are done.
9. Once the reducer is done, it sends its result to the master and terminates.
10. Finally, the master stores all the results received from all reducers to an output file, and terminates.
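Steps 2 and 10 are work the master does on its own. A minimal in-process sketch of both follows, assuming round-robin as the policy for choosing "one of the mapper nodes" and a tab-separated, alphabetically sorted output file. The class `MasterSteps`, its methods, and the sample addresses are hypothetical; a real master would invoke `createMapTask` on the remote object at each address rather than just building a plan.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical master-side helpers for step 2 (pick a mapper node for each line)
// and step 10 (store every reducer's result in one output file).
final class MasterSteps {

    // Step 2: assign lines to the given node addresses round-robin (assumed policy).
    static Map<String, List<String>> assignLines(List<String> lines, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one mapper node");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String node : nodes) {
            plan.put(node, new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            plan.get(nodes.get(i % nodes.size())).add(lines.get(i));
        }
        return plan;
    }

    // Step 10: one "word<TAB>count" line per key, sorted by word (assumed format).
    static List<String> formatOutput(Map<String, Integer> results) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(results).entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    static void writeOutput(Map<String, Integer> results, Path file) throws IOException {
        Files.write(file, formatOutput(results));
    }

    public static void main(String[] args) throws IOException {
        List<String> nodes = List.of("10.0.0.2", "10.0.0.3"); // placeholder addresses
        System.out.println(assignLines(List.of("line 1", "line 2", "line 3"), nodes));
        // {10.0.0.2=[line 1, line 3], 10.0.0.3=[line 2]}
        writeOutput(Map.of("to", 2, "be", 2, "or", 1), Paths.get("output.txt"));
    }
}
```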

As an option, and to help you get this application started, sample RMI interfaces for the master, mapper, and reducer are provided below.

// iMaster.java
import java.rmi.*;
import java.rmi.server.*;
import java.rmi.RemoteException;
import java.util.*;

public interface iMaster extends Remote
{
    public iReducer[] getReducers(String[] keys) throws RemoteException, AlreadyBoundException;

    public void markMapperDone() throws RemoteException;

    public void receiveOutput(String key, int value) throws RemoteException;
}
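The bookkeeping behind `getReducers` (steps 5a and 5b) and `markMapperDone` can be prototyped without RMI. In this sketch the type parameter `R` stands in for the `iReducer` remote reference, and the hypothetical `newReducer` function stands in for starting a reduce task on some node. The helpers `mapperStarted`, `noMoreInput`, and `awaitMappers` are invented for illustration: because the master reads the file line by line, it only knows the total number of map tasks once it reaches the end of the input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// In-process sketch of the master's state; R stands in for an iReducer reference.
final class MasterState<R> {
    private final Map<String, R> reducers = new HashMap<>();
    private final Map<String, Integer> output = new TreeMap<>();
    private final Function<String, R> newReducer; // hypothetical "start a reduce task"
    private int mappersStarted;
    private int mappersDone;
    private boolean inputFinished;

    MasterState(Function<String, R> newReducer) {
        this.newReducer = newReducer;
    }

    // Steps 5a/5b. synchronized, so two mappers asking for the same new key
    // at the same time cannot cause two reducers to be started for it.
    synchronized List<R> getReducers(String[] keys) {
        List<R> result = new ArrayList<>(keys.length);
        for (String key : keys) {
            R reducer = reducers.get(key);
            if (reducer == null) {              // 5a: key seen for the first time
                reducer = newReducer.apply(key);
                reducers.put(key, reducer);
            }
            result.add(reducer);                // 5b: otherwise reuse the existing one
        }
        return result;
    }

    synchronized void mapperStarted() { mappersStarted++; }

    synchronized void noMoreInput() { inputFinished = true; notifyAll(); }

    synchronized void markMapperDone() { mappersDone++; notifyAll(); }

    synchronized boolean allMappersDone() {
        return inputFinished && mappersDone == mappersStarted;
    }

    // Blocks until the whole file has been dispatched and every map task has reported.
    synchronized void awaitMappers() throws InterruptedException {
        while (!allMappersDone()) {
            wait();
        }
    }

    synchronized void receiveOutput(String key, int value) { output.put(key, value); }

    synchronized Map<String, Integer> results() { return new TreeMap<>(output); }

    public static void main(String[] args) throws InterruptedException {
        // A "reducer" here is only a label; a real master would start a remote task.
        MasterState<String> master = new MasterState<>(key -> "reducer-for-" + key);
        master.mapperStarted();
        System.out.println(master.getReducers(new String[] {"to", "be"}));
        System.out.println(master.getReducers(new String[] {"be", "or"})); // reuses reducer-for-be
        master.noMoreInput();
        new Thread(master::markMapperDone).start();
        master.awaitMappers();
        System.out.println(master.allMappersDone()); // true
    }
}
```

Making every method `synchronized` is the simplest way to keep the per-key assignment consistent; it serializes calls into the master, which seems acceptable for a small teaching cluster.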

// iMapper.java
import java.rmi.*;
import java.rmi.server.*;
import java.rmi.RemoteException;
import java.util.*;

public interface iMapper extends Remote
{
    public iMapper createMapTask(String name) throws RemoteException, AlreadyBoundException;

    public void processInput(String input, iMaster theMaster) throws RemoteException, AlreadyBoundException;
}
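On the mapper side, `processInput` roughly has three jobs: count the words of its line locally (step 3), ask the master for the reducers of all its keys in one call (step 4), and push each count to its reducer (step 7). The sketch below shows that order using two hypothetical local interfaces, `MasterRef` and `ReducerRef`, as stand-ins for `iMaster` and `iReducer` so it runs without RMI; calling `markMapperDone` at the end is an assumption about how the master learns that the task has finished.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a map task's processInput; MasterRef and ReducerRef are hypothetical
// local stand-ins for iMaster and iReducer (no RMI, no RemoteException).
final class MapTaskSketch {

    interface ReducerRef {
        void receiveValues(int value);
    }

    interface MasterRef {
        ReducerRef[] getReducers(String[] keys);
        void markMapperDone();
    }

    static void processInput(String input, MasterRef master) {
        // Step 3: count the words of this line locally.
        Map<String, Integer> local = new HashMap<>();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                local.merge(word, 1, Integer::sum);
            }
        }
        // Step 4: one call to the master for the reducers of all keys.
        String[] keys = local.keySet().toArray(new String[0]);
        ReducerRef[] reducers = master.getReducers(keys);
        // Step 7: send each local count directly to the reducer owning that key.
        for (int i = 0; i < keys.length; i++) {
            reducers[i].receiveValues(local.get(keys[i]));
        }
        master.markMapperDone();
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = new HashMap<>();
        // Fake master whose "reducers" just add into a local map.
        MasterRef fake = new MasterRef() {
            @Override
            public ReducerRef[] getReducers(String[] keys) {
                ReducerRef[] refs = new ReducerRef[keys.length];
                for (int i = 0; i < keys.length; i++) {
                    String key = keys[i];
                    refs[i] = value -> totals.merge(key, value, Integer::sum);
                }
                return refs;
            }

            @Override
            public void markMapperDone() {
                System.out.println("map task done");
            }
        };
        processInput("to be or not to be", fake);
        System.out.println(new TreeMap<>(totals)); // {be=2, not=1, or=1, to=2}
    }
}
```

Counting locally first means a line such as "to be or not to be" costs one reducer call per distinct word (4) rather than one per occurrence (6).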

// iReducer.java
import java.rmi.*;
import java.rmi.server.*;
import java.rmi.RemoteException;
import java.util.*;

public interface iReducer extends Remote
{
    public iReducer[] createReduceTask(String key, iMaster master) throws RemoteException, AlreadyBoundException;

    public void receiveValues(int value) throws RemoteException;

    public int terminate() throws RemoteException;
}
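A reducer mainly needs a running total that stays correct when several map tasks call `receiveValues` at the same time (step 8), plus a way to hand the final count back (step 9). This sketch assumes the `iMaster` reference passed to `createReduceTask` is used for that, modelled here by a plain `ObjIntConsumer<String>` in place of `iMaster.receiveOutput`; the class name `ReduceTaskSketch` is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ObjIntConsumer;

// Sketch of one reduce task's state; ObjIntConsumer stands in for iMaster.receiveOutput.
final class ReduceTaskSketch {
    private final String key;
    private final ObjIntConsumer<String> master;
    private final AtomicInteger total = new AtomicInteger();

    ReduceTaskSketch(String key, ObjIntConsumer<String> master) {
        this.key = key;
        this.master = master;
    }

    // Step 8: called by any number of map tasks, possibly concurrently.
    void receiveValues(int value) {
        total.addAndGet(value);
    }

    // Step 9: once all mappers are done, report the final count and return it.
    int terminate() {
        int result = total.get();
        master.accept(key, result);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        ReduceTaskSketch reducer = new ReduceTaskSketch("be",
                (key, count) -> System.out.println("master received " + key + "=" + count));
        Thread[] mappers = new Thread[4];
        for (int i = 0; i < mappers.length; i++) {
            mappers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    reducer.receiveValues(1);
                }
            });
            mappers[i].start();
        }
        for (Thread t : mappers) {
            t.join();
        }
        reducer.terminate(); // prints "master received be=4000"
    }
}
```

With a plain `int` field and `total += value`, concurrent calls could lose updates; `AtomicInteger.addAndGet` avoids that without explicit locking.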

Answer #1

// WordCountMapper.java: emits (word, 1) for every token in a line

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Named WordCountMapper so it does not clash with the Mapper interface it implements
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    // key is the byte offset of the line in the input file; value is the line itself
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter rep) throws IOException
    {
        for (String token : value.toString().split("\\s+"))
        {
            if (token.length() > 0)   // a line starting with whitespace yields an empty first token
            {
                word.set(token);
                output.collect(word, ONE);
            }
        }
    }
}

// WordCountReducer.java: sums the counts emitted for each word

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Named WordCountReducer so it does not clash with the Reducer interface it implements
public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                       Reporter rep) throws IOException
    {
        int count = 0;
        while (values.hasNext())
        {
            count += values.next().get();
        }
        output.collect(key, new IntWritable(count));
    }
}

// Driver.java: configures and submits the job

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool
{
    public int run(String[] args) throws Exception
    {
        if (args.length < 2)
        {
            System.err.println("Usage: Driver <input path> <output path>");
            return -1;
        }
        // Start from the configuration ToolRunner parsed (e.g. -D options)
        JobConf conf = new JobConf(getConf(), Driver.class);
        conf.setJobName("wordcount");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);   // blocks until the job finishes; throws IOException on failure
        return 0;
    }

    public static void main(String[] args) throws Exception
    {
        int exitCode = ToolRunner.run(new Driver(), args);
        System.exit(exitCode);
    }
}
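The job above needs Hadoop to execute. To check expected output on a small input without a cluster, the map, shuffle/sort, and reduce phases it relies on can be emulated in plain Java. `LocalMapReduce` is a hypothetical helper, not part of Hadoop; it splits on any whitespace, which may differ slightly from the mapper's own tokenization, and it uses a `TreeMap` because Hadoop hands keys to each reducer in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local emulation of what the word count job computes, for small test inputs.
final class LocalMapReduce {

    // Map phase: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort: group the emitted values by key, keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return groups;
    }

    // Reduce phase: sum the grouped values for each key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(lines).entrySet()) {
            int count = 0;
            for (int v : group.getValue()) {
                count += v;
            }
            output.put(group.getKey(), count);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or", "not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```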
