Implementation of a MapReduce-style distributed word count application For this assignment, you can use any programming...

Question

Implementation of a MapReduce-style distributed word count application

For this assignment, you can use any programming language you want and you can use either RMI or any version of RPC for client/server communication.

For this assignment, you will focus on a single type of application: Word Count. In a single Word Count job, the programmer provides a set of text files to be processed, and the frequency of each word across all the documents is counted and stored in a single output file. In your first assignment, you implemented the same application using multiple threads on a single machine. Then, we learned about MapReduce, and how it was designed as a hybrid model to optimize these types of distributed applications. In this assignment, we’ll emulate the distributed computation that would occur in MapReduce for the Word Count application. Note: The performance of this application will not be as good as Google or Hadoop MapReduce because we are working at the application level, but it should be faster than the single-machine implementation.

Suggestion: Hadoop MapReduce-style Distributed Computation

In Hadoop, when a job is submitted by the programmer, a dedicated machine is chosen to act as the master and manage the job. The master oversees all the map and reduce tasks in the job. Every other node (machine) in the cluster runs a “mapManager” and a “reduceManager” remote object.

When a programmer starts a new job, s/he directly starts the master object, and provides the IPs of the other machines in the cluster and the path of the input file (in this assignment, we’ll work with a single text file for simplicity).
Once the master has all the information, it starts the job:

1. It opens the file, and reads it line by line.
2. For each line, it starts a mapper task on one of the mapper nodes.
3. A mapper task counts the frequency of occurrence of words in the line sent to it.
4. Then, it contacts the master to get the addresses of the reducers in charge of each key it generated.
5. When the master receives a request from a mapper task for the addresses of the reducer tasks corresponding to its keys, it goes through the mapper task's keys and, for each key:

a. If the key has not yet been assigned a reducer task, a reducer task is started and its remote object reference is sent to the mapper task.
b. If the key has already been assigned to an existing reducer task, that reducer's remote object reference is simply sent to the mapper task.

6. When a reducer task is started, it’s in charge of counting the frequency of occurrence of a specific key word.
7. The mapper task directly contacts the corresponding reducer task, sends it its locally stored word count, and terminates when done.
8. The reducer task keeps adding to the frequency count of the key word, until all mapper tasks are done.
9. Once the reducer is done, it sends its results to the master, and terminates when done.
10. Finally, the master stores all the results received from all reducers to an output file, and terminates.
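
The ten steps above can be sketched in a single process before any RMI is added. The following is only a minimal sketch under stated assumptions: plain method calls stand in for remote calls, the input is a list of lines rather than a file, and every class and method name (WordCountFlowSketch, Master, Reducer, runMapTask, runJob) is hypothetical rather than part of the assignment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-process simulation of steps 1-10; plain method calls stand in for RMI.
// All names in this file are hypothetical.
class WordCountFlowSketch {

    // Steps 6 and 8: one reducer per key word, accumulating partial counts.
    static class Reducer {
        private int total = 0;
        void receiveValues(int value) { total += value; }
        int terminate() { return total; }
    }

    // Steps 5a/5b: the master hands out one reducer per key,
    // creating it the first time the key is requested.
    static class Master {
        private final Map<String, Reducer> reducers = new HashMap<>();

        Reducer getReducer(String key) {
            return reducers.computeIfAbsent(key, k -> new Reducer());
        }

        // Steps 9-10: collect each reducer's final count, sorted by word.
        Map<String, Integer> collectResults() {
            Map<String, Integer> results = new TreeMap<>();
            reducers.forEach((key, reducer) -> results.put(key, reducer.terminate()));
            return results;
        }
    }

    // Steps 3, 4 and 7: count the words in one line locally,
    // then forward each local count to the reducer in charge of that word.
    static void runMapTask(String line, Master master) {
        Map<String, Integer> localCounts = new HashMap<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                localCounts.merge(word, 1, Integer::sum);
            }
        }
        localCounts.forEach((word, count) -> master.getReducer(word).receiveValues(count));
    }

    // Steps 1-2: one map task per input line.
    static Map<String, Integer> runJob(List<String> lines) {
        Master master = new Master();
        for (String line : lines) {
            runMapTask(line, master);
        }
        return master.collectResults();
    }

    public static void main(String[] args) {
        // Prints {dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(runJob(Arrays.asList("the quick fox", "the lazy dog", "the fox")));
    }
}
```

In a distributed version, each of these calls would presumably become a remote invocation, and the master would also need a way to know when all mapper tasks have finished before asking the reducers for their results.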

As an optional starting point, sample RMI interfaces for the master, mapper, and reducer are provided below.

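import java.rmi.*;
import java.rmi.server.*;
import java.rmi.RemoteException;
import java.util.*;

public interface iMaster extends Remote
{
    // Returns the reducer in charge of each key, starting a reducer for any unassigned key.
    public iReducer[] getReducers(String[] keys) throws RemoteException, AlreadyBoundException;

    // Called by a mapper task when it has finished.
    public void markMapperDone() throws RemoteException;

    // Called by a reducer to deliver the final count for its key.
    public void receiveOutput(String key, int value) throws RemoteException;
}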
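
Because RMI may dispatch calls from different mapper tasks on different threads, getReducers can run concurrently, and each key must still end up with exactly one reducer. Below is a minimal sketch of that bookkeeping, assuming a local ReducerHandle class as a hypothetical stand-in for a remote iReducer reference; ReducerAssignmentSketch and reducersCreated are likewise illustrative names, not part of the provided interfaces.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the key-to-reducer assignment in steps 5a/5b.
class ReducerAssignmentSketch {

    // Hypothetical stand-in for a remote iReducer reference.
    static class ReducerHandle {
        final String key;
        ReducerHandle(String key) { this.key = key; }
    }

    private final ConcurrentMap<String, ReducerHandle> assigned = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Mirrors getReducers(String[] keys): result[i] is the reducer for keys[i].
    // computeIfAbsent runs the factory at most once per key, even under concurrent calls.
    ReducerHandle[] getReducers(String[] keys) {
        ReducerHandle[] result = new ReducerHandle[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = assigned.computeIfAbsent(keys[i], k -> {
                created.incrementAndGet();
                return new ReducerHandle(k);
            });
        }
        return result;
    }

    // Number of reducers started so far (one per distinct key).
    int reducersCreated() {
        return created.get();
    }

    public static void main(String[] args) {
        ReducerAssignmentSketch master = new ReducerAssignmentSketch();
        master.getReducers(new String[] {"the", "fox", "the"});
        master.getReducers(new String[] {"fox", "dog"});
        System.out.println(master.reducersCreated()); // prints 3
    }
}
```

A plain HashMap with a separate containsKey check would leave a window in which two mappers both start a reducer for the same key; the single atomic computeIfAbsent call avoids that.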

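import java.rmi.*;
import java.rmi.server.*;
import java.rmi.RemoteException;
import java.util.*;

public interface iMapper extends Remote
{
    // Starts a new mapper task registered under the given name.
    public iMapper createMapTask(String name) throws RemoteException, AlreadyBoundException;

    // Counts the words in one line of input, then forwards the counts to the reducers obtained from the master.
    public void processInput(String input, iMaster theMaster) throws RemoteException, AlreadyBoundException;
}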

import java.rmi.*;
import java.rmi.server.*;
import java.rmi.RemoteException;
import java.util.*;

public interface iReducer extends Remote
{
    // Starts a reducer task in charge of the given key.
    public iReducer[] createReduceTask(String key, iMaster master) throws RemoteException, AlreadyBoundException;

    // Adds a partial count received from a mapper task.
    public void receiveValues(int value) throws RemoteException;

    // Ends the task and returns the final count for its key.
    public int terminate() throws RemoteException;
}
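
A reducer may receive receiveValues calls from several mapper tasks at once, so its running total should be updated atomically. The following is a minimal local sketch, assuming a hypothetical CountingReducerSketch class that mirrors only receiveValues and terminate from iReducer and leaves out the remote export and the report back to the master.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Local sketch of a reducer's accumulation logic (steps 6, 8 and 9).
class CountingReducerSketch {
    private final String key;
    private final AtomicInteger total = new AtomicInteger();

    CountingReducerSketch(String key) {
        this.key = key;
    }

    String key() {
        return key;
    }

    // Adds one mapper's partial count; safe to call from several threads.
    void receiveValues(int value) {
        total.addAndGet(value);
    }

    // Returns the accumulated count for this reducer's key.
    int terminate() {
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CountingReducerSketch reducer = new CountingReducerSketch("the");
        Thread[] mappers = new Thread[4];
        for (int i = 0; i < mappers.length; i++) {
            // Each simulated mapper reports the word 1000 times.
            mappers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    reducer.receiveValues(1);
                }
            });
            mappers[i].start();
        }
        for (Thread mapper : mappers) {
            mapper.join();
        }
        System.out.println(reducer.key() + "=" + reducer.terminate()); // prints the=4000
    }
}
```

With a plain int field and `total += value`, concurrent updates could be lost and the final count could come out too low.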


Answer 1

//Code for Mapping

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// The Hadoop interface is referenced by its fully qualified name because this
// class is also called Mapper; importing org.apache.hadoop.mapred.Mapper would clash.
public class Mapper extends MapReduceBase implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable>
{
    // Emits (word, 1) for every whitespace-separated token in the input line.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter rep) throws IOException
    {
        String line = value.toString();
        for (String input_string : line.split("\\s+"))
        {
            if (input_string.length() > 0)
            {
                output.collect(new Text(input_string), new IntWritable(1));
            }
        }
    }
}

//Code for Reducing

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// As with the mapper, the Hadoop interface is fully qualified to avoid a
// name clash with this class.
public class Reducer extends MapReduceBase implements org.apache.hadoop.mapred.Reducer<Text, IntWritable, Text, IntWritable>
{
    // Sums all the counts emitted for one word and writes (word, total).
    public void reduce(Text key, Iterator<IntWritable> value, OutputCollector<Text, IntWritable> output, Reporter rep) throws IOException
    {
        int count = 0;
        while (value.hasNext())
        {
            IntWritable i = value.next();
            count += i.get();
        }
        output.collect(key, new IntWritable(count));
    }
}

//Code for Driver

import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool
{
    // Configures and runs the job; args[0] is the input path, args[1] the output path.
    public int run(String args[]) throws IOException
    {
        if (args.length < 2)
        {
            System.err.println("Please give valid inputs: <input path> <output path>");
            return -1;
        }

        // Start from the configuration that ToolRunner parsed from the command line.
        JobConf conf = new JobConf(getConf(), Driver.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setMapperClass(Mapper.class);
        conf.setReducerClass(Reducer.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Blocks until the job completes; throws IOException if it fails.
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String args[]) throws Exception
    {
        int exitCode = ToolRunner.run(new Driver(), args);
        System.exit(exitCode);
    }
}
