Question

MapReduce and Hadoop

(a) Explain the difference between map and reduce tasks in the MapReduce framework.

(b) How does the Hadoop framework ensure that no reduce tasks can begin until all map tasks have finished?

(c) When a worker node fails in Hadoop, its tasks are reassigned to other workers. What guarantees that the data being processed by the failed node is available to these other workers?

Answer #1

a)

MapReduce is a framework for writing applications that process huge amounts of data in parallel, reliably, on large clusters of commodity hardware.

The MapReduce model consists of two kinds of task, Map and Reduce. A map task takes a set of input data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). A reduce task takes the output of the map tasks as its input and combines those tuples into a smaller set of tuples. As the order of the name MapReduce implies, the reduce task is always performed after the map task.
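The split between the two tasks can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's API; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are made up for the illustration, and the `shuffle` step stands in for the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict


def map_words(line):
    """Map: turn one input record into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: combine all values for one key into a single, smaller result."""
    return key, sum(values)


def word_count(lines):
    """Run map on every line, group by key, then reduce each group."""
    mapped = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
```

Here each call to `map_words` sees only one line and can run independently of the others, while each call to `reduce_counts` needs every value emitted for its key, which is why it has to come second.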

b)

A Task Tracker is a slave node that accepts tasks from the Job Tracker and executes them. Each Task Tracker runs in its own JVM process and is configured with a fixed number of slots, which is the number of tasks it can accept. The Task Tracker starts a separate JVM process (called a Task Instance) to do the actual work, so that a failure in that process does not take down the Task Tracker. The Task Tracker monitors these Task Instances, capturing their output and exit codes. When a Task Instance finishes, successfully or not, the Task Tracker notifies the Job Tracker. Task Trackers also send periodic heartbeat messages to the Job Tracker, typically every few seconds, to reassure it that they are still alive. These messages also tell the Job Tracker how many slots are available.

Because every map task's completion is reported this way, the Job Tracker knows when the whole map phase is done. Reduce tasks may be started early to copy map output that is already available, but the reduce function itself is not invoked until the output of every map task has been fetched and merged.
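The bookkeeping that makes this possible can be sketched as a simple barrier on the master side. The class below is a hypothetical, much simplified model and not Hadoop's actual Job Tracker code; the names `JobProgress`, `report_map_done` and `can_run_reduce` are assumptions made for the example.

```python
class JobProgress:
    """Hypothetical master-side record of which map tasks are still pending."""

    def __init__(self, map_task_ids):
        self.pending_maps = set(map_task_ids)
        self.map_outputs = {}  # task id -> location of that task's output

    def report_map_done(self, task_id, output_location):
        """Called when a worker reports a successfully finished map task."""
        self.pending_maps.discard(task_id)
        self.map_outputs[task_id] = output_location

    def can_run_reduce(self):
        """The reduce function may run only once no map task is pending."""
        return not self.pending_maps
```

In this model the reduce side would check `can_run_reduce()` before calling the user's reduce function, while the locations recorded in `map_outputs` tell it where to fetch each piece of map output.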

c)

On a system of this scale, failure is commonplace. The Master periodically pings the Workers. If a Worker does not answer, it is marked as failed and its work is rescheduled on another Worker. Any Reduce that was scheduled to fetch results from the old Worker is told to fetch them from the new Worker instead.
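Failure detection of this kind can be sketched as a timeout on the last reply heard from each Worker. The code is an illustrative model only: the `Master` class, its method names, the explicit `now` argument and the round-robin choice of replacement Worker are all assumptions for the example, not how any particular implementation schedules tasks.

```python
class Master:
    """Hypothetical master that tracks worker liveness by last-heard time."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heard = {}   # worker -> time of its last reply
        self.assignments = {}  # task -> worker currently responsible

    def record_reply(self, worker, now):
        """Note that a worker answered a ping at time `now`."""
        self.last_heard[worker] = now

    def failed_workers(self, now):
        """Workers that have been silent for longer than the timeout."""
        return {w for w, t in self.last_heard.items()
                if now - t > self.timeout_seconds}

    def reschedule(self, now):
        """Move tasks off failed workers; return {task: new_worker}."""
        failed = self.failed_workers(now)
        live = sorted(set(self.last_heard) - failed)
        moved = {}
        if not live:
            return moved  # nobody left to take over the work
        for i, task in enumerate(sorted(self.assignments)):
            if self.assignments[task] in failed:
                new_worker = live[i % len(live)]
                self.assignments[task] = new_worker
                moved[task] = new_worker
        return moved
```

The mapping returned by `reschedule` is what the Master would use to tell waiting Reduces where to fetch results from now on.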

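When a Map Worker dies, its map tasks must be re-executed from scratch, including those that had already completed. The reason is that map results are stored on the Worker's local disk and are now inaccessible to the Reduces. The re-executed map task reads its input again from the distributed file system, which keeps replicas of each block on several nodes, so the input is still available even though the failed node's copy is not. If a Reduce Worker fails, however, the results of its completed tasks remain available in the global file system, so only its unfinished tasks need to be rerun.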

Why the difference? The results of a Reduce are meant for consumption by the end user, so they are written to a distributed file system where the program can reach all of them in one place.

By contrast, the results of a Map Worker are intended only for consumption by particular Reduce Workers, so they are left in place on the local disk. The Master tells each downstream Reduce Worker where they are, and the Reduce Worker pulls the data explicitly, in a location-aware way, through an RPC-like mechanism.
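The resulting recovery rule can be written down in a few lines. This is a sketch under the assumptions stated above (map output on the Worker's local disk, reduce output in the distributed file system, and the job still in progress); the `Task` record and `tasks_to_rerun` function are hypothetical names, not part of Hadoop.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    kind: str    # "map" or "reduce"
    state: str   # "running" or "completed"
    worker: str


def tasks_to_rerun(tasks, failed_worker):
    """Return ids of tasks that must be re-executed after a worker fails.

    Assumes map output lives on the worker's local disk (lost on failure)
    and completed reduce output lives in the distributed file system.
    """
    rerun = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        # Every map task on the failed worker is rerun, finished or not;
        # a reduce task is rerun only if it had not yet completed.
        if task.kind == "map" or task.state == "running":
            rerun.append(task.task_id)
    return rerun
```

For example, if the failed Worker held one completed map task and one completed reduce task, only the map task would be returned.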
