Assume you are an IT consultant providing companies with solutions for big data analysis. You know that the Hadoop framework, thanks to MapReduce, allows users to process and extract many different types of information from very large text files. In order to convince the owner of a medium-sized company to install Hadoop on the company cluster:
Provide a brief definition of the Hadoop Distributed File System and of MapReduce, and briefly explain how Hadoop works by listing, as bullet points, the main steps for processing a single large data file.
1) The Hadoop Distributed File System (HDFS), as its name states, is a distributed file system: files are split into blocks and stored across multiple machines. It is used to store very large amounts of application data while giving clients easy access to it, and it supports parallel operations on the data such as read, write, and append. HDFS is built on commodity servers organized into a NameNode, which keeps the file-system metadata and manages clients' access to the data, and DataNodes, which store the blocks and perform the operations clients request. A very important aspect of HDFS is fault tolerance: each block is replicated on several DataNodes, so the system survives disk and node failures and is easy to manage. Together these features make HDFS highly reliable, scalable, and manageable.
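The block-splitting and replication idea can be sketched in a few lines of plain Python. This is a conceptual simulation, not the real HDFS API; the tiny block size, replication factor, and DataNode names are illustrative assumptions (real HDFS defaults to 128 MB blocks and a replication factor of 3):

```python
BLOCK_SIZE = 8        # bytes here, purely for illustration; HDFS default is 128 MB
REPLICATION = 3       # HDFS default replication factor
DATANODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Chop file content into fixed-size blocks, as HDFS does on write."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes=DATANODES, replication: int = REPLICATION):
    """Toy round-robin placement: each block is stored on `replication` DataNodes."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

data = b"hello hadoop distributed file system"
blocks = split_into_blocks(data)          # the NameNode tracks this block list
placement = place_replicas(len(blocks))   # ...and where each replica lives
```

Because every block exists on several nodes, losing one DataNode never loses data, which is exactly the fault tolerance described above.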
2) MapReduce is a framework for processing data in parallel in distributed applications. It is typically used together with HDFS to process the data held in the distributed files simultaneously, according to the client's requirements. Basically, MapReduce is a two-phase task: first, the map phase transforms the input data into key-value pairs; then the reduce phase aggregates those pairs into a smaller set of key-value pairs.

Hadoop processes a single large data file through the following main steps:
- The file is split into fixed-size blocks (128 MB by default) and stored, with replicas, on the DataNodes of the HDFS cluster.
- The client submits a MapReduce job; the framework starts one map task per input split, preferably on a node that already holds that split's data.
- Each map task reads its split and emits intermediate key-value pairs.
- The framework shuffles and sorts the intermediate pairs so that all values for the same key reach the same reduce task.
- Each reduce task aggregates the values for its keys and writes the final result back to HDFS.
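The two phases above can be sketched in plain Python using the classic word-count example. This simulates the MapReduce programming model in a single process; it is not the Hadoop API, and the function names and sample input are illustrative:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) key-value pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's value list into a single aggregate (here, a sum)."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop stores big data", "MapReduce processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

In a real cluster the map calls run in parallel on many nodes, the shuffle moves data over the network, and the reduce output lands back in HDFS, but the data flow is exactly this one.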