Architecture of Hadoop

 Hadoop is an open-source framework written in Java that enables the distributed processing of enormous datasets across clusters of computers using simple programming models. The Hadoop framework provides distributed storage and computation across clusters of commodity machines. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Hadoop Architecture

At its core, Hadoop has two major layers:

Processing/Computation layer (MapReduce), and

Storage layer (Hadoop Distributed File System).


MapReduce


MapReduce is a parallel programming model for writing distributed applications. It was devised at Google for the efficient processing of large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, which is an open-source framework.
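The model can be illustrated with the classic word-count example. The sketch below is a plain-Java simulation of the two phases (it does not use the real Hadoop API): the map phase emits (word, 1) pairs, and the reduce phase groups the pairs by key and sums the counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the MapReduce model (no Hadoop dependency).
public class WordCount {

    // Map phase: split each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group pairs by key and sum the values per word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores big data", "hadoop processes big data");
        System.out.println(reduce(map(input)));
    }
}
```

In real Hadoop, the framework performs the grouping (shuffle) step between the two phases and runs map and reduce tasks on different cluster nodes.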


Hadoop Distributed File System


The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications with large datasets.


Modules of Hadoop


HDFS: Hadoop Distributed File System. Google published its GFS paper, and HDFS was developed on that basis. Files are broken into blocks and stored across the nodes of the distributed architecture.
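Block splitting is simple to sketch. The snippet below is an illustrative simplification, not the real HDFS code: it cuts a file's bytes into fixed-size blocks (HDFS defaults to 128 MB blocks; a tiny size is used here for readability).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of how HDFS splits a file into fixed-size blocks.
public class BlockSplitter {

    // Split raw file bytes into blocks of at most blockSize bytes.
    static List<byte[]> split(byte[] file, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int offset = 0; offset < file.length; offset += blockSize) {
            int end = Math.min(offset + blockSize, file.length);
            byte[] block = new byte[end - offset];
            System.arraycopy(file, offset, block, 0, block.length);
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] file = new byte[300];            // a 300-byte "file"
        List<byte[]> blocks = split(file, 128); // -> blocks of 128, 128, and 44 bytes
        System.out.println(blocks.size() + " blocks");
    }
}
```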


YARN: Yet Another Resource Negotiator is used for job scheduling and managing the cluster.


Map Reduce: This is a framework that helps Java programs perform parallel computation on data using key-value pairs. The Map task takes the input data and converts it into a dataset that can be computed as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the Reducer gives the desired result.


Hadoop Common: These Java libraries are used to start Hadoop and are used by the other Hadoop modules.


Hadoop Cluster Architecture

The Hadoop architecture is a package of the file system, the MapReduce engine, and HDFS (Hadoop Distributed File System). The MapReduce engine can be MapReduce/MR1 or YARN/MR2.

A Hadoop cluster consists of a single master and multiple slave nodes. The master node includes the JobTracker, TaskTracker, NameNode, and DataNode, whereas a slave node includes a DataNode and a TaskTracker.


Hadoop Distributed File System


The Hadoop Distributed File System (HDFS) is a distributed file system for Hadoop. It has a master/slave architecture: a single NameNode performs the role of master, and multiple DataNodes perform the role of slaves.

Both the NameNode and the DataNodes are capable of running on commodity machines. HDFS is developed in the Java language, so any machine that supports Java can easily run the NameNode and DataNode software.
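The master/slave split can be modeled in a few lines. The toy sketch below (class and method names are illustrative, not the real HDFS API) shows the key idea: the NameNode holds only metadata about which DataNodes store each block of a file, while the DataNodes hold the actual block contents, and a client consults the NameNode before reading from the DataNodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the HDFS master/slave design (illustrative, not the real API).
public class MiniHdfs {

    // NameNode: file name -> ordered list of DataNode ids holding its blocks.
    static final Map<String, List<Integer>> namespace = new HashMap<>();

    // DataNodes: node id -> (block id -> block contents), grossly simplified.
    static final Map<Integer, Map<String, String>> dataNodes = new HashMap<>();

    static void write(String name, List<String> blocks) {
        List<Integer> locations = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            int node = i % 3;                        // spread blocks over 3 DataNodes
            dataNodes.computeIfAbsent(node, k -> new HashMap<>())
                     .put(name + "#" + i, blocks.get(i));
            locations.add(node);
        }
        namespace.put(name, locations);              // NameNode records metadata only
    }

    static String read(String name) {
        StringBuilder out = new StringBuilder();
        List<Integer> locations = namespace.get(name); // ask the NameNode first
        for (int i = 0; i < locations.size(); i++) {
            out.append(dataNodes.get(locations.get(i)).get(name + "#" + i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        write("log.txt", List.of("part1-", "part2-", "part3"));
        System.out.println(read("log.txt"));
    }
}
```

Note that file data never flows through the NameNode; it only answers "where are the blocks?", which is what keeps the master lightweight.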


NameNode

It is the single master server that exists in the HDFS cluster.

As it is a single node, it can become a single point of failure.

It manages the file system namespace by executing operations such as opening, renaming, and closing files.

It simplifies the design of the system.


DataNode

The HDFS cluster contains multiple DataNodes.

Each DataNode contains multiple data blocks.

These data blocks are used to store data.

It is the responsibility of a DataNode to serve read and write requests from the file system's clients.

It performs block creation, deletion, and replication upon instruction from the NameNode.
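The replication duty in the last point can be sketched as follows. This is an illustrative simplification (the names are hypothetical, and real HDFS placement is rack-aware): when a block sits on fewer DataNodes than the target replication factor (3 by default in HDFS), the NameNode picks additional nodes that will be instructed to copy it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of choosing DataNodes to receive new replicas of an
// under-replicated block (real HDFS placement is rack-aware).
public class Replicator {

    // Given the nodes currently holding a block, return the nodes chosen
    // to receive new copies so the replication factor is met.
    static List<Integer> replicaTargets(List<Integer> holders, int clusterSize, int factor) {
        List<Integer> targets = new ArrayList<>();
        for (int node = 0; node < clusterSize && holders.size() + targets.size() < factor; node++) {
            if (!holders.contains(node)) {
                targets.add(node);   // this DataNode will be told to copy the block
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Block currently on DataNode 1 only, cluster of 5 nodes, factor 3:
        System.out.println(replicaTargets(List.of(1), 5, 3)); // two new target nodes
    }
}
```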






Job Tracker

The role of the JobTracker is to accept MapReduce jobs from clients and process the data by using the NameNode.

In response, the NameNode provides metadata to the JobTracker.


Task Tracker

It works as a slave node to the JobTracker.

It receives tasks and code from the JobTracker and applies that code to the file. This process can also be called a Mapper.


MapReduce Layer

The MapReduce layer comes into existence when the client application submits a MapReduce job to the JobTracker. In response, the JobTracker sends the request to the appropriate TaskTrackers.
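That dispatch step can be sketched as a simple scheduler. The names below are illustrative, not the real Hadoop scheduler API: the JobTracker takes the job's input splits and assigns one map task per split to the available TaskTrackers in round-robin fashion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the JobTracker assigning map tasks to TaskTrackers.
public class JobTracker {

    // Assign each input split to a TaskTracker; returns tracker id -> its splits.
    static Map<Integer, List<String>> schedule(List<String> splits, int trackers) {
        Map<Integer, List<String>> assignments = new HashMap<>();
        for (int i = 0; i < splits.size(); i++) {
            assignments.computeIfAbsent(i % trackers, k -> new ArrayList<>())
                       .add(splits.get(i));
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Three input splits spread over two TaskTrackers:
        System.out.println(schedule(List.of("split0", "split1", "split2"), 2));
    }
}
```

Real Hadoop schedulers additionally prefer TaskTrackers on the nodes that already hold the split's data, so computation moves to the data rather than the other way around.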
