Hadoop is a software framework for processing large amounts of data in a distributed way. It implements Google's MapReduce programming model and framework, which divides an application into many small units of work that can be executed on any node of the cluster.
MapReduce is Hadoop's core data-processing module. The JobClient generates the files needed to run a job, and the JobTracker schedules and assigns tasks to TaskTrackers for execution.
Additional information
1. The prototype of the MapReduce distributed computing framework
The MapReduce distributed computing model was proposed by Google, mainly to solve the problem of computing over massive data in the search field. Apache produced an open-source implementation and integrated it into Hadoop, making general-purpose distributed data computing available.
MapReduce includes two stages: Map and Reduce. Users only need to implement the map() and reduce() functions to obtain a distributed computation, which greatly simplifies the development of distributed parallel programs.
The Map stage splits the input and processes each piece.
The Reduce stage aggregates the results. After aggregation, the data can also be cleaned up and formatted before being output (a minimal sketch of the two functions appears below).
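As a rough illustration (not taken from the text above), here is a minimal WordCount-style sketch in Java of the two functions a user implements. The class names WordCountMapper and WordCountReducer are chosen for the example.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map stage: split each input line into words and emit (word, 1) pairs.
class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce stage: sum the counts collected for each word and emit the total.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```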
2. Introduction to the MapReduce components:
JobClient: generates the job's runnable package from the user's job definition and places it in HDFS (see the driver sketch after this list).
JobInProgress: decomposes the job package into MapTasks and ReduceTasks so that they can be assigned to TaskTrackers.
JobTracker (master): schedules and manages the TaskTrackers that execute the tasks.
TaskTracker: executes the map or reduce tasks assigned to it.
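For illustration, a minimal driver sketch showing how such a job might be packaged and submitted; on a classic (MRv1) cluster the submission passes through the JobClient to the JobTracker, which then schedules MapTasks and ReduceTasks onto TaskTrackers. The class name WordCountDriver and the command-line input/output paths are assumptions for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: configures the job, then submits it to the cluster for scheduling.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output path
        // Submit the job and wait for it to finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```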