1.1 HDFS HA background
The NameNode in an HDFS cluster is a single point of failure (SPOF). In a cluster with only one NameNode, any problem on the NameNode machine makes the whole cluster unavailable until the NameNode is restarted.
The unavailability of an HDFS cluster mainly arises in two situations: first, if the NameNode machine goes down, the cluster cannot be used until the NameNode is restarted; second, a planned software or hardware upgrade of the NameNode makes the cluster unavailable for a short time.
To solve these problems, Hadoop provides the HDFS HA (high availability) scheme: an HA HDFS cluster contains two NameNodes, one in the active state and the other in the standby state. The active NameNode serves external requests, such as RPC requests from clients, while the standby NameNode does not serve requests; it only synchronizes the state of the active NameNode so that it can take over quickly when the active NameNode fails.
1.2 HDFS HA architecture
In a typical HA cluster, the two NameNodes are configured on two independent machines. At any time, one NameNode is in the active state and the other is in the standby state. The active NameNode responds to all client requests in the cluster, while the standby NameNode only acts as a replica, ensuring a fast failover when necessary.
To keep the standby node synchronized with the active node, both nodes communicate with a group of independent daemons called JournalNodes (JNs). When the active node makes any change to its namespace, it writes the edit log record to a majority of the JNs. The standby node reads these edits from the JNs, watches for changes to the edit log, and applies them to its own namespace. On failover, the standby node makes sure it has read all edits from the JNs before promoting itself to active; that is, its namespace is fully synchronized with the active node's before the failover completes.
To support fast failover, the standby node must also hold up-to-date information about the locations of data blocks in the cluster. To achieve this, the DataNodes are configured with the addresses of both NameNodes, maintain heartbeat connections with both, and send block location reports to both of them.
It is vital that only one NameNode is active at any time; otherwise the cluster state becomes inconsistent, the two NameNodes end up with two different views of the data, and data loss or other abnormal states may follow. This situation is usually called "split brain" (communication between the nodes is partitioned, so different DataNodes in the cluster see two active NameNodes). The JournalNodes guard against this: at any time they allow only one NameNode to act as the writer. During a failover, the NameNode that is becoming active takes over responsibility for writing edits to the JournalNodes, which prevents the other NameNode from continuing to act as the active node.
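To make it even harder for a deposed active NameNode to keep serving writes, HDFS HA also supports configurable fencing. A minimal sketch, assuming SSH-based fencing and the hadoop user's default key path (both illustrative choices, not values taken from this article):
//Fencing configuration in hdfs-site.xml (illustrative values)
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>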
The processing flow of the QJM-based HDFS HA scheme is as follows. After the cluster starts, one NameNode is in the active state; it serves client and DataNode requests and writes the edit log both locally and to the shared edit log (here, the QJM). The other NameNode is in the standby state; it loads the fsimage at startup and then periodically pulls edits from the shared edit log to stay synchronized with the active node's state. To let the standby take over quickly after the active node fails, the DataNodes report to both NameNodes at the same time, so the standby already holds the block-to-DataNode mapping; processing the block reports of all DataNodes is the most time-consuming part of a NameNode startup. To make the standby a true hot standby, a failover controller and Zookeeper are added. The failover controller communicates with Zookeeper and, through Zookeeper's election mechanism, switches a NameNode to active or standby via RPC.
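A minimal sketch of the automatic-failover wiring described above; the Zookeeper quorum below reuses this cluster's hadoop-slave nodes and the default port 2181, which is an assumption rather than something the article specifies here:
//Enable automatic failover (hdfs-site.xml)
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
//Zookeeper ensemble used by the failover controllers (core-site.xml)
<property>
  <name>ha.zookeeper.quorum</name>
  <value>hadoop-slave1:2181,hadoop-slave2:2181,hadoop-slave3:2181</value>
</property>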
1.3 HDFS high availability configuration elements
NameNode machines: two physical machines with the same configuration, which run the active NameNode and the standby NameNode respectively.
JournalNode machines: the machines that run the JournalNodes. The JournalNode daemon is very lightweight and can be deployed alongside other Hadoop processes such as the NameNode, DataNode, or ResourceManager. There must be at least three JournalNodes, and the count should be odd: with N JournalNodes running, the system tolerates up to (N-1)/2 JournalNode failures without affecting service (for example, 3 JournalNodes tolerate 1 failure, 5 tolerate 2).
In an HA cluster, the standby NameNode also performs checkpoints of the namespace (taking over the role of the secondary NameNode), so there is no need to run a SecondaryNameNode, CheckpointNode, or BackupNode in an HA cluster.
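For reference, the JournalNodes are usually pointed to through a single shared-edits URI. A sketch, assuming the JournalNodes run on this cluster's three hadoop-slave nodes on the default port 8485 and that the nameservice is the myhdfs example from section 1.4 (both assumptions, not values from the original text):
//Shared edits directory backed by the JournalNode quorum (hdfs-site.xml)
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop-slave1:8485;hadoop-slave2:8485;hadoop-slave3:8485/myhdfs</value>
</property>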
1.4 HDFS high availability configuration parameters
The following parameters need to be configured in hdfs-site.xml:
dfs.nameservices: the logical name of the HDFS nameservice, for example myhdfs.
dfs.ha.namenodes.myhdfs: the list of NameNode IDs under the nameservice myhdfs, for example nn1 and nn2.
dfs.namenode.rpc-address.myhdfs.nn1: the RPC address on which nn1 of myhdfs serves client requests.
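A minimal sketch of how these parameters might look in hdfs-site.xml; mapping nn1 to hadoop-master1 and nn2 to hadoop-master2 on the conventional RPC port 8020 is an assumption based on this cluster's node names, not a value given in the text:
//Example nameservice definition (hdfs-site.xml)
<property>
  <name>dfs.nameservices</name>
  <value>myhdfs</value>
</property>
<property>
  <name>dfs.ha.namenodes.myhdfs</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myhdfs.nn1</name>
  <value>hadoop-master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myhdfs.nn2</name>
  <value>hadoop-master2:8020</value>
</property>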
//Edit the network interface configuration file
# vim /etc/sysconfig/network-scripts/ifcfg-eth0
//Restart the network service
# service network restart
//Modify the host name
# hostnamectl set-hostname hostname
//View the host name
# hostnamectl status
3.3 Set the mapping between IP address and host name
//Switch the root user
$ su root
//Edit the host file
# vim /etc/hosts
172.16.20.81 hadoop-master1
172.16.20.82 hadoop-master2
172.16.20.83 hadoop-slave1
172.16.20.84 hadoop-slave2
172.16.20.85 hadoop-slave3
3.4 Disable the firewall and SELinux
//Switch the root user
$ su root
//Stop the firewalld service
# systemctl stop firewalld.service
//Disable firewalld from starting at boot
# systemctl disable firewalld.service
//Disable SELinux
# vim /etc/selinux/config
SELINUX=disabled
//After restarting the machine, check the SELinux status as the root user.
# getenforce
3.5 Configure SSH password-free login
//Generate an SSH key pair on the hadoop-master1 node.
$ ssh-keygen -t rsa
//Copy the public key to all node machines in the cluster.
$ ssh-copy-id hadoop-master1
$ ssh-copy-id hadoop-master2
$ ssh-copy-id hadoop-slave1
$ ssh-copy-id hadoop-slave2
$ ssh-copy-id hadoop-slave3
//Log in to each node through ssh to test whether the password-free login is successful.
$ ssh hadoop-master2
Note: Perform the same operation on other nodes to ensure that any node in the cluster can log in to other nodes through ssh without a password.
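If you prefer to script the distribution, a simple loop like the following can be run on each node after its key pair has been generated; it assumes the five host names above resolve and only saves some repeated typing:
//Push this node's public key to every node in the cluster
$ for host in hadoop-master1 hadoop-master2 hadoop-slave1 hadoop-slave2 hadoop-slave3; do ssh-copy-id $host; done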
3.6 Install JDK
//Uninstall the openjdk that comes with the system.
$ su root
# rpm -qa | grep java
# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
# rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.75-2.5.4.2.el7_0.x86_64
# rpm -e --nodeps tzdata-java-2015a-1.el7_0.noarch
# exit
//Extract the jdk installation package
$ tar -xvf jdk-7u79-linux-x64.tar.gz
//Delete the installation package
$ rm jdk-7u79-linux-x64.tar.gz
//Modify user environment variables
$ cd ~
$ vim .bash_profile
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
//Make the modified environment variable effective.
$ source .bash_profile
//Test whether the jdk is installed successfully.
$ java -version
4 Cluster time synchronization
If the clocks of the cluster nodes are not synchronized, nodes may be wrongly considered down or other abnormal problems may occur, so in production environments cluster time synchronization is usually achieved with an NTP server. In this cluster, the NTP server is set up on the hadoop-master1 node, as follows:
//Switch the root user
$ su root
//Check whether ntp is installed.
# rpm -qa | grep ntp
//Install ntp
# yum install -y ntp
//Configure the time server
# vim /etc/ntp.conf
# By default, deny all machines access to the NTP service
restrict default ignore
# Allow all machines in the LAN to connect to the NTP server
restrict 172.16.20.0 mask 255.255.255.0 nomodify notrap
# Use the local clock of this machine as the time source
server 127.127.1.0
//Start ntp server
# service ntpd start
//Set the ntp server to start automatically.
# chkconfig ntpd on
The other nodes in the cluster run a crontab scheduled task to synchronize their time with the NTP server at a fixed time every day. The method is as follows:
//Switch the root user
$ su root
//Add a scheduled task that synchronizes the time with the server at 00:00 every day and appends the output to a log.
# crontab -e
0 0 * * * /usr/sbin/ntpdate hadoop-master1 >> /home/hadoop/ntpd.log
//View tasks
# crontab -l
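To verify that a client node can reach the NTP server without waiting for the midnight cron job, a one-off manual synchronization can be run; this assumes the ntp package (which provides ntpdate) is also installed on the client node:
//One-off manual synchronization against the NTP server (run as root on a client node)
# /usr/sbin/ntpdate -u hadoop-master1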
5 Zookeeper cluster installation
Zookeeper is an open-source distributed coordination service whose Leader-Follower cluster structure neatly avoids a distributed single point of failure. It is mainly used for unified naming services, configuration management, lock services, cluster management and similar scenarios. In this deployment it is Zookeeper's cluster management capability that the big data components rely on.
This cluster uses zookeeper-3.4.5-cdh5.7.1. First, install Zookeeper on the hadoop-slave1 node, as follows:
//Create a new directory
$ mkdir app/cdh
//Unzip the zookeeper installation package
$ tar -xvf zookeeper-3.4.5-cdh5.7.1.tar.gz -C app/cdh/
//Delete the installation package
$ rm -rf zookeeper-3.4.5-cdh5.7.1.tar.gz
//Configure user environment variables
$ vim .bash_profile
export ZOOKEEPER_HOME=/home/hadoop/app/cdh/zookeeper-3.4.5-cdh5.7.1
export PATH=$PATH:$ZOOKEEPER_HOME/bin
//Make the modified environment variable effective.
$ source .bash_profile
//Modify the configuration file of zookeeper
$ cd app/cdh/zookeeper-3.4.5-cdh5.7.1/conf/
$ cp zoo_sample.cfg zoo.cfg
$ vim zoo.cfg
# Client Heartbeat Time (ms)
tickTime=2000
# Maximum allowable heartbeat interval
initLimit= 10
# Synchronization time limit
syncLimit=5
# Data storage directory
dataDir=/home/hadoop/app/cdh/zookeeper-3.4.5-cdh5.7.1/data
# Data log storage directory
dataLogDir=/home/hadoop/app/cdh/zookeeper-3.4.5-cdh5.7.1/data/log
# Client port number
clientPort=2181
# Cluster node and service port configuration
server.1=hadoop-slave1:2888:3888
server.2=hadoop-slave2:2888:3888
server.3=hadoop-slave3:2888:3888
# The following is the optimized configuration
# By default, the maximum number of connections to a server is 10, and 0 means there is no limit.
maxClientCnxns=0
# Number of snapshots
autopurge.snapRetainCount=3
# Snapshot purge interval in hours (the default value 0 disables automatic purging)
autopurge.purgeInterval=1
//Create the data storage directory and log storage directory of zookeeper.
$ cd ..
$ mkdir -p data/log
//Create a file myid in the data directory, and the input content is 1.
$ echo "1" >& gt data/my id
//Modify the log output path of zookeeper (note that the configuration files of CDH version and native version are different)
$ vim libexec/zkEnv.sh
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
ZOO _ LOG _ DIR = " $ ZOOKEEPER _ HOME/logs "
The ship does not bear the loading fee.
if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
ZOO_LOG4J_PROP="INFO,ROLLINGFILE "
The ship does not bear the loading fee.
//Modify the log configuration file of zookeeper.
$ vim conf/log4j.properties
zookeeper.root.logger=INFO,ROLLINGFILE
//Create a log directory
$ mkdir logs
Synchronize the Zookeeper directory from the hadoop-slave1 node to the hadoop-slave2 and hadoop-slave3 nodes, and modify their Zookeeper myid files. In addition, do not forget to set the user environment variables on those nodes.
//Copy the zookeeper directory from hadoop-slave1 to the other nodes.
$ cd ~
$ scp -r app/cdh/zookeeper-3.4.5-cdh5.7.1 hadoop-slave2:/home/hadoop/app/cdh
$ scp -r app/cdh/zookeeper-3.4.5-cdh5.7.1 hadoop-slave3:/home/hadoop/app/cdh
//Modify the myid file in the data directory in hadoop-slave2.
$ echo "2 " >app/CDH/zookeeper-3 . 4 . 5-CD H5 . 7 . 1/data/myid
//Modify the myid file in the data directory in hadoop-slave3.
$ echo "3 " >app/CDH/zookeeper-3 . 4 . 5-CD H5 . 7 . 1/data/myid
Finally, start Zookeeper on each node where Zookeeper is installed, and check the node status. The method is as follows:
//Start
$ zkServer.sh start
//View status
$ zkServer.sh status
//Stop
$ zkServer.sh stop
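Besides zkServer.sh status, the ensemble can be given a quick end-to-end smoke test with the bundled zkCli.sh client; the znode name /ha-test below is an arbitrary example, not something from the original text:
//Create a test znode via one server, read it back via another, then clean up
$ zkCli.sh -server hadoop-slave1:2181 create /ha-test "hello"
$ zkCli.sh -server hadoop-slave2:2181 get /ha-test
$ zkCli.sh -server hadoop-slave1:2181 delete /ha-test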