1. Cluster Plan

| Hostname | IP | Installed software | Running processes |
|---|---|---|---|
| yun01-nn-01 | 192.168.56.11 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController (zkfc), ResourceManager |
| yun01-nn-02 | 192.168.56.12 | JDK, Hadoop, ZooKeeper | NameNode, DataNode, NodeManager, QuorumPeerMain, DFSZKFailoverController, JournalNode |
| yun01-dn-01 | 192.168.56.13 | JDK, Hadoop, ZooKeeper | JournalNode, QuorumPeerMain, DataNode, NodeManager |
| yun01-dn-02 | 192.168.56.14 | JDK, Hadoop, ZooKeeper | JournalNode, QuorumPeerMain, DataNode, NodeManager |

Required packages: hadoop-2.6.0-x64.tar.gz, zookeeper-3.4.5.tar.gz, jdk-7u76-linux-x64.tar.gz
2. Build the Template Host
2.1. Set the IP address, hostname, and host-to-IP mappings
Set the IP address:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=08:00:27:31:06:BA
TYPE=Ethernet
UUID=197d4736-3f53-4e3f-81a9-d8a704b026f5
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.56.11
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
Set the hostname:
vi /etc/sysconfig/network
HOSTNAME=yun01-nn-01
Map hostnames to IP addresses:
vi /etc/hosts
192.168.56.11 yun01-nn-01
192.168.56.12 yun01-nn-02
192.168.56.13 yun01-dn-01
192.168.56.14 yun01-dn-02
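The same four mapping lines must end up in /etc/hosts on every node. A small sketch that generates them from the naming plan above (the `gen_hosts` helper is hypothetical, introduced here only so the identical block can be regenerated on each host without typos):

```shell
# Hypothetical helper: print the planned host-to-IP mappings, so the
# identical block can be appended to /etc/hosts on all four nodes.
gen_hosts() {
  i=11
  for h in yun01-nn-01 yun01-nn-02 yun01-dn-01 yun01-dn-02; do
    echo "192.168.56.$i $h"
    i=$((i + 1))
  done
}
mapping=$(gen_hosts)
echo "$mapping"   # append with: gen_hosts >> /etc/hosts (as root)
```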
2.2. Create the user and group
Log in as root:
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
(enter the hadoop user's password twice when prompted)
2.3. Create directories and grant them to the hadoop user and group
mkdir -p /data/hadoop
mkdir -p /application/hadoop
chown -R hadoop:hadoop /data/hadoop
chown -R hadoop:hadoop /application/hadoop
2.4. Configure passwordless SSH login
[hadoop@master ~]$ mkdir .ssh
[hadoop@master ~]$ chmod 755 .ssh
[hadoop@master ~]$ ssh-keygen -t rsa -P ''
Press Enter at every prompt.
[hadoop@master ~]$ cd .ssh
[hadoop@master .ssh]$ ls
id_rsa id_rsa.pub
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@master .ssh]$ chmod 600 authorized_keys
Verify:
ssh master
Type yes at the first connection prompt.
Log out:
exit
Connect again:
ssh master
No password should be required this time.
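On the actual multi-host cluster, the public key also has to reach the other nodes. A hedged sketch using the standard ssh-copy-id tool (OpenSSH; each host prompts once for the hadoop password; the host list is the plan above, and DRY_RUN is an assumption added here for safe pasting):

```shell
# Distribute the local public key to every node's authorized_keys.
# DRY_RUN=1 only prints the commands; unset it to actually run them.
DRY_RUN=1
copy_keys() {
  for h in yun01-nn-01 yun01-nn-02 yun01-dn-01 yun01-dn-02; do
    if [ -n "$DRY_RUN" ]; then
      echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$h"
    else
      ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$h"
    fi
  done
}
copy_keys
```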
2.5. Disable the firewall and SELinux
service iptables stop
iptables -F
setenforce 0
vi /etc/selinux/config
SELINUX=disabled
/etc/init.d/iptables save
2.6. Install the JDK and Hadoop
Log in as the hadoop user and copy jdk-7u76-linux-x64.tar.gz and hadoop-2.6.0-x64.tar.gz to /application/hadoop.
Remove any preinstalled JDK first:
rpm -qa | grep jdk
Output: jdk-1.6.0_10-fcs
Remove it: rpm -e --nodeps jdk-1.6.0_10-fcs
Extract the JDK and create a symlink (the tarball extracts to jdk1.7.0_76):
tar -zxvf jdk-7u76-linux-x64.tar.gz
ln -s jdk1.7.0_76 jdk
Install Hadoop:
tar -zxvf hadoop-2.6.0-x64.tar.gz
ln -s hadoop-2.6.0 hadoop
Configure environment variables:
Switch to root:
vi /etc/profile.d/java.sh
export JAVA_HOME=/application/hadoop/jdk
export HADOOP_HOME=/application/hadoop/hadoop
export HADOOP_PREFIX=/application/hadoop/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
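The fragment can be sanity-checked before logging out by sourcing it in the current shell; a sketch (the paths are the ones used in this guide and need not exist yet for the variable check):

```shell
# Write the same fragment to a temp file, source it, and confirm the
# variables expand as expected.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
export JAVA_HOME=/application/hadoop/jdk
export HADOOP_HOME=/application/hadoop/hadoop
export HADOOP_PREFIX=/application/hadoop/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
EOF
. "$tmp"
echo "JAVA_HOME=$JAVA_HOME"
```

After editing the real /etc/profile.d/java.sh, run `source /etc/profile` or log in again for the variables to take effect.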
2.7. Create the directories Hadoop will use
Log in as the hadoop user:
mkdir -p /application/hadoop
mkdir -p /data/hadoop/hdfs/journal
mkdir -p /data/hadoop/mapred/mrlocal
mkdir -p /data/hadoop/mapred/logs
mkdir -p /data/hadoop/hdfs/data
mkdir -p /data/hadoop/hdfs/namesecondary
mkdir -p /data/hadoop/tmp
mkdir -p /data/hadoop/name
mkdir -p /data/hadoop/zookeeper/tmp
2.8. Edit hadoop-env.sh
vi /application/hadoop/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/application/hadoop/jdk
2.9. Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
mapred-site.xml does not exist by default; create it with: cp mapred-site.xml.template mapred-site.xml
cd /application/hadoop/hadoop/etc/hadoop and configure the files as follows:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/hdfs/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>yun01-nn-02:2181,yun01-dn-01:2181,yun01-dn-02:2181</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>yun01-nn-01:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>yun01-nn-01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>yun01-nn-02:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>yun01-nn-02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://yun01-nn-02:8485;yun01-dn-01:8485;yun01-dn-02:8485/ns1</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/hdfs/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>3000</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_HA_ID</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>yun01-nn-01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>yun01-nn-02</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>yun01-nn-02:2181,yun01-dn-01:2181,yun01-dn-02:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
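A quick sanity check (a rough sketch, not a schema validation) that can catch copy-and-paste errors in these files: count the opening and closing <property> tags and make sure they match. The `check_pairs` helper is hypothetical:

```shell
# Report whether <property> tags are balanced in a Hadoop site file.
check_pairs() {  # usage: check_pairs <file>
  open=$(grep -c '<property>' "$1")
  close=$(grep -c '</property>' "$1")
  if [ "$open" -eq "$close" ]; then
    echo "$1: $open properties, balanced"
  else
    echo "$1: MISMATCH ($open open, $close close)"
  fi
}
# Demo on a throwaway file; on the cluster, point it at the four files
# in /application/hadoop/hadoop/etc/hadoop instead.
cat > /tmp/demo-site.xml <<'EOF'
<configuration>
<property><name>a</name><value>1</value></property>
</configuration>
EOF
check_pairs /tmp/demo-site.xml
```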
2.10. Edit slaves
vi /application/hadoop/hadoop/etc/hadoop/slaves
yun01-nn-02
yun01-dn-01
yun01-dn-02
The template host is now complete and ready to be cloned.
3. Clone the Four Servers
After cloning, set each server's hostname and IP:
Hostname: vi /etc/sysconfig/network
IP: vi /etc/sysconfig/network-scripts/ifcfg-eth0
-------------------------
DEVICE=eth0
#HWADDR=08:00:27:31:06:BA
TYPE=Ethernet
UUID=197d4736-3f53-4e3f-81a9-d8a704b026f5
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.56.11
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
Note: besides changing the IP, comment out the original hardware (MAC) address, since each clone gets a new one.
service network restart
If restarting the network service fails, remove the cached udev rule and reboot:
cd /etc/udev/rules.d
rm -f 70-persistent-net.rules
reboot
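The per-clone edits to ifcfg-eth0 follow one fixed pattern (new IPADDR, MAC line commented out), so they can be scripted with sed; a sketch on a local copy (the `fix_ifcfg` helper is hypothetical):

```shell
# Set a new IP and comment out the recorded MAC in an ifcfg-style file.
fix_ifcfg() {  # usage: fix_ifcfg <file> <new-ip>
  sed -i -e "s/^IPADDR=.*/IPADDR=$2/" -e "s/^HWADDR=/#HWADDR=/" "$1"
}
# Demo on a throwaway copy; on a clone, point it at
# /etc/sysconfig/network-scripts/ifcfg-eth0 instead.
cat > /tmp/ifcfg-eth0.demo <<'EOF'
DEVICE=eth0
HWADDR=08:00:27:31:06:BA
BOOTPROTO=static
IPADDR=192.168.56.11
EOF
fix_ifcfg /tmp/ifcfg-eth0.demo 192.168.56.12
cat /tmp/ifcfg-eth0.demo
```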
4. Install the ZooKeeper Cluster
4.1. Install ZooKeeper
Log in to yun01-nn-02 as the hadoop user.
Copy zookeeper-3.4.5.tar.gz to /application/hadoop:
tar -zxvf zookeeper-3.4.5.tar.gz
Create a symlink:
ln -s zookeeper-3.4.5 zookeeper
Edit the ZooKeeper configuration:
cd /application/hadoop/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/data/hadoop/zookeeper/tmp
Append at the end:
server.1=yun01-nn-02:2888:3888
server.2=yun01-dn-01:2888:3888
server.3=yun01-dn-02:2888:3888
cd /data/hadoop/zookeeper/tmp
Create an empty myid file:
[hadoop@yun01-nn-02 tmp]$ touch myid
[hadoop@yun01-nn-02 tmp]$ echo 1 > myid
Copy zookeeper-3.4.5 to yun01-dn-01 and yun01-dn-02:
scp -r /application/hadoop/zookeeper-3.4.5 hadoop@yun01-dn-01:/application/hadoop/zookeeper
scp -r /application/hadoop/zookeeper-3.4.5 hadoop@yun01-dn-02:/application/hadoop/zookeeper
ssh yun01-dn-01
cd /data/hadoop/zookeeper/tmp
touch myid
echo 2 > myid
ssh yun01-dn-02
cd /data/hadoop/zookeeper/tmp
touch myid
echo 3 > myid
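The myid value on each host must match that host's server.N line in zoo.cfg. A small sketch that makes the pairing explicit (replace the echo with the commented ssh line to apply it remotely; the `assign_myids` helper is hypothetical):

```shell
# Print which myid belongs on which ZooKeeper host, in zoo.cfg order.
assign_myids() {
  id=1
  for h in yun01-nn-02 yun01-dn-01 yun01-dn-02; do
    echo "$h: myid=$id"
    # ssh "$h" "echo $id > /data/hadoop/zookeeper/tmp/myid"
    id=$((id + 1))
  done
}
assign_myids
```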
4.2. Start ZooKeeper
Start ZooKeeper on yun01-nn-02, yun01-dn-01, and yun01-dn-02:
cd /application/hadoop/zookeeper/bin
./zkServer.sh start
or:
/application/hadoop/zookeeper/bin/zkServer.sh start
/application/hadoop/zookeeper/bin/zkServer.sh status
Once all three are running, check the status on each:
./zkServer.sh status
Start the JournalNode on the same three servers:
cd /application/hadoop/hadoop/sbin
./hadoop-daemon.sh start journalnode
or:
/application/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
Verify with: jps
The JournalNode log can be inspected under /application/hadoop/hadoop/logs/, e.g.:
/application/hadoop/hadoop/logs/hadoop-hadoop-journalnode-yun01-nn-02.out
5. Format
5.1. Format the NameNode
Run on yun01-nn-01:
/application/hadoop/hadoop/bin/hdfs namenode -format
Copy the generated metadata directory to yun01-nn-02:
scp -r /data/hadoop/hdfs/tmp/ hadoop@yun01-nn-02:/data/hadoop/hdfs/tmp
5.2. Format the ZKFC
/application/hadoop/hadoop/bin/hdfs zkfc -formatZK
5.3. Start and verify
5.3.1. Start
/application/hadoop/hadoop/sbin/start-dfs.sh
Check with:
jps
Start YARN:
/application/hadoop/hadoop/sbin/start-yarn.sh
5.3.2. Verify
5.3.3. Verify in the browser
Click Live Nodes to see the DataNodes.
The other NameNode's page looks similar. Verify from the shell:
hadoop fs -ls hdfs://ns1/
5.3.4. Verify active/standby failover
Upload a file from yun01-nn-01:
[hadoop@yun01-nn-01 hadoop]$ hadoop fs -put bbb.txt /
[root@yun01-nn-01 profile.d]# jps
2675 Jps
1778 NameNode
2066 DFSZKFailoverController
2188 ResourceManager
Kill the NameNode process:
[root@yun01-nn-01 profile.d]# kill -9 1778
Check with jps:
[hadoop@yun01-nn-01 hadoop]$ jps
8395 ResourceManager
8887 Jps
8222 DFSZKFailoverController
The NameNode is gone, but the cluster still works:
[hadoop@yun01-nn-01 hadoop]$ hadoop fs -cat /bbb.txt
12323423asdf
After killing the active NameNode, the web UI shows that yun01-nn-02 has become active.
Restart yun01-nn-01's NameNode manually:
/application/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
The browser now shows it running again, but as standby.
6. Starting and Stopping the Cluster
6.1. Start
Log in to yun01-nn-01 as the hadoop user:
/application/hadoop/hadoop/sbin/start-dfs.sh
Result:
[hadoop@yun01-nn-01 ~]$ jps
8287 Jps
7930 NameNode
8222 DFSZKFailoverController
yun01-nn-02:
[hadoop@yun01-nn-02 ~]$ jps
4317 DataNode
4539 DFSZKFailoverController
4419 JournalNode
4256 QuorumPeerMain
4638 Jps
4217 NameNode
[hadoop@yun01-dn-01 ~]$ ssh yun01-dn-01
[hadoop@yun01-dn-01 ~]$ jps
1684 Jps
32150 QuorumPeerMain
1461 JournalNode
1350 DataNode
[hadoop@yun01-dn-01 ~]$ ssh yun01-dn-02
Last login: Thu Apr 21 22:16:15 2016 from yun01-dn-01
[hadoop@yun01-dn-02 ~]$ jps
1324 JournalNode
1505 Jps
3147 QuorumPeerMain
1215 DataNode
/application/hadoop/hadoop/sbin/start-yarn.sh
[hadoop@yun01-nn-01 ~]$ jps
8395 ResourceManager
8637 Jps
7930 NameNode
8222 DFSZKFailoverController
6.2. Check NameNode state
/application/hadoop/hadoop/bin/hdfs haadmin -getServiceState nn1
[hadoop@yun01-nn-01 conf]$ /application/hadoop/hadoop/bin/hdfs haadmin -getServiceState nn1
active
Manual failover: hdfs haadmin -failover --forcefence --forceactive nn2 nn1 (this only works when dfs.ha.automatic-failover.enabled is false; it is true in this setup, so the command will report an error)
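Both NameNodes can be polled in one loop; a dry-run sketch that prints the exact commands (drop the echo to execute them on the cluster; the `nn_states` helper is hypothetical):

```shell
# Print the haadmin state query for each configured NameNode id.
nn_states() {
  for nn in nn1 nn2; do
    echo /application/hadoop/hadoop/bin/hdfs haadmin -getServiceState "$nn"
  done
}
nn_states
```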
6.3. Stop
Summary: log in to yun01-nn-01 as the hadoop user and stop YARN and HDFS:
/application/hadoop/hadoop/sbin/stop-yarn.sh
/application/hadoop/hadoop/sbin/stop-dfs.sh
The ZooKeeper processes have to be stopped on their own servers:
log in to yun01-nn-02, yun01-dn-01, and yun01-dn-02 and run:
/application/hadoop/zookeeper/bin/zkServer.sh stop
Detailed steps:
Log in to yun01-nn-01 as the hadoop user:
/application/hadoop/hadoop/sbin/stop-yarn.sh
/application/hadoop/hadoop/sbin/stop-dfs.sh
Check:
[hadoop@yun01-nn-01 hadoop]$ jps
9808 Jps
Log in to yun01-nn-02 as the hadoop user:
[hadoop@yun01-nn-01 hadoop]$ ssh yun01-nn-02
Last login: Thu Apr 21 23:23:47 2016 from yun01-nn-01
[hadoop@yun01-nn-02 ~]$ jps
5402 Jps
4256 QuorumPeerMain
ZooKeeper is still running and must be stopped separately:
[hadoop@yun01-nn-02 ~]$ /application/hadoop/zookeeper/bin/zkServer.sh stop
JMX enabled by default
Using config: /application/hadoop/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
Check:
[hadoop@yun01-nn-02 ~]$ jps
5433 Jps
All processes are now stopped.
Stop ZooKeeper on yun01-dn-01 and yun01-dn-02 the same way:
[hadoop@yun01-nn-02 ~]$ ssh yun01-dn-01
/application/hadoop/zookeeper/bin/zkServer.sh stop
[hadoop@yun01-dn-01 ~]$ ssh yun01-dn-02
/application/hadoop/zookeeper/bin/zkServer.sh stop
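The whole shutdown sequence above can be wrapped in one helper; a sketch under the paths used in this guide (the `run`/`stop_all` helpers and the DRY_RUN switch are assumptions; DRY_RUN=1 prints the commands instead of running them, so the block is safe to paste anywhere):

```shell
# Stop YARN and HDFS from yun01-nn-01, then ZooKeeper on its three hosts.
DRY_RUN=1
run() { if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi; }
stop_all() {
  run /application/hadoop/hadoop/sbin/stop-yarn.sh
  run /application/hadoop/hadoop/sbin/stop-dfs.sh
  for h in yun01-nn-02 yun01-dn-01 yun01-dn-02; do
    run ssh "$h" /application/hadoop/zookeeper/bin/zkServer.sh stop
  done
}
stop_all
```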