
[Hadoop Learning] Installing and Deploying CDH 5.2

Guiding questions
1. What does installing and deploying CDH 5.2 require?
2. Which files need to be copied to every host in the cluster?
3. How do you test YARN?





[Platform] CentOS 6.5
[Tools] scp
[Software] jdk-7u67-linux-x64.rpm
    CDH5.2.0-hadoop2.5.0

[Steps]
    1. Prerequisites
      (1) Cluster plan
  
  Host type   IP address      Domain name
  master      192.168.50.10   master.hadoop.com
  slave1      192.168.50.11   slave1.hadoop.com
  slave2      192.168.50.12   slave2.hadoop.com
  slave3      192.168.50.13   slave3.hadoop.com


      (2) Log in to the operating system as root.

      (3) On every host in the cluster, run the following command to set its hostname, replacing * with master, slave1, slave2, or slave3:
  hostname *.hadoop.com

          To make the hostname persistent, edit /etc/sysconfig/network as follows:
  HOSTNAME=*.hadoop.com


      (4) Modify /etc/hosts as follows (the addresses must match the cluster plan above):
  192.168.50.10 master.hadoop.com
  192.168.50.11 slave1.hadoop.com
  192.168.50.12 slave2.hadoop.com
  192.168.50.13 slave3.hadoop.com


          Then run the following command to copy the hosts file to every host in the cluster:
  scp /etc/hosts 192.168.50.*:/etc/hosts
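          Note that scp does not expand the * wildcard in the remote-host field, so the line above is shorthand for copying to each machine in turn; a minimal loop over the slave IPs from the cluster plan does it explicitly:
  for ip in 192.168.50.11 192.168.50.12 192.168.50.13; do
      scp /etc/hosts root@$ip:/etc/hosts
  done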


      (5) Install the JDK
  rpm -ivh jdk-7u67-linux-x64.rpm


         Create the environment file and source it (JAVA_HOME must be exported so that child processes such as the Hadoop scripts can see it):
  echo -e "export JAVA_HOME=/usr/java/default\nexport PATH=\$JAVA_HOME/bin:\$PATH" > /etc/profile.d/java-env.sh
  . /etc/profile.d/java-env.sh
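         An optional sanity check on each host:
  echo $JAVA_HOME        # expect /usr/java/default
  java -version          # expect java version "1.7.0_67"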


      (6) Stop iptables, and keep it from starting again after the reboot in the next step:
  service iptables stop
  chkconfig iptables off


      (7) Disable SELinux: modify /etc/selinux/config as follows, then reboot the operating system.
  SELINUX=disabled
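         Equivalently, the edit can be scripted (a sketch; double-check the file afterwards):
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
  getenforce             # after the reboot, this should print: Disabled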


    2. Installation (with YARN)

      (1) On the master.hadoop.com host, run
  yum install hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-hdfs-namenode


      (2) On all slave*.hadoop.com hosts, run
  yum install hadoop-yarn-nodemanager hadoop-mapreduce hadoop-hdfs-datanode


    3. Configuration. After making the modifications below, copy the files to every host in the cluster with scp.

      (1) Create the configuration directory and make it the active one
  cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
  alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
  alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
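         An optional check that the new directory is active:
  alternatives --display hadoop-conf
  ls -l /etc/hadoop/conf       # should resolve to /etc/hadoop/conf.my_cluster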

      (2) Create the necessary local directories. These are the paths referenced by dfs.namenode.name.dir, dfs.datanode.data.dir, yarn.nodemanager.local-dirs, and yarn.nodemanager.log-dirs in the configuration files below; ownership follows the usual CDH conventions (hdfs:hdfs for DFS storage, yarn:yarn for NodeManager directories).
  # On master (NameNode metadata)
  mkdir -p /data/1/dfs/nn
  chown -R hdfs:hdfs /data/1/dfs/nn
  # On every slave (DataNode storage plus NodeManager work and log directories)
  mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
  chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
  mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
  mkdir -p /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
  chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
  chown -R yarn:yarn /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs

      (3) Modify the configuration files
        1)core-site.xml
  <property>
     <name>fs.defaultFS</name>
     <value>hdfs://master.hadoop.com:8020</value>
  </property>
  <property>
     <name>fs.trash.interval</name>
     <value>1440</value>
  </property>
  <property>
     <name>fs.trash.checkpoint.interval</name>
     <value>720</value>
  </property>
  <property>
     <name>hadoop.proxyuser.mapred.groups</name>
     <value>*</value>
  </property>
  <property>
     <name>hadoop.proxyuser.mapred.hosts</name>
     <value>*</value>
  </property>
  <property>
     <name>io.compression.codecs</name>
     <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
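        Note: the com.hadoop.compression.lzo.* entries come from the separate hadoop-lzo package; if that package is not installed on every node, remove those two classes from the list, or clients can fail with ClassNotFoundException when the codec list is loaded.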

        2)hdfs-site.xml
  <property>
     <name>dfs.permissions.superusergroup</name>
     <value>hadoop</value>
  </property>
  <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///data/1/dfs/nn</value>
  </property>
  <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
  </property>
  <property>
     <name>dfs.datanode.failed.volumes.tolerated</name>
     <value>3</value>
  </property>
  <property>
     <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
     <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>
  <property>
     <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
     <value>10737418240</value>
  </property>
  <property>
     <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
     <value>0.75</value>
  </property>
  <property>
     <name>dfs.webhdfs.enabled</name>
     <value>true</value>
  </property>
  <property>
     <name>dfs.webhdfs.user.provider.user.pattern</name>
     <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
  </property>


        3)yarn-site.xml
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.hadoop.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local,/data/4/yarn/local</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs,/data/4/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
       $HADOOP_CONF_DIR,
       $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
       $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
       $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
       $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>master.hadoop.com</value>
  </property>
  <property>
    <description>Not the total physical memory of the machine, but the amount allocated to containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>5120</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>10240</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.scheduler.increment-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.increment-allocation-vcores</name>
    <value>1</value>
  </property>
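        A quick sanity check on the numbers above: with yarn.nodemanager.resource.memory-mb = 5120 and yarn.scheduler.minimum-allocation-mb = 512, each NodeManager can host at most 5120 / 512 = 10 minimum-size containers. yarn.scheduler.maximum-allocation-mb = 10240 caps a single request at 10 GB, more than one of these nodes offers, so 5120 MB is the effective per-container ceiling here. Tune these values to the actual memory and cores of your slaves.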


        4)mapred-site.xml
  <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.address</name>
     <value>master.hadoop.com:10020</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>master.hadoop.com:19888</value>
  </property>
  <property>
     <name>yarn.app.mapreduce.am.staging-dir</name>
     <value>/user/history</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.intermediate-done-dir</name>
     <value>/user/history/intermediate-done-dir</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.done-dir</name>
     <value>/user/history/done-dir</value>
  </property>

      (4) Copy the configuration files to every host in the cluster (as with /etc/hosts above, loop over the slave IPs if your scp does not expand the wildcard):
  scp /etc/hadoop/conf.my_cluster/*-site.xml  192.168.50.*:/etc/hadoop/conf.my_cluster/


     4. Format HDFS (run once, on master.hadoop.com)
  sudo -u hdfs hdfs namenode -format


     5. Start HDFS. On every host in the cluster, run:
  for x in `cd /etc/init.d ; ls hadoop-hdfs-*`; do service $x start; done
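        An optional check that the NameNode sees all three DataNodes:
  sudo -u hdfs hdfs dfsadmin -report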


     6. Create the necessary directories on HDFS
  sudo -u hdfs hadoop fs -mkdir -p /tmp && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
  sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn
  sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
  sudo -u hdfs hadoop fs -mkdir -p /var
  sudo -u hdfs hadoop fs -mkdir -p /var/log && sudo -u hdfs hadoop fs -chmod -R 1775 /var/log && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log
  sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn/apps
  sudo -u hdfs hadoop fs -mkdir -p /user
  sudo -u hdfs hadoop fs -mkdir -p /user/history && sudo -u hdfs hadoop fs -chown mapred /user/history
  sudo -u hdfs hadoop fs -mkdir -p /user/test && sudo -u hdfs hadoop fs -chmod -R 777 /user/test && sudo -u hdfs hadoop fs -chown test /user/test
  sudo -u hdfs hadoop fs -mkdir -p /user/root && sudo -u hdfs hadoop fs -chmod -R 777 /user/root && sudo -u hdfs hadoop fs -chown root /user/root
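        Optionally, confirm the ownership and permissions just set:
  sudo -u hdfs hadoop fs -ls -R /user /var/log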


     7. Operating YARN
       Run the following commands on each host in the cluster; each host has only the subset of these services installed on it in step 2, so only the matching lines apply.
      (1) Start
  service hadoop-yarn-resourcemanager start;service hadoop-mapreduce-historyserver start;service hadoop-yarn-proxyserver start;service hadoop-yarn-nodemanager start

      (2) Status
  service hadoop-yarn-resourcemanager status;service hadoop-mapreduce-historyserver status;service hadoop-yarn-proxyserver status;service hadoop-yarn-nodemanager status

      (3) Stop
  service hadoop-yarn-resourcemanager stop;service hadoop-mapreduce-historyserver stop;service hadoop-yarn-proxyserver stop;service hadoop-yarn-nodemanager stop

       (4) Restart
  service hadoop-yarn-resourcemanager restart;service hadoop-mapreduce-historyserver restart;service hadoop-yarn-proxyserver restart;service hadoop-yarn-nodemanager restart
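       Once everything is up, the web UIs give a quick reachability check (8088 is the ResourceManager web UI default port; 19888 comes from mapred-site.xml above; expect HTTP 200):
  curl -s -o /dev/null -w "%{http_code}\n" http://master.hadoop.com:8088
  curl -s -o /dev/null -w "%{http_code}\n" http://master.hadoop.com:19888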


     8. Install a Hadoop client
      (1) Install CentOS 6.5

      (2) Log in as root and run the following commands:
  rpm -ivh jdk-7u67-linux-x64.rpm
  yum install hadoop-client
  cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
  alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
  alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
  scp 192.168.50.10:/etc/hadoop/conf.my_cluster/*-site.xml /etc/hadoop/conf.my_cluster/
  scp 192.168.50.10:/etc/hosts /etc/
  scp 192.168.50.10:/etc/profile.d/hadoop-env.sh /etc/profile.d/
  . /etc/profile
  useradd -u 700 -g hadoop test
  passwd test            # enter a password for the test user when prompted


      9. Test Hadoop with YARN
  su - test
  hadoop fs -mkdir input
  hadoop fs -put /etc/hadoop/conf/*.xml input
  # Run wordcount
  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
  # Compute Pi
  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100
  # Run a grep job; remove the previous output directory first, since MapReduce refuses to overwrite it
  hadoop fs -rm -r output
  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output 'dfs[a-z.]+'
  hadoop fs -ls output
  hadoop fs -cat output/part-r-00000 | head
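      If the cluster is healthy, the pi job ends with a line of the form "Estimated value of Pi is ...", and the finished jobs appear in the JobHistory web UI at http://master.hadoop.com:19888.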



