Hadoop and Spark Cluster Setup Guide



I. Preparation
1. Set the Linux hostname on each node
2. Configure a static IP
3. Map hostnames to IP addresses in /etc/hosts
4. Disable the firewall
5. Set up passwordless SSH login
6. Install the JDK and configure environment variables (example commands for these steps are sketched below)
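A rough sketch of these preparation steps, assuming CentOS 7 with systemd (on CentOS 6 use service iptables stop and chkconfig iptables off instead); hostnames and IPs follow the cluster plan in section II:
   hostnamectl set-hostname spark001     # repeat with spark002 / spark003 on the other nodes
   vim /etc/hosts                        # add: 192.168.198.201 spark001, 192.168.198.202 spark002, 192.168.198.203 spark003
   systemctl stop firewalld
   systemctl disable firewalld
   ssh-keygen -t rsa                     # accept the defaults, then distribute the key:
   ssh-copy-id spark001                  # also ssh-copy-id spark002 and ssh-copy-id spark003
   # JDK: unpack it under /home/hadoop/soft, export JAVA_HOME and PATH in /etc/profile, then source /etc/profile
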
II. Cluster Plan
Hostname   IP               Installed software                      Running processes
spark001   192.168.198.201  jdk, hadoop, scala, zookeeper, spark    QuorumPeerMain, NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode
spark002   192.168.198.202  jdk, hadoop, scala, zookeeper, spark    QuorumPeerMain, NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode
spark003   192.168.198.203  jdk, hadoop, scala, zookeeper, spark    QuorumPeerMain, ResourceManager, DataNode, NodeManager, JournalNode
III. Installation Steps
 1. Install and configure the ZooKeeper cluster
  1.1 Unpack
   tar -zxvf zookeeper-3.4.5-cdh5.0.0.tar.gz -C /home/hadoop/soft
  1.2 Edit the configuration
   cd /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/conf
   cp zoo_sample.cfg zoo.cfg
   vim zoo.cfg
   Change: dataDir=/home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data
   Append at the end (each server.N entry lists the quorum port 2888 and the leader-election port 3888; N must match the ID written into that node's myid file):
   server.1=spark001:2888:3888
   server.2=spark002:2888:3888
   server.3=spark003:2888:3888
   Save and exit.
   Then create the data directory:
   mkdir /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data
   Then create an empty file:
   touch /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data/myid
   Finally, write this node's ID into the file:
   echo 1 > /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data/myid
  1.3 Copy the configured ZooKeeper to the other nodes
   scp -r /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/ spark002:/home/hadoop/soft/
   scp -r /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/ spark003:/home/hadoop/soft/
   
   Note: change the contents of /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data/myid on spark002 and spark003 to match their IDs:
   spark002:
    echo 2 > /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data/myid
   spark003:
    echo 3 > /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/data/myid
  1.4 Configure the ZooKeeper environment variables
   vim /etc/profile
   export ZOOKEEPER=/home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0
   export PATH=$ZOOKEEPER/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:$PATH
   source /etc/profile
  1.5 Start the ZooKeeper service (run on each of the three nodes)
   zkServer.sh start
  1.6 Check the ZooKeeper service status
   zkServer.sh status
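   If the quorum has formed, one node reports Mode: leader and the other two Mode: follower. On ZooKeeper 3.4.x the output looks roughly like this (exact wording may differ by version):
    JMX enabled by default
    Using config: /home/hadoop/soft/zookeeper-3.4.5-cdh5.0.0/bin/../conf/zoo.cfg
    Mode: follower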
 
 2. Install and configure the Hadoop cluster
  2.1 Unpack
   tar -zxvf hadoop-2.3.0-cdh5.0.0.tar.gz -C /home/hadoop/soft/
  2.2 Configure HDFS (in Hadoop 2.x all configuration files live under $HADOOP_HOME/etc/hadoop)
   Add Hadoop to the environment variables:
   vim /etc/profile
   export HADOOP_HOME=/home/hadoop/soft/hadoop-2.3.0-cdh5.0.0
   export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
   source /etc/profile
   
   cd /home/hadoop/soft/hadoop-2.3.0-cdh5.0.0
    2.2.1 Edit hadoop-env.sh
    export JAVA_HOME=/home/hadoop/soft/jdk1.8.0_112
    
    2.2.2 Edit core-site.xml
    <configuration>
      <!-- Set the HDFS nameservice to ns1 -->
     <property>
      <name>fs.defaultFS</name>
      <value>hdfs://ns1</value>
     </property>
      <!-- Hadoop temporary directory -->
     <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/tmp</value>
     </property>
      <!-- ZooKeeper quorum addresses -->
     <property>
      <name>ha.zookeeper.quorum</name>
      <value>spark001:2181,spark002:2181,spark003:2181</value>
     </property>
    </configuration>
    
    2.2.3 Edit hdfs-site.xml
    <configuration>
      <!-- Number of block replicas stored by DataNodes; the default is 3 -->
     <property>
      <name>dfs.replication</name>
      <value>2</value>
     </property>
      <!-- HDFS nameservice ns1; must match core-site.xml -->
     <property>
      <name>dfs.nameservices</name>
      <value>ns1</value>
     </property>
      <!-- ns1 has two NameNodes: nn1 and nn2 -->
     <property>
      <name>dfs.ha.namenodes.ns1</name>
      <value>nn1,nn2</value>
     </property>
      <!-- RPC address of nn1 -->
     <property>
      <name>dfs.namenode.rpc-address.ns1.nn1</name>
      <value>spark001:9000</value>
     </property>
      <!-- HTTP address of nn1 -->
     <property>
      <name>dfs.namenode.http-address.ns1.nn1</name>
      <value>spark001:50070</value>
     </property>
      <!-- RPC address of nn2 -->
     <property>
      <name>dfs.namenode.rpc-address.ns1.nn2</name>
      <value>spark002:9000</value>
     </property>
      <!-- HTTP address of nn2 -->
     <property>
      <name>dfs.namenode.http-address.ns1.nn2</name>
      <value>spark002:50070</value>
     </property>
      <!-- Where the NameNode edit log is stored on the JournalNodes -->
     <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://spark001:8485;spark002:8485;spark003:8485/ns1</value>
     </property>
      <!-- Local directory where each JournalNode stores its data -->
     <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/journal</value>
     </property>
      <!-- Enable automatic NameNode failover -->
     <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
     </property>
      <!-- Failover proxy provider implementation -->
     <property>
      <name>dfs.client.failover.proxy.provider.ns1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>
      <!-- Fencing method -->
     <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
     </property>
      <!-- sshfence requires passwordless SSH -->
     <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
     </property>
     <property>
      <name>dfs.permissions</name>
      <value>false</value>   
     </property>
    </configuration>
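     Note: the private-key path above assumes the HDFS daemons run as root; if the cluster runs as the hadoop user (as the /home/hadoop/soft paths suggest), the fencing key would more likely live in that user's home directory, e.g.:
      <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/home/hadoop/.ssh/id_rsa</value>
      </property>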
    
    2.2.4 Edit slaves
    spark001
    spark002
    spark003
   
  2.3 Configure YARN
    2.3.1 Edit yarn-site.xml
    <configuration>
      <!-- ResourceManager host -->
     <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>spark003</value>
     </property>
      <!-- Have the NodeManager run the MapReduce shuffle service -->
     <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
     </property>
    </configuration>
    2.3.2 Edit mapred-site.xml
    <configuration>
      <!-- Run MapReduce on YARN -->
     <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
     </property>
    </configuration>
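     Note: a stock Hadoop distribution usually ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it first:
     cp mapred-site.xml.template mapred-site.xml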
  
  2.4 Copy the configured Hadoop to the other nodes
   scp -r /home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/ spark002:/home/hadoop/soft/
   scp -r /home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/ spark003:/home/hadoop/soft/
   
  2.5 Start the JournalNodes (on spark001, spark002 and spark003)
   hadoop-daemons.sh start journalnode
   (Run jps to verify that a JournalNode process has appeared on each node.)
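   hadoop-daemons.sh (plural) starts the daemon on every host listed in slaves when run from one node; if you prefer to start each JournalNode locally instead, the singular form can be run on each of the three machines:
   hadoop-daemon.sh start journalnode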
  
  2.6 Format the HA state in ZooKeeper (run on spark001 only)
   hdfs zkfc -formatZK
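   To confirm it worked, you can check ZooKeeper for the hadoop-ha znode that formatting creates (a quick sketch):
    zkCli.sh -server spark001:2181
    ls /
    # should list something like [hadoop-ha, zookeeper]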
   
  2.7 Format HDFS
   Run on spark001:
   hadoop namenode -format
   Formatting creates metadata files under the directory configured as hadoop.tmp.dir in core-site.xml, here /home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/tmp. Then copy that tmp directory to /home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/ on spark002:
   scp -r tmp/ spark002:/home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/
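   Copying tmp/ works; an alternative on Hadoop 2.x is to let the standby NameNode pull the metadata itself. Run on spark002 (with the JournalNodes up and nn1 formatted):
   hdfs namenode -bootstrapStandby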
  
  2.8 Start HDFS (run on spark001)
   sbin/start-dfs.sh
    
  2.9 Start YARN (run on spark003, i.e. on whichever machine hosts the ResourceManager)
   sbin/start-yarn.sh
   
  Verify the NameNode web UI at http://192.168.198.201:50070/
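  At this point, jps on each node should show roughly the processes listed in the cluster plan in section II; for example, on spark001 (process IDs omitted):
   jps
   QuorumPeerMain
   JournalNode
   NameNode
   DFSZKFailoverController
   DataNode
   NodeManager
  On spark003, a ResourceManager appears instead of the NameNode and DFSZKFailoverController.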
  
 3. Install the Spark cluster
  3.1 Configure the Spark environment variables
   vim /etc/profile
   export SPARK_HOME=/home/hadoop/soft/spark-1.6.0-bin-hadoop2.3
   export PATH=$SPARK_HOME/bin:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
   source /etc/profile
   
  3.2 Edit spark-env.sh and add the following settings
   export JAVA_HOME=/home/hadoop/soft/jdk1.8.0_112
   export SCALA_HOME=/home/hadoop/soft/scala-2.10.4
   export HADOOP_HOME=/home/hadoop/soft/hadoop-2.3.0-cdh5.0.0
   export HADOOP_CONF_DIR=/home/hadoop/soft/hadoop-2.3.0-cdh5.0.0/etc/hadoop
   export SPARK_MASTER_IP=spark003
   export SPARK_WORKER_MEMORY=1g
   export SPARK_EXECUTOR_MEMORY=1g
   export SPARK_DRIVER_MEMORY=1g
   export SPARK_WORKER_CORES=1
   
  3.3 Configure slaves
   cp slaves.template slaves
   Edit its contents to:
   spark001
   spark002
   spark003
   
  3.4 Configure spark-defaults.conf
   spark.executor.extraJavaOptions   -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
   spark.eventLog.enabled            true
   spark.eventLog.dir                hdfs://spark001:9000/historyserverforSpark
   spark.yarn.historyServer.address  spark001:18080
   spark.history.fs.logDirectory  hdfs://spark001:9000/historyserverforSpark
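   Since HDFS HA is configured with nameservice ns1, the event-log paths could also point at the nameservice instead of a fixed NameNode address, so they keep working after a failover (assuming the HA client settings from hdfs-site.xml are available on the Spark nodes):
    spark.eventLog.dir                hdfs://ns1/historyserverforSpark
    spark.history.fs.logDirectory     hdfs://ns1/historyserverforSpark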
   
  3.5 Copy the configured Spark to the other nodes
   scp -r /home/hadoop/soft/spark-1.6.0-bin-hadoop2.3/ spark002:/home/hadoop/soft/
   scp -r /home/hadoop/soft/spark-1.6.0-bin-hadoop2.3/ spark003:/home/hadoop/soft/
   
  3.6 Create the historyserverforSpark directory on HDFS
   hadoop fs -mkdir /historyserverforSpark
   
  3.7 Start Spark on spark003
   sbin/start-all.sh
   
   Check the Spark master web UI at http://192.168.198.203:8080/
   
  3.8 Test
   spark-submit  --master spark://spark003:7077 --class org.apache.spark.examples.SparkPi --name Spark-Pi /home/hadoop/soft/spark-1.6.0-bin-hadoop2.3/lib/spark-examples-1.6.0-hadoop2.3.0.jar 1000
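    If the job succeeds, the driver output ends with a line roughly like the following (the value varies slightly per run):
    Pi is roughly 3.14152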
   
   spark-shell --master spark://spark003:7077
   
   sc.textFile("/zookeeper.out").flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_+_).map(pair => (pair._2, pair._1)).sortByKey(false, 1).map(pair => (pair._2, pair._1)).saveAsTextFile("/dt_spark_clicked1")
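    This line reads /zookeeper.out from HDFS, counts word frequencies, sorts them in descending order and writes the result back to HDFS. The output can then be inspected with:
    hadoop fs -ls /dt_spark_clicked1
    hadoop fs -cat /dt_spark_clicked1/part-00000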
