
Hadoop three-node cluster installation and configuration: a detailed walkthrough


Topology (nodes):
192.168.10.46 Hadoop46
192.168.10.47 Hadoop47
192.168.10.48 Hadoop48
Hadoop's daemons are NameNode/DataNode and JobTracker/TaskTracker. NameNode and DataNode work at the HDFS layer; JobTracker and TaskTracker work at the MapReduce layer.
In this node list, Hadoop48 is the master, running the NameNode and JobTracker; Hadoop46 and Hadoop47 are slaves, running the DataNodes and TaskTrackers. The secondary namenode is deprecated in Hadoop 1.0.3 in favor of the checkpoint node or backup node; neither is configured here for now.
Create a user zhouhh on every machine (pick whatever name you like) for managing Hadoop.
Network preparation
First finish the single-node setup on each node; see this post: http://www.aboutyun.com/thread-6143-1-1.html (also http://abloz.com/2012/05/22/10-minutes-from-scratch-to-build-hadoop-environment-and-test-mapreduce.html).
Download the latest Hadoop from http://labs.renren.com/apache-mirror/hadoop/common/:
wget http://labs.renren.com/apache-mi ... hadoop-1.0.3.tar.gz
Then distribute it to each machine, extract it, configure it, and verify that each single machine works on its own.
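For example, a minimal sketch of pushing the tarball out and unpacking it (hosts and paths as used in this post; SSH password prompts will appear until the key setup below is done):
[zhouhh@Hadoop48 ~]$ for h in Hadoop46 Hadoop47; do scp hadoop-1.0.3.tar.gz zhouhh@$h:~/ ; done
[zhouhh@Hadoop48 ~]$ tar xzf hadoop-1.0.3.tar.gz     # repeat the unpack on Hadoop46 and Hadoop47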



[zhouhh@Hadoop48 ~]$ cat /etc/redhat-release
CentOS release 5.5 (Final)
[zhouhh@Hadoop48 ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.46 Hadoop46
192.168.10.47 Hadoop47
192.168.10.48 Hadoop48
[zhouhh@Hadoop48 ~]$ ping Hadoop46
PING Hadoop46 (192.168.10.46) 56(84) bytes of data.
64 bytes from Hadoop46 (192.168.10.46): icmp_seq=1 ttl=64 time=5.25 ms
64 bytes from Hadoop46 (192.168.10.46): icmp_seq=2 ttl=64 time=0.428 ms
--- Hadoop46 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1009ms
rtt min/avg/max/mdev = 0.428/2.843/5.259/2.416 ms
[zhouhh@Hadoop48 ~]$ ping Hadoop47
PING Hadoop47 (192.168.10.47) 56(84) bytes of data.
64 bytes from Hadoop47 (192.168.10.47): icmp_seq=1 ttl=64 time=7.08 ms
64 bytes from Hadoop47 (192.168.10.47): icmp_seq=2 ttl=64 time=4.27 ms
--- Hadoop47 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 4.277/5.678/7.080/1.403 ms
[zhouhh@Hadoop48 ~]$ ssh-keygen -t rsa -P ""
[zhouhh@Hadoop48 ~]$ cd .ssh
[zhouhh@Hadoop48 .ssh]$ cat id_rsa.pub >> authorized_keys



For security reasons, if SSH on the nodes does not listen on the standard port, you can write a config file so that commands like ssh Hadoop46 connect automatically.
With the standard port and standard key file names, hostname resolution via /etc/hosts is already enough for ssh Hadoop46 to log in directly.
Format of the config file:
[zhouhh@Hadoop48 .ssh]$ vi config
Host Hadoop46
Port 22
HostName 192.168.10.46
IdentityFile ~/.ssh/id_rsa
Host Hadoop47
Port 22
HostName 192.168.10.47
IdentityFile ~/.ssh/id_rsa
Host Hadoop48
Port 22
HostName 192.168.10.48
IdentityFile ~/.ssh/id_rsa
Then push the public key to the other nodes:
[zhouhh@Hadoop48 ~]$ ssh-copy-id -i .ssh/id_rsa zhouhh@Hadoop46
[zhouhh@Hadoop48 ~]$ ssh-copy-id -i .ssh/id_rsa zhouhh@Hadoop47
Test passwordless login with the key; all of these should succeed:
[zhouhh@Hadoop48 ~]$ ssh Hadoop46
[zhouhh@Hadoop48 ~]$ ssh Hadoop47
[zhouhh@Hadoop48 ~]$ ssh Hadoop48
Copy the private key and the config file to the other nodes:
[zhouhh@Hadoop47 .ssh]$ scp zhouhh@Hadoop48:~/.ssh/id_rsa .
[zhouhh@Hadoop47 .ssh]$ scp zhouhh@Hadoop48:~/.ssh/config .
[zhouhh@Hadoop46 .ssh]$ scp zhouhh@Hadoop48:~/.ssh/id_rsa .
[zhouhh@Hadoop46 .ssh]$ scp zhouhh@Hadoop48:~/.ssh/config .
At this point the nodes can all reach each other over SSH.
A couple of convenience aliases, added to .bashrc:
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
=================
Next, the configuration
=================
Environment variables:
[zhouhh@Hadoop48 ~]$ vi .bashrc
export HADOOP_HOME=/home/zhouhh/hadoop-1.0.3
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin
[zhouhh@Hadoop48 ~]$ source .bashrc
[zhouhh@Hadoop48 ~]$ cd hadoop-1.0.3
[zhouhh@Hadoop48 hadoop-1.0.3]$ cd conf
[zhouhh@Hadoop48 conf]$ ls
capacity-scheduler.xml  fair-scheduler.xml          hdfs-default.xml    mapred-queue-acls.xml  ssl-client.xml.example
configuration.xsl       hadoop-env.sh               hdfs-site.xml       mapred-site.xml        ssl-server.xml.example
core-default.xml        hadoop-metrics2.properties  log4j.properties    masters                taskcontroller.cfg
core-site.xml           hadoop-policy.xml           mapred-default.xml  slaves
The *-default.xml files here were copied over from the corresponding src directories, purely as configuration references.
The configuration has two parts: environment and parameters. The environment is what the scripts under bin need and is set in hadoop-env.sh; the parameters go into the *-site.xml files.
The masters and slaves files are only a convenience for starting and stopping several machines at once; you can also start each daemon by hand:
bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]
Running bin/start-dfs.sh on a machine makes that machine the NameNode; running bin/start-mapred.sh makes it the JobTracker. The NameNode and JobTracker can share one machine or be split across two.
bin/start-all.sh and stop-all.sh are deprecated in 1.0.3, replaced by bin/start-dfs.sh, bin/start-mapred.sh and bin/stop-dfs.sh, bin/stop-mapred.sh.
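For reference, a sketch of the fully manual approach, using only the hadoop-daemon.sh command above (assumes $HADOOP_HOME/bin is on PATH as set in .bashrc):
# on the master, Hadoop48
[zhouhh@Hadoop48 ~]$ hadoop-daemon.sh start namenode
[zhouhh@Hadoop48 ~]$ hadoop-daemon.sh start jobtracker
# on each slave, Hadoop46 and Hadoop47
[zhouhh@Hadoop46 ~]$ hadoop-daemon.sh start datanode
[zhouhh@Hadoop46 ~]$ hadoop-daemon.sh start tasktracker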
[zhouhh@Hadoop48 conf]$ vi masters
Hadoop48
[zhouhh@Hadoop48 conf]$ vi slaves
Hadoop46
Hadoop47
The read-only configuration files src/core/core-default.xml, src/hdfs/hdfs-default.xml and src/mapred/mapred-default.xml can be used as references.
The actual configuration goes into these three files: conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml.
In addition, conf/hadoop-env.sh controls the variables used by the scripts under bin.
Configure core-site.xml
Refer to the manual and src/core/core-default.xml.
[zhouhh@Hadoop48 conf]$ vi core-site.xml
<configuration>
<property>
<name>hadoop.mydata.dir</name>
<value>/home/zhouhh/myhadoop</value>
<description>A base for other directories.${user.name} </description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://Hadoop48:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
Here hadoop.mydata.dir is a custom variable of mine, used as the data root directory; the HDFS dfs.name.dir and dfs.data.dir will all be placed under that partition.
A few variables can be used inside the config files: ${hadoop.home.dir} matches $HADOOP_HOME, and ${user.name} matches the user name, so /tmp/hadoop-${user.name} expands to /tmp/hadoop-zhouhh here.
[zhouhh@Hadoop48 conf]$ vi hdfs-site.xml
<configuration>
<property>
<name>hadoop.mydata.dir</name>
<value>/home/zhouhh/myhadoop</value>
<description>A base for other directories.${user.name} </description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://Hadoop48:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
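The hdfs-site.xml entries that put dfs.name.dir and dfs.data.dir under ${hadoop.mydata.dir}, as described above, would look roughly like the following sketch (the dfs.data.dir path and the dfs.replication value are assumptions for illustration, not the post's original settings; the format step below does create /home/zhouhh/myhadoop/dfs/name):
<property>
<name>dfs.name.dir</name>
<value>${hadoop.mydata.dir}/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>${hadoop.mydata.dir}/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>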
[zhouhh@Hadoop48 conf]$ vi mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>Hadoop48:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
<description>The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to spread disk i/o.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>mapred.system.dir</name>
<value>${hadoop.mydata.dir}/mapred/system</value>
<description>The directory where MapReduce stores control files.
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker. Vary it depending on your hardware.
</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker. Vary it depending on your hardware.
</description>
</property>
</configuration>
The configuration may need to grow or shrink with your actual setup. In particular, port conflicts sometimes prevent the datanode or tasktracker from starting, and the corresponding settings then need to be added; consult the matching default config file and the manual.
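As an example, a sketch of the kind of overrides that resolve such conflicts; the property names are standard Hadoop 1.x settings, but the port numbers here are arbitrary and only illustrative:
<!-- hdfs-site.xml: move the DataNode off its default ports -->
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50011</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50021</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50076</value>
</property>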
Copy the configuration to Hadoop47 and Hadoop46.
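One way to do that, as a sketch (paths and user as used in this post):
[zhouhh@Hadoop48 ~]$ for h in Hadoop46 Hadoop47; do scp ~/hadoop-1.0.3/conf/*-site.xml ~/hadoop-1.0.3/conf/masters ~/hadoop-1.0.3/conf/slaves zhouhh@$h:~/hadoop-1.0.3/conf/ ; done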
[zhouhh@Hadoop48 hadoop-1.0.3]$ ./bin/hadoop namenode -format
12/05/23 17:04:42 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Hadoop48/192.168.10.48
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
12/05/23 17:04:42 INFO util.GSet: VM type = 64-bit
12/05/23 17:04:42 INFO util.GSet: 2% max memory = 17.77875 MB
12/05/23 17:04:42 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/05/23 17:04:42 INFO util.GSet: recommended=2097152, actual=2097152
12/05/23 17:04:42 INFO namenode.FSNamesystem: fsOwner=zhouhh
12/05/23 17:04:42 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/23 17:04:42 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/23 17:04:42 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/23 17:04:42 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/23 17:04:42 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/05/23 17:04:42 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/05/23 17:04:42 INFO common.Storage: Storage directory /home/zhouhh/myhadoop/dfs/name has been successfully formatted.
12/05/23 17:04:42 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Hadoop48/192.168.10.48
************************************************************/
Because the path and environment variables were added to .bashrc earlier, this can also be run simply as
[zhouhh@Hadoop48 hadoop-1.0.3]$ hadoop namenode -format
The command formats the dfs.name.dir path defined in hdfs-site.xml, which holds the metadata used to track and coordinate the DataNodes.
[zhouhh@Hadoop48 ~]$ find myhadoop/
myhadoop/
myhadoop/dfs
myhadoop/dfs/name
myhadoop/dfs/name/previous.checkpoint
myhadoop/dfs/name/previous.checkpoint/fstime
myhadoop/dfs/name/previous.checkpoint/edits
myhadoop/dfs/name/previous.checkpoint/fsimage
myhadoop/dfs/name/previous.checkpoint/VERSION
myhadoop/dfs/name/image
myhadoop/dfs/name/image/fsimage
myhadoop/dfs/name/current
myhadoop/dfs/name/current/fstime
myhadoop/dfs/name/current/edits
myhadoop/dfs/name/current/fsimage
myhadoop/dfs/name/current/VERSION
[zhouhh@Hadoop48 hadoop-1.0.3]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop46: Bad owner or permissions on /home/zhouhh/.ssh/config
Hadoop47: Bad owner or permissions on /home/zhouhh/.ssh/config
Hadoop48: Bad owner or permissions on /home/zhouhh/.ssh/config
[zhouhh@Hadoop48 .ssh]$ ls -l
total 20
-rw------- 1 zhouhh zhouhh 794 Apr 13 10:21 authorized_keys
-rw-rw-r-- 1 zhouhh zhouhh 288 May 23 10:37 config
It turns out the permissions on the config file were wrong:
[zhouhh@Hadoop48 .ssh]$ chmod 600 config
[zhouhh@Hadoop48 .ssh]$ ls -l
total 20
-rw------- 1 zhouhh zhouhh 794 Apr 13 10:21 authorized_keys
-rw------- 1 zhouhh zhouhh 288 May 23 10:37 config
[zhouhh@Hadoop48 ~]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop47: bash: line 0: cd: /home/zhouhh/hadoop-1.0.3/libexec/..: No such file or directory
Hadoop47: bash: /home/zhouhh/hadoop-1.0.3/bin/hadoop-daemon.sh: No such file or directory
Hadoop46: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop46.out
Hadoop48: starting secondarynamenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-secondarynamenode-Hadoop48.out
start-dfs.sh starts the NameNode on the local machine and the DataNodes on the hosts listed in conf/slaves.
[zhouhh@Hadoop48 ~]$ ssh Hadoop47
Last login: Tue May 22 17:57:01 2012 from hadoop48
[zhouhh@Hadoop47 ~]$
[zhouhh@Hadoop47 hadoop-1.0.3]$ vi conf/hadoop-env.sh
Set $JAVA_HOME there to the correct path.
Do the same on Hadoop46.
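The relevant line in conf/hadoop-env.sh looks like the sketch below; the JDK path is an assumption, use whatever path your JDK is actually installed under:
# conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_31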
[zhouhh@Hadoop48 ~]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop47: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop47.out
Hadoop46: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop46.out
Hadoop48: secondarynamenode running as process 23491. Stop it first.
HDFS is now running.
Troubleshooting
[zhouhh@Hadoop47 logs]$ vi hadoop-zhouhh-datanode-Hadoop47.log
2012-05-23 17:17:14,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = Hadoop47/192.168.10.47
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf ... branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
2012-05-23 17:17:14,762 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-05-23 17:17:14,772 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-05-23 17:17:14,772 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-05-23 17:17:14,772 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-05-23 17:17:14,907 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-05-23 17:17:15,064 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-05-23 17:17:15,187 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:222)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:337)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
2012-05-23 17:17:15,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at Hadoop47/192.168.10.47
************************************************************/
Likewise, the relevant addresses and ports need to be configured on this node: the "Does not contain a valid host:port authority: file:///" error means it did not pick up fs.default.name (hdfs://Hadoop48:54310).
[zhouhh@Hadoop48 bin]$ start-mapred.sh
[zhouhh@Hadoop48 ~]$ ssh Hadoop46
Last login: Wed May 23 17:33:05 2012 from hadoop47
[zhouhh@Hadoop46 ~]$ cd hadoop-1.0.3/logs
[zhouhh@Hadoop46 logs]$ vi hadoop-zhouhh-datanode-Hadoop46.log
2012-05-23 17:38:46,062 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54310. Already tried 0 time(s).
2012-05-23 17:38:47,065 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54310. Already tried 1 time(s).
[zhouhh@Hadoop46 logs]$ vi hadoop-zhouhh-tasktracker-Hadoop46.log
2012-05-23 17:58:13,356 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54550: starting
2012-05-23 17:58:14,428 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54311. Already tried 0 time(s).
2012-05-23 17:58:15,430 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54311. Already tried 1 time(s).
[zhouhh@Hadoop48 conf]$ netstat -antp | grep 54310
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54310 192.168.20.188:30300 ESTABLISHED 20469/python
[zhouhh@Hadoop48 conf]$ netstat -antp | grep 54311
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54311 192.168.20.188:30300 TIME_WAIT -
It turns out the ports were already taken; stop the Python program occupying them.
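A quick sketch of stopping it, using the PID 20469 reported by netstat above:
[zhouhh@Hadoop48 ~]$ kill 20469        # add -9 only if it refuses to exit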
[zhouhh@Hadoop48 hadoop-1.0.3]$ stop-mapred.sh
[zhouhh@Hadoop48 hadoop-1.0.3]$ stop-dfs.sh
[zhouhh@Hadoop48 hadoop-1.0.3]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop47: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop47.out
Hadoop46: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop46.out
Hadoop48: starting secondarynamenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-secondarynamenode-Hadoop48.out
[zhouhh@Hadoop48 hadoop-1.0.3]$ netstat -antp | grep 54310
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54310 0.0.0.0:* LISTEN 24716/java
tcp 0 0 192.168.10.48:51040 192.168.10.48:54310 TIME_WAIT -
tcp 0 0 192.168.10.48:51038 192.168.10.48:54310 TIME_WAIT -
tcp 0 0 192.168.10.48:54310 192.168.10.46:38202 ESTABLISHED 24716/java
[zhouhh@Hadoop48 hadoop-1.0.3]$ start-mapred.sh
starting jobtracker, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-jobtracker-Hadoop48.out
Hadoop46: starting tasktracker, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-tasktracker-Hadoop46.out
Hadoop47: starting tasktracker, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-tasktracker-Hadoop47.out
[zhouhh@Hadoop48 hadoop-1.0.3]$ netstat -antp | grep 54311
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54311 0.0.0.0:* LISTEN 25238/java
tcp 0 0 192.168.10.48:54311 192.168.10.46:33561 ESTABLISHED 25238/java
tcp 0 0 192.168.10.48:54311 192.168.10.47:55277 ESTABLISHED 25238/java
Checking the DataNode logs again, everything is now normal.
[zhouhh@Hadoop48 hadoop-1.0.3]$ jps
24716 NameNode
25625 Jps
25238 JobTracker
24909 SecondaryNameNode
[zhouhh@Hadoop46 ~]$ jps
10649 TaskTracker
10352 DataNode
10912 Jps
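Another way to confirm that both DataNodes have registered with the NameNode (a sketch; dfsadmin -report is the standard Hadoop 1.x command):
[zhouhh@Hadoop48 ~]$ hadoop dfsadmin -report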
==========================
MapReduce test
==========================
[zhouhh@Hadoop48 ~]$ vi test.txt
a b c d
a b c d
aa bb cc dd
ee ff gg hh
With the .bashrc aliases set up earlier, fs stands for hadoop fs,
and hls for hadoop fs -ls.
[zhouhh@Hadoop48 hadoop-1.0.3]$ fs -put test.txt test.txt
[zhouhh@Hadoop48 hadoop-1.0.3]$ hls
Found 1 items
-rw-r--r-- 3 zhouhh supergroup 40 2012-05-23 19:39 /user/zhouhh/test.txt
Run the wordcount MapReduce example:
[zhouhh@Hadoop48 hadoop-1.0.3]$ ./bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/zhouhh/test.txt output
12/05/23 19:40:52 INFO input.FileInputFormat: Total input paths to process : 1
12/05/23 19:40:52 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/23 19:40:52 WARN snappy.LoadSnappy: Snappy native library not loaded
12/05/23 19:40:52 INFO mapred.JobClient: Running job: job_201205231824_0001
12/05/23 19:40:53 INFO mapred.JobClient: map 0% reduce 0%
12/05/23 19:41:07 INFO mapred.JobClient: map 100% reduce 0%
12/05/23 19:41:19 INFO mapred.JobClient: map 100% reduce 100%
12/05/23 19:41:24 INFO mapred.JobClient: Job complete: job_201205231824_0001
12/05/23 19:41:24 INFO mapred.JobClient: Counters: 29
12/05/23 19:41:24 INFO mapred.JobClient: Job Counters
12/05/23 19:41:24 INFO mapred.JobClient: Launched reduce tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11561
12/05/23 19:41:24 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/23 19:41:24 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/05/23 19:41:24 INFO mapred.JobClient: Launched map tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: Data-local map tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9934
12/05/23 19:41:24 INFO mapred.JobClient: File Output Format Counters
12/05/23 19:41:24 INFO mapred.JobClient: Bytes Written=56
12/05/23 19:41:24 INFO mapred.JobClient: FileSystemCounters
12/05/23 19:41:24 INFO mapred.JobClient: FILE_BYTES_READ=110
12/05/23 19:41:24 INFO mapred.JobClient: HDFS_BYTES_READ=147
12/05/23 19:41:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43581
12/05/23 19:41:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=56
12/05/23 19:41:24 INFO mapred.JobClient: File Input Format Counters
12/05/23 19:41:24 INFO mapred.JobClient: Bytes Read=40
12/05/23 19:41:24 INFO mapred.JobClient: Map-Reduce Framework
12/05/23 19:41:24 INFO mapred.JobClient: Map output materialized bytes=110
12/05/23 19:41:24 INFO mapred.JobClient: Map input records=4
12/05/23 19:41:24 INFO mapred.JobClient: Reduce shuffle bytes=110
12/05/23 19:41:24 INFO mapred.JobClient: Spilled Records=24
12/05/23 19:41:24 INFO mapred.JobClient: Map output bytes=104
12/05/23 19:41:24 INFO mapred.JobClient: CPU time spent (ms)=1490
12/05/23 19:41:24 INFO mapred.JobClient: Total committed heap usage (bytes)=194969600
12/05/23 19:41:24 INFO mapred.JobClient: Combine input records=16
12/05/23 19:41:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=107
12/05/23 19:41:24 INFO mapred.JobClient: Reduce input records=12
12/05/23 19:41:24 INFO mapred.JobClient: Reduce input groups=12
12/05/23 19:41:24 INFO mapred.JobClient: Combine output records=12
12/05/23 19:41:24 INFO mapred.JobClient: Physical memory (bytes) snapshot=271958016
12/05/23 19:41:24 INFO mapred.JobClient: Reduce output records=12
12/05/23 19:41:24 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1126625280
12/05/23 19:41:24 INFO mapred.JobClient: Map output records=16
Not exactly efficient, but it worked.
[zhouhh@Hadoop48 ~]$ hls
Found 2 items
drwxr-xr-x - zhouhh supergroup 0 2012-05-23 19:41 /user/zhouhh/output
-rw-r--r-- 3 zhouhh supergroup 40 2012-05-23 19:39 /user/zhouhh/test.txt
Everything hls lists actually lives in the distributed file system.
[zhouhh@Hadoop48 ~]$ hadoop dfs -get /user/zhouhh/output .
[zhouhh@Hadoop48 ~]$ cat output/*
cat: output/_logs: Is a directory
a 2
aa 1
b 2
bb 1
c 2
cc 1
d 2
dd 1
ee 1
ff 1
gg 1
hh 1
Or view it directly in HDFS:
[zhouhh@Hadoop48 ~]$ hadoop dfs -cat output/*
cat: File does not exist: /user/zhouhh/output/_logs
a 2
aa 1

So the distributed Hadoop setup works.
I hope this is helpful.









13 replies

兰君云, 2014-3-4 23:13: Quite detailed, thanks.

weijia365, 2014-5-6 17:31: Bump!

ascentzhen, 2014-7-18 14:56: Got my distributed cluster running with this, thanks a lot.

monkey_d, 2014-12-24 13:59: Learned something, thanks.

tang, 2015-3-6 15:06: "Because the path and environment variables were added to .bashrc earlier, this can also be run simply as" -- so that's why.

1027420005, 2015-6-28 13:08: Are the setup steps the same for the latest version as for this older one?

fdfdggg, 2015-7-7 11:02: Reading through it now, thanks.