
Understanding How Hadoop HDFS Writes a File

Guiding questions

1. What environment do we need to prepare for an HDFS write test?
2. How do we analyze the DataNodes, and which nodes are involved?
3. What is the flow when the Client initiates a file write?




Here is a test of writing a file to HDFS. The environment:
  NN : 192.168.1.1
  DN1 : 192.168.1.2
  DN2 : 192.168.1.3
  DN3 : 192.168.1.4
  Client : 192.168.1.1

  $ ll read.txt
  -rw-rw-r-- 1 hadoop hadoop 12 Apr  3 11:48 read.txt
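Judging by the /user/hadoop/read.txt._COPYING_ name in the NameNode log below, the file was written with hadoop fs -put read.txt (the ._COPYING_ suffix is the temporary name that CLI command uses). As a sketch, the same write path can be driven from the Java API; the paths below are the ones from this test, and fs.defaultFS is assumed to point at the NameNode (192.168.1.1):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PutReadTxt {
      public static void main(String[] args) throws Exception {
          // Loads core-site.xml etc.; fs.defaultFS must point at the NN.
          FileSystem fs = FileSystem.get(new Configuration());

          // Equivalent to: hadoop fs -put read.txt /user/hadoop/read.txt
          fs.copyFromLocalFile(new Path("read.txt"),
                               new Path("/user/hadoop/read.txt"));
          fs.close();
      }
  }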


NameNode analysis
Let's look at the NameNode's log:
  2014-04-03 14:24:50,338 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/hadoop/read.txt._COPYING_. BP-398901529-192.168.1.1-1393416650594 blk_3945775701777059462_15982{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.1.2:50010|RBW], ReplicaUnderConstruction[192.168.1.3:50010|RBW], ReplicaUnderConstruction[192.168.1.4:50010|RBW]]}
  2014-04-03 14:24:50,473 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.1.4:50010 is added to blk_3945775701777059462_15982{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.1.2:50010|RBW], ReplicaUnderConstruction[192.168.1.3:50010|RBW], ReplicaUnderConstruction[192.168.1.4:50010|RBW]]} size 0
  2014-04-03 14:24:50,474 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.1.3:50010 is added to blk_3945775701777059462_15982{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.1.2:50010|RBW], ReplicaUnderConstruction[192.168.1.3:50010|RBW], ReplicaUnderConstruction[192.168.1.4:50010|RBW]]} size 0
  2014-04-03 14:24:50,476 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.1.2:50010 is added to blk_3945775701777059462_15982{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.1.2:50010|RBW], ReplicaUnderConstruction[192.168.1.3:50010|RBW], ReplicaUnderConstruction[192.168.1.4:50010|RBW]]} size 0
  2014-04-03 14:24:50,477 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/hadoop/read.txt._COPYING_ is closed by DFSClient_NONMAPREDUCE_1320389024_1

My cluster has only 3 DataNodes online and the replication factor is also 3, so each machine holds one replica of the block. In the log you can see allocateBlock choose the three-DN pipeline, one addStoredBlock entry as each DN reports its finished replica, and finally completeFile closing the file.
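To confirm the placement the NN chose, the block locations can also be queried from a client after the write. A minimal sketch with the standard FileSystem API (the path is the one from this test):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShowBlockLocations {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          FileStatus st = fs.getFileStatus(new Path("/user/hadoop/read.txt"));

          // One entry per block; a 12-byte file has a single block, and
          // each entry lists its replica hosts (DN1, DN2, DN3 here).
          for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
              System.out.println(loc);  // offset, length, replica hosts
          }
          fs.close();
      }
  }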


DataNode analysis
Packets were captured with tcpdump on 192.168.1.2 (DN1), 192.168.1.3 (DN2), and 192.168.1.4 (DN3) respectively.

DN1 : 192.168.1.2
Three-way handshake (the Client establishes a connection to DN1):
  14:24:50.367036 IP 192.168.1.1.53561 > 192.168.1.2.50010: S 1235675786:1235675786(0) win 14600
  14:24:50.367142 IP 192.168.1.2.50010 > 192.168.1.1.53561: S 3430371344:3430371344(0) ack 1235675787 win 14480
  14:24:50.367183 IP 192.168.1.1.53561 > 192.168.1.2.50010: . ack 1 win 29



Client-to-DN1 traffic (the Client sends the first packet, 439 bytes spanning seq 1:440; DN1 acks 440):
  14:24:50.448286 IP 192.168.1.1.53561 > 192.168.1.2.50010: P 1:440(439) ack 1 win 29
  14:24:50.448336 IP 192.168.1.2.50010 > 192.168.1.1.53561: . ack 440 win 31



DN1-to-DN2 traffic (DN1 establishes a connection to DN2, three-way handshake):
  14:24:50.449765 IP 192.168.1.2.60024 > 192.168.1.3.50010: S 753790100:753790100(0) win 14600
  14:24:50.449978 IP 192.168.1.3.50010 > 192.168.1.2.60024: S 839637351:839637351(0) ack 753790101 win 14480
  14:24:50.450051 IP 192.168.1.2.60024 > 192.168.1.3.50010: . ack 1 win 29



DN1-to-DN2 traffic (DN1 forwards the first packet to DN2 and gets it acknowledged):
  14:24:50.450304 IP 192.168.1.2.60024 > 192.168.1.3.50010: P 1:319(318) ack 1 win 29
  14:24:50.450437 IP 192.168.1.3.50010 > 192.168.1.2.60024: . ack 319 win 31
  14:24:50.455004 IP 192.168.1.3.50010 > 192.168.1.2.60024: P 1:6(5) ack 319 win 31
  14:24:50.455020 IP 192.168.1.2.60024 > 192.168.1.3.50010: . ack 6 win 29



DN1-to-Client traffic (look carefully: the ack below is 440, and only at this point is the first packet acknowledged to the Client, meaning all three DNs have finished processing it):
  14:24:50.455225 IP 192.168.1.2.50010 > 192.168.1.1.53561: P 1:6(5) ack 440 win 31
  14:24:50.455384 IP 192.168.1.1.53561 > 192.168.1.2.50010: . ack 6 win 29
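This cascaded acknowledgement is the heart of the write pipeline: the Client's ack arrives only after every downstream DN has the packet. Below is a toy Java model of that invariant, nothing more; the real DataNodes stream packets and handle acks asynchronously in the PacketResponder thread that shows up in the logs further down:

  import java.util.Arrays;
  import java.util.List;

  public class PipelineAckModel {
      // Each node "stores" the packet, forwards it downstream, and only
      // acks upstream after the downstream ack has come back.
      static boolean writePacket(byte[] packet, List<String> pipeline, int i) {
          String node = pipeline.get(i);
          System.out.println(node + " stores a packet of " + packet.length + " bytes");
          boolean downstreamOk = true;
          if (i + 1 < pipeline.size()) {
              downstreamOk = writePacket(packet, pipeline, i + 1);  // DN1 -> DN2 -> DN3
          }
          // Acks flow back DN3 -> DN2 -> DN1 -> Client.
          System.out.println(node + " acks upstream");
          return downstreamOk;
      }

      public static void main(String[] args) {
          writePacket(new byte[]{1, 2, 3}, Arrays.asList("DN1", "DN2", "DN3"), 0);
      }
  }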


The Client sends the second packet, 47 bytes spanning seq 440:487:
  14:24:50.464315 IP 192.168.1.1.53561 > 192.168.1.2.50010: P 440:487(47) ack 6 win 29


DN1-to-DN2 traffic (DN1 forwards the second packet to DN2):
  14:24:50.464508 IP 192.168.1.2.60024 > 192.168.1.3.50010: P 319:366(47) ack 6 win 29
  14:24:50.467019 IP 192.168.1.3.50010 > 192.168.1.2.60024: P 6:17(11) ack 366 win 31


DN1-to-Client traffic (the second packet is acknowledged, meaning all three DNs have processed it):
  14:24:50.467885 IP 192.168.1.2.50010 > 192.168.1.1.53561: P 6:20(14) ack 487 win 31


The Client sends the third packet:
  14:24:50.471012 IP 192.168.1.1.53561 > 192.168.1.2.50010: P 487:518(31) ack 20 win 29


DN1-to-DN2 traffic (DN1 forwards the third packet to DN2; note DN2's FIN right after its last ack):
  14:24:50.471167 IP 192.168.1.2.60024 > 192.168.1.3.50010: P 366:397(31) ack 17 win 29
  14:24:50.474400 IP 192.168.1.3.50010 > 192.168.1.2.60024: P 17:29(12) ack 397 win 31
  14:24:50.474786 IP 192.168.1.3.50010 > 192.168.1.2.60024: F 29:29(0) ack 397 win 31


DN1-to-Client traffic (DN1 tells the Client the write has finished):
  14:24:50.475349 IP 192.168.1.2.50010 > 192.168.1.1.53561: P 20:34(14) ack 518 win 31
  ...


DN1 disconnects from both the Client and DN2; note the last timestamp, 14:24:50.476223:
  14:24:50.475771 IP 192.168.1.1.53561 > 192.168.1.2.50010: F 518:518(0) ack 34 win 29
  14:24:50.476047 IP 192.168.1.2.60024 > 192.168.1.3.50010: F 397:397(0) ack 30 win 29
  14:24:50.476081 IP 192.168.1.2.50010 > 192.168.1.1.53561: F 34:34(0) ack 519 win 31
  14:24:50.476186 IP 192.168.1.1.53561 > 192.168.1.2.50010: . ack 35 win 29
  14:24:50.476223 IP 192.168.1.3.50010 > 192.168.1.2.60024: . ack 398 win 31


Now let's look at the log on DN1 (the duration field in the clienttrace line is in nanoseconds, so roughly 19 ms here):
  2014-04-03 14:24:50,448 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982 src: /192.168.1.1:53561 dest: /192.168.1.2:50010
  2014-04-03 14:24:50,475 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.1.1:53561, dest: /192.168.1.2:50010, bytes: 12, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1320389024_1, offset: 0, srvID: DS-1250979778-192.168.1.2-50010-1393417978787, blockid: BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982, duration: 19058333
  2014-04-03 14:24:50,475 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982, type=HAS_DOWNSTREAM_IN_PIPELINE terminating



DN2 : 192.168.1.3
Three-way handshake, starting at 14:24:50.449860:
  14:24:50.449860 IP 192.168.1.2.60024 > 192.168.1.3.50010: S 753790100:753790100(0) win 14600
  14:24:50.449953 IP 192.168.1.3.50010 > 192.168.1.2.60024: S 839637351:839637351(0) ack 753790101 win 14480
  14:24:50.449981 IP 192.168.1.2.60024 > 192.168.1.3.50010: . ack 1 win 29

Packets start flowing: the TCP exchange between 192.168.1.2 and 192.168.1.3 (note DN2 opening its own connection to DN3 at 14:24:50.451944):
  14:24:50.450292 IP 192.168.1.2.60024 > 192.168.1.3.50010: P 1:319(318) ack 1 win 29
  14:24:50.450308 IP 192.168.1.3.50010 > 192.168.1.2.60024: . ack 319 win 31
  14:24:50.451944 IP 192.168.1.3.36534 > 192.168.1.4.50010: S 2811947842:2811947842(0) win 14600
  ...


Connection teardown; note the time, 14:24:50.476039:
  14:24:50.474584 IP 192.168.1.3.36534 > 192.168.1.4.50010: F 243:243(0) ack 21 win 29
  14:24:50.474631 IP 192.168.1.3.50010 > 192.168.1.2.60024: F 29:29(0) ack 397 win 31
  14:24:50.474798 IP 192.168.1.4.50010 > 192.168.1.3.36534: . ack 244 win 31
  14:24:50.476009 IP 192.168.1.2.60024 > 192.168.1.3.50010: F 397:397(0) ack 30 win 29
  14:24:50.476039 IP 192.168.1.3.50010 > 192.168.1.2.60024: . ack 398 win 31


Below is DN2's log:
  2014-04-03 14:24:50,451 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982 src: /192.168.1.2:60024 dest: /192.168.1.3:50010
  2014-04-03 14:24:50,474 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.1.2:60024, dest: /192.168.1.3:50010, bytes: 12, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1320389024_1, offset: 0, srvID: DS-136573777-192.168.1.3-50010-1393417978720, blockid: BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982, duration: 18129080
  2014-04-03 14:24:50,474 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982, type=HAS_DOWNSTREAM_IN_PIPELINE terminating



DN3 : 192.168.1.4
Three-way handshake, starting at 14:24:50.452130:
  14:24:50.452130 IP 192.168.1.3.36534 > 192.168.1.4.50010: S 2811947842:2811947842(0) win 14600
  14:24:50.452426 IP 192.168.1.4.50010 > 192.168.1.3.36534: S 2051537423:2051537423(0) ack 2811947843 win 14480
  14:24:50.452224 IP 192.168.1.3.36534 > 192.168.1.4.50010: . ack 1 win 29


Packets start flowing: the TCP exchange between 192.168.1.3 and 192.168.1.4:
  14:24:50.452552 IP 192.168.1.3.36534 > 192.168.1.4.50010: P 1:165(164) ack 1 win 29
  14:24:50.452575 IP 192.168.1.4.50010 > 192.168.1.3.36534: . ack 165 win 31
  14:24:50.454682 IP 192.168.1.4.50010 > 192.168.1.3.36534: P 1:6(5) ack 165 win 31
  14:24:50.454864 IP 192.168.1.3.36534 > 192.168.1.4.50010: . ack 6 win 29
  14:24:50.464797 IP 192.168.1.3.36534 > 192.168.1.4.50010: P 165:212(47) ack 6 win 29
  14:24:50.466173 IP 192.168.1.4.50010 > 192.168.1.3.36534: P 6:13(7) ack 212 win 31
  14:24:50.471383 IP 192.168.1.3.36534 > 192.168.1.4.50010: P 212:243(31) ack 13 win 29
  14:24:50.473232 IP 192.168.1.4.50010 > 192.168.1.3.36534: P 13:20(7) ack 243 win 31


Connection teardown; note the time, 14:24:50.474757:
  14:24:50.473901 IP 192.168.1.4.50010 > 192.168.1.3.36534: F 20:20(0) ack 243 win 31
  14:24:50.474728 IP 192.168.1.3.36534 > 192.168.1.4.50010: F 243:243(0) ack 21 win 29
  14:24:50.474757 IP 192.168.1.4.50010 > 192.168.1.3.36534: . ack 244 win 31


Below is DN3's log:
  2014-04-03 14:24:50,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982 src: /192.168.1.3:36534 dest: /192.168.1.4:50010
  2014-04-03 14:24:50,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.1.3:36534, dest: /192.168.1.4:50010, bytes: 12, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1320389024_1, offset: 0, srvID: DS-2002629359-192.168.1.4-50010-1393417979543, blockid: BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982, duration: 16898199
  2014-04-03 14:24:50,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-398901529-192.168.1.1-1393416650594:blk_3945775701777059462_15982, type=LAST_IN_PIPELINE, downstreams=0:[] terminating



Summary



The Client initiates the write by splitting the local file into blocks. Before writing each block it applies to the NN, in effect saying "I have a Block1 to write"; the NN returns a list of DNs the Client may write to, three of them here (the exact placement decision is the NN's). Having received the DN list, the Client opens a connection to the first DN and starts sending packets (slices of Block1); internally the client keeps a send queue, and every outstanding packet sits in that queue.

While the Client is sending packet 1, DN1 establishes a TCP connection to DN2 and DN2 establishes one to DN3; compare the timestamps above. When DN1 has received packet 1 it forwards it to DN2, and DN2 forwards it on to DN3. Once DN3 has received it completely, it replies to DN2, DN2 replies to DN1, and DN1 replies to the Client. On receiving the ack, the Client removes packet 1 from the send queue and starts sending packet 2, and so on until the whole block has been sent; the second block then goes through the same steps.

When all blocks have been sent, the Client tells the NN that everything succeeded, and the NN formally records the file in its namespace; see the last line of the NN log above.
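To map the summary onto client code, here is a minimal sketch of the same write through the Java API (the content string is illustrative). Roughly: create() registers the file with the NN, the bytes written are chopped into packets and queued, and close() waits for the final pipeline acks and completes the file, which is the completeFile line in the NN log:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteExample {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());

          // create(): the NN makes the file entry; blocks are allocated as
          // data arrives (the allocateBlock line in the NN log).
          FSDataOutputStream out = fs.create(new Path("/user/hadoop/read.txt"));

          // Bytes written here are chopped into packets, queued, and
          // streamed down the DN1 -> DN2 -> DN3 pipeline shown above.
          out.writeBytes("hello hdfs\n");

          // close(): flush the last packet, wait for the pipeline acks,
          // then completeFile on the NN (last line of the NN log).
          out.close();
          fs.close();
      }
  }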





Comments (2)

GreenArrow, 2014-9-24 21:32:11:
In-depth, thanks!

Riordon, 2014-9-29 12:38:54:
Very detailed and clear, thanks!
