在client向DataNode写入block之前,会与NameNode有一次通信,由NameNode来选择指定数目的DataNode来存放副本。具体的副本选择策略在BlockPlacementPolicy接口中,其子类实现是BlockPlacementPolicyDefault。该类中会有多个chooseTarget()方法重载,但最终调用了下面的方法:
1 /**
2 * This is not part of the public API but is used by the unit tests.
3 */
4 DatanodeDescriptor[] chooseTarget(int numOfReplicas,
5 DatanodeDescriptor writer,
6 List<DatanodeDescriptor> chosenNodes,
7 HashMap<Node, Node> excludedNodes,
8 long blocksize) {
9 //numOfReplicas:要选择的副本个数
10 //clusterMap.getNumOfLeaves():整个集群的DN个数
11 if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0) {
12 return new DatanodeDescriptor[0];
13 }
14
15 //excludedNodes:排除的DN(因为有些DN已经被选中,所以不再选择他们)
16 if (excludedNodes == null) {
17 excludedNodes = new HashMap<Node, Node>();
18 }
19
20 int clusterSize = clusterMap.getNumOfLeaves();
21 //总的副本个数=已选择的个数 + 指定的副本个数
22 int totalNumOfReplicas = chosenNodes.size()+numOfReplicas;
23 if (totalNumOfReplicas > clusterSize) { //若总副本个数 > 整个集群的DN个数
24 numOfReplicas -= (totalNumOfReplicas-clusterSize);
25 totalNumOfReplicas = clusterSize;
26 }
27
28 //计算每个一个rack能有多少个DN被选中
29 int maxNodesPerRack =
30 (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
31
32 List<DatanodeDescriptor> results =
33 new ArrayList<DatanodeDescriptor>(chosenNodes);
34 for (DatanodeDescriptor node:chosenNodes) {
35 // add localMachine and related nodes to excludedNodes
36 addToExcludedNodes(node, excludedNodes);
37 adjustExcludedNodes(excludedNodes, node);
38 }
39
40 //客户端不是DN
41 if (!clusterMap.contains(writer)) {
42 writer=null;
43 }
44
45 boolean avoidStaleNodes = (stats != null && stats
46 .shouldAvoidStaleDataNodesForWrite());
47
48 //选择numOfReplicas个DN,并返回本地DN
49 DatanodeDescriptor localNode = chooseTarget(numOfReplicas, writer,
50 excludedNodes, blocksize, maxNodesPerRack, results, avoidStaleNodes);
51
52 results.removeAll(chosenNodes);
53
54 // sorting nodes to form a pipeline
55 //将选中的DN(result中的元素)组织成pipe
56 return getPipeline((writer==null)?localNode:writer,
57 results.toArray(new DatanodeDescriptor[results.size()]));
58 }
方法含义大概就如注释中写的,不过要注意其中的变量含义。在第48行,又调用chooseTarget()方法来选择指定数目的DN(选中的DN存放在result中),并返回一个DN作为本地DN。下面分析这个方法。
原文:http://www.cnblogs.com/gwgyk/p/4137060.html