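Simulate a NameNode crash by deleting everything in the name directory, then recover the NameNode from the secondary NameNode.
Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1
1. Go to the name directory and delete its contents.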
[huser@master name]$ pwd
/home/huser/hadoop/tmp/dfs/name
[huser@master name]$ ll
drwxrwxr-x 2 huser huser 4096 4月 16 20:16 current
drwxrwxr-x 2 huser huser 4096 4月 16 17:24 image
-rw-rw-r-- 1 huser huser 0 4月 16 20:10 in_use.lock
drwxrwxr-x 2 huser huser 4096 4月 16 18:55 previous.checkpoint
[huser@master name]$ rm -R *
[huser@master name]$ ls
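The deletion above is deliberate, since the goal is to simulate a crash. On anything other than a throwaway cluster it may be worth archiving the directory first. A minimal sketch, assuming a hypothetical helper named backup_name_dir (not part of Hadoop) and an archive path of your own choosing outside the name directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Hadoop command): archive a NameNode
# metadata directory into a .tar.gz before it is wiped.
backup_name_dir() {
    local name_dir="$1"   # e.g. /home/huser/hadoop/tmp/dfs/name
    local archive="$2"    # destination archive; keep it outside name_dir
    if [ ! -d "$name_dir" ]; then
        echo "no such directory: $name_dir" >&2
        return 1
    fi
    # -C changes into name_dir first, so the archive stores relative paths.
    tar -czf "$archive" -C "$name_dir" .
}

# Example call (the archive path is an assumption):
#   backup_name_dir /home/huser/hadoop/tmp/dfs/name /home/huser/name-backup.tar.gz
```

Running `tar -tzf` on the archive afterwards lists what was saved.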
2. Stop the cluster, then restart it; the NameNode fails to start (it is missing from the jps output).
[huser@master hadoop-1.2.1]$ bin/stop-all.sh
[huser@master hadoop-1.2.1]$ bin/start-all.sh
[huser@master hadoop-1.2.1]$ jps
7160 SecondaryNameNode
7229 JobTracker
7369 Jps
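Spotting the missing NameNode by eye is easy to get wrong, because SecondaryNameNode contains the same substring. A small sketch of an exact-name check on jps output follows; has_daemon is a hypothetical helper name, and it assumes the two-column "pid name" format shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `jps` output on stdin and succeeds only if
# a JVM whose name is exactly "$1" is listed. Comparing the whole second
# column avoids matching "SecondaryNameNode" when asking for "NameNode".
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example call on the master (assumes jps is on PATH):
#   jps | has_daemon NameNode || echo "NameNode is not running"
```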
3. Stop the cluster and format the NameNode.
[huser@master hadoop-1.2.1]$ bin/stop-all.sh
[huser@master hadoop-1.2.1]$ bin/hadoop namenode -format
14/04/16 21:17:39 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.1.115
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_51
************************************************************/
Re-format filesystem in /home/huser/hadoop/tmp/dfs/name ? (Y or N) Y
14/04/16 21:17:42 INFO util.GSet: Computing capacity for map BlocksMap
14/04/16 21:17:42 INFO util.GSet: VM type = 64-bit
14/04/16 21:17:42 INFO util.GSet: 2.0% max memory = 1013645312
14/04/16 21:17:42 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/04/16 21:17:42 INFO util.GSet: recommended=2097152, actual=2097152
14/04/16 21:17:43 INFO namenode.FSNamesystem: fsOwner=huser
14/04/16 21:17:43 INFO namenode.FSNamesystem: supergroup=supergroup
14/04/16 21:17:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/04/16 21:17:43 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/04/16 21:17:43 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/04/16 21:17:43 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/04/16 21:17:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/04/16 21:17:43 INFO common.Storage: Image file /home/huser/hadoop/tmp/dfs/name/current/fsimage of size 111 bytes saved in 0 seconds.
14/04/16 21:17:43 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:43 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:44 INFO common.Storage: Storage directory /home/huser/hadoop/tmp/dfs/name has been successfully formatted.
14/04/16 21:17:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.115
************************************************************/
4. Get the namespaceID from a DataNode.
[huser@master hadoop-1.2.1]$ ssh slave1
[huser@slave1 current]$ pwd
/home/huser/hadoop/tmp/dfs/data/current
[huser@slave1 current]$ ll
-rw-rw-r-- 1 huser huser 49184 4月 16 18:43 blk_-1800088935645150399
-rw-rw-r-- 1 huser huser 395 4月 16 18:43 blk_-1800088935645150399_1013.meta
-rw-rw-r-- 1 huser huser 25 4月 16 18:43 blk_269963827714855400
-rw-rw-r-- 1 huser huser 11 4月 16 18:43 blk_269963827714855400_1014.meta
-rw-rw-r-- 1 huser huser 16353 4月 16 18:43 blk_4611281727215307463
-rw-rw-r-- 1 huser huser 135 4月 16 18:43 blk_4611281727215307463_1015.meta
-rw-rw-r-- 1 huser huser 769 4月 16 19:32 dncp_block_verification.log.curr
-rw-rw-r-- 1 huser huser 158 4月 16 19:51 VERSION
[huser@slave1 current]$ cat VERSION
#Wed Apr 16 19:51:23 CST 2014
namespaceID=589801292
storageID=DS-1065963269-192.168.1.111-50010-1397640950581
cTime=0
storageType=DATA_NODE
layoutVersion=-41
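Rather than reading the value off the screen, the namespaceID can be pulled out of the VERSION file directly. A minimal sketch, assuming the key=value layout shown above; read_namespace_id is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the namespaceID key from a
# VERSION file in the key=value format shown in this post.
read_namespace_id() {
    # -n suppresses normal output; the p flag prints only the line
    # where the "namespaceID=" prefix was stripped.
    sed -n 's/^namespaceID=//p' "$1"
}

# Example call on the DataNode:
#   read_namespace_id /home/huser/hadoop/tmp/dfs/data/current/VERSION
```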
5. Set the namespaceID in the NameNode's VERSION file to the value taken from the DataNode.
[huser@slave1 current]$ exit
logout
[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/name/current
[huser@master current]$ vi VERSION
#Wed Apr 16 21:17:43 CST 2014
namespaceID=589801292
cTime=0
storageType=NAME_NODE
layoutVersion=-41
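The same edit can be made without opening an editor. The sketch below rewrites only the namespaceID line and leaves the other keys alone; set_namespace_id is a hypothetical helper name, and the integer check is an added safeguard rather than something the original steps perform:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the namespaceID value in a VERSION file.
set_namespace_id() {
    local version_file="$1" new_id="$2" tmp status
    # Reject empty or non-integer IDs (added safeguard, an assumption).
    case "$new_id" in
        ''|*[!0-9-]*) echo "namespaceID must be an integer" >&2; return 1 ;;
    esac
    tmp="$(mktemp)" || return 1
    # Write the edited copy to a temp file, then copy the content back so
    # the original file keeps its owner and permissions.
    sed "s/^namespaceID=.*/namespaceID=${new_id}/" "$version_file" > "$tmp" &&
        cat "$tmp" > "$version_file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example call on the master, using the ID read from the DataNode:
#   set_namespace_id /home/huser/hadoop/tmp/dfs/name/current/VERSION 589801292
```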
6. Delete the fsimage file on the NameNode.
[huser@master current]$ rm fsimage
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
7. Copy the fsimage file from the secondary NameNode's directory into the NameNode's directory.
[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/namesecondary/current
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 20:16 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 20:16 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 20:16 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 20:16 VERSION
[huser@master current]$ cp fsimage /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ cd /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 21:37 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
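The listing confirms the copy by file size (2259 bytes). A byte-for-byte comparison is a slightly stronger check. A minimal sketch, where copy_fsimage is a hypothetical helper name and both arguments are the `current` directories used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy fsimage from the checkpoint directory to the
# NameNode directory and confirm the two files are identical.
copy_fsimage() {
    local src_dir="$1"   # e.g. .../dfs/namesecondary/current
    local dst_dir="$2"   # e.g. .../dfs/name/current
    cp "$src_dir/fsimage" "$dst_dir/fsimage" || return 1
    # cmp -s is silent and returns 0 only when the files match exactly.
    cmp -s "$src_dir/fsimage" "$dst_dir/fsimage"
}

# Example call on the master:
#   copy_fsimage /home/huser/hadoop/tmp/dfs/namesecondary/current \
#                /home/huser/hadoop/tmp/dfs/name/current
```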
8. Restart the cluster and check that it is running.
[huser@master hadoop-1.2.1]$ jps
7927 SecondaryNameNode
7773 NameNode
8017 JobTracker
8123 Jps
Source: http://www.cnblogs.com/guarder/p/3703808.html