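This problem can appear after a forced shutdown, when the data volume is too large, or when the cluster shuts down unexpectedly.
I built a Hadoop cluster with Cloudera on Ubuntu. The root partition had been given too little space, so the cluster crashed partway through a data load. I later enlarged the root partition, but the ZooKeeper node on one of the machines still refused to start.
The error log was as follows: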
2015-12-29 15:50:43,900 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /var/lib/zookeeper/version-2/snapshot.1300000000
2015-12-29 15:50:43,932 ERROR org.apache.zookeeper.server.persistence.Util: Last transaction was partial.
2015-12-29 15:50:43,932 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: Unable to load database on disk
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:167)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:156)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
2015-12-29 15:50:43,942 ERROR org.apache.zookeeper.server.quorum.QuorumPeerMain: Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:156)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:167)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
... 4 more
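One plausible reading of the trace above: the EOFException is thrown in FileHeader.deserialize, which suggests that a transaction log file ended before its header could be read, for example a log file left empty when the disk filled up. The sketch below lists such files so you can confirm this before changing anything. The function name list_short_txn_logs is hypothetical, and the 16-byte threshold is my assumption about the header size (magic, version, dbid), not something stated in the log.

```shell
# Sketch: list transaction log files that look too short to hold a header.
# list_short_txn_logs is a hypothetical helper; the 16-byte threshold is an
# assumption about the size of the transaction log file header.
list_short_txn_logs() {
    # $1 is the version-2 directory, e.g. /var/lib/zookeeper/version-2
    find "$1" -type f -name 'log.*' -size -16c
}

# Usage:
#   list_short_txn_logs /var/lib/zookeeper/version-2
```

If this prints nothing, the damage is probably elsewhere and the check does not help.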
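I found a fix online:
cat /etc/zookeeper/conf/zoo.cfg
Find the line dataDir=/var/lib/zookeeper
Switch to /var/lib/zookeeper:
cd /var/lib/zookeeper
List the files in the directory:
ls
A version-2 directory exists.
Move version-2 out of the way (renaming it keeps a backup instead of deleting the data outright):
mv ./version-2 ./version-2.bak
Then I added the host back as a role in the ZooKeeper service's instances, and it started successfully.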
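The steps above can be sketched as a pair of small shell functions. This is a minimal sketch, not a tested recovery tool: get_data_dir and backup_version_dir are hypothetical names, and it assumes the ZooKeeper role on this host is already stopped and that the other quorum members are healthy.

```shell
# Sketch: read dataDir from zoo.cfg and move version-2 aside.
# get_data_dir and backup_version_dir are hypothetical helper names.

# Print the value of the dataDir setting in the given zoo.cfg.
get_data_dir() {
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Rename <dataDir>/version-2 to version-2.bak.
# Refuses to run if there is nothing to move or an old backup is in the way.
backup_version_dir() {
    if [ ! -d "$1/version-2" ]; then
        echo "no version-2 directory under $1" >&2
        return 1
    fi
    if [ -e "$1/version-2.bak" ]; then
        echo "$1/version-2.bak already exists" >&2
        return 1
    fi
    mv "$1/version-2" "$1/version-2.bak"
}

# Usage:
#   data_dir=$(get_data_dir /etc/zookeeper/conf/zoo.cfg)
#   backup_version_dir "$data_dir"
```

Refusing to overwrite an existing version-2.bak is a deliberate choice here: the backup is the only local copy of the old data, so a second run should not silently replace it.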
Reference: ZooKeeper fails to start with "Unable to load database on disk"
On my own virtual-machine cluster, after one forced shutdown I found that ZooKeeper on slave2 would not come up.
5:29:53.411 PM INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig
Reading configuration from: /var/run/cloudera-scm-agent/process/517-zookeeper-server/zoo.cfg
5:29:53.420 PM INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig
Defaulting to majority quorums
5:29:53.423 PM INFO org.apache.zookeeper.server.DatadirCleanupManager
autopurge.snapRetainCount set to 5
5:29:53.424 PM INFO org.apache.zookeeper.server.DatadirCleanupManager
autopurge.purgeInterval set to 24
5:29:53.430 PM INFO org.apache.zookeeper.server.DatadirCleanupManager
Purge task started.
5:29:53.434 PM ERROR org.apache.zookeeper.server.DatadirCleanupManager
Error occured while purging.
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Missing data directory /var/lib/zookeeper/version-2, automatic data directory creation is disabled (zookeeper.datadir.autocreate is false). Please create this directory manually.
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:102)
at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
Removing data from /var/zookeeper/version-2 then restart seems to “fix” the problem (it gets a snapshot from one of the other nodes in the quorum).
This is Zookeeper 3.3.5+19.5-1~squeeze-cdh3, i.e. from Cloudera’s distribution.
After reading that English-language article, this is how I handled it:
more /etc/zookeeper/conf.dist/zoo.cfg
Find the dataDir setting.
[root@slave2 zookeeper]# pwd
/var/lib/zookeeper
[root@slave2 zookeeper]# ls
myid version-2 version-2.bak
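Remove all files under the version-2 directory, keeping the directory itself (the log above shows that a missing version-2 is not recreated automatically), then restart ZooKeeper.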
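A minimal sketch of that last step, under a few assumptions: the ZooKeeper role on this host is stopped, the other quorum members are healthy, and you would rather move the old files to a backup location than delete them. reset_version_dir is a hypothetical helper name, and the backup path in the usage line is only an example.

```shell
# Sketch: empty <dataDir>/version-2 while keeping the directory in place.
# reset_version_dir is a hypothetical helper name.
#   $1 = ZooKeeper dataDir, $2 = directory that receives the old files
reset_version_dir() {
    # Make sure both directories exist; version-2 must be present because
    # automatic data directory creation is disabled in this setup.
    mkdir -p "$1/version-2" "$2" || return 1
    # Move every snapshot and transaction log out of version-2.
    # The myid file lives one level up in dataDir and is not touched.
    find "$1/version-2" -mindepth 1 -maxdepth 1 -exec mv {} "$2/" \;
}

# Usage:
#   reset_version_dir /var/lib/zookeeper /var/lib/zookeeper/version-2.old
```

If the directory ends up being created by root, check that the account ZooKeeper runs under can still write to it before starting the role again.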