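Environment: RHEL 6.5 + 11.2.0.4 RAC, two nodes
Problem: I deliberately deleted the OLR. After a reboot, GI would not start.
Analysis:
1. Find out which stage GI startup reached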
[grid@rac1 ~]$ crsctl status resource -t -init
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
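Analysis: not even OHASD is up. There are two possibilities: (1) the init.ohasd script was never invoked, or (2) the ohasd.bin daemon failed to start. So check the processes: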
[grid@rac1 ~]$ ps -ef | grep ohas |grep -v grep
root 960 1 0 09:23 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
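The script was invoked (init.ohasd run is there), but there is no ohasd.bin process, so the daemon did not start successfully.
2. Check the ohasd log ($GRID_HOME/log/rac1/ohasd/ohasd.log on 11.2)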
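If you want to script the check from step 1, the sketch below sorts `ps` output into the three cases discussed above. `ohasd_state` is a hypothetical helper of my own, and it assumes the wrapper shows up as `init.ohasd run` and the daemon as `ohasd.bin`, as in the listing above.

```shell
#!/bin/sh
# Classify the OHASD startup state from `ps -eo args` output on stdin.
# Prints one of:
#   not_invoked  - the init.ohasd wrapper is not running
#   wrapper_only - the wrapper is running but ohasd.bin is not (the case above)
#   running      - ohasd.bin is up
# The escaped dots in the patterns keep awk from matching its own
# command line, which also appears in the ps listing.
ohasd_state() {
  awk '
    /init\.ohasd run/ { wrapper = 1 }
    /ohasd\.bin/      { daemon = 1 }
    END {
      if (daemon)       print "running"
      else if (wrapper) print "wrapper_only"
      else              print "not_invoked"
    }'
}

# Usage on a cluster node:
#   ps -eo args | ohasd_state
```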
2016-04-18 12:26:25.918: [ default][1661986592] OHASD Daemon Starting. Command string :restart
2016-04-18 12:26:25.919: [ default][1661986592] Initializing OLR
2016-04-18 12:26:25.919: [ OCROSD][1661986592]utopen:6m': failed in stat OCR file/disk /u01/app/11.2.0.1/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-04-18 12:26:25.919: [ OCROSD][1661986592]utopen:7: failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-04-18 12:26:25.919: [ OCRRAW][1661986592]proprinit: Could not open raw device
2016-04-18 12:26:25.919: [ OCRAPI][1661986592]a_init:16!: Backend init unsuccessful : [26]
2016-04-18 12:26:25.920: [ CRSOCR][1661986592] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-04-18 12:26:25.920: [ default][1661986592] Created alert : (:OHAS00106:) : OLR initialization failed, error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-04-18 12:26:25.920: [ default][1661986592][PANIC] OHASD exiting; Could not init OLR
2016-04-18 12:26:25.920: [ default][1661986592] Done.
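To pull the failing path out of the log instead of reading it by eye, a sed one-liner is enough. `failed_olr_paths` is a hypothetical helper, and the pattern assumes the exact 11.2 message text shown above. Other versions may word it differently.

```shell
#!/bin/sh
# Print the OLR path(s) that OHASD failed to stat, de-duplicated.
# Reads the given log files, or stdin if none are given.
# Assumption: lines look like the 11.2 ones above:
#   "...failed in stat OCR file/disk <path>, errno=..."
failed_olr_paths() {
  sed -n 's|.*failed in stat OCR file/disk \([^,]*\),.*|\1|p' "$@" | sort -u
}

# Usage (11.2 log location, hostname rac1):
#   failed_olr_paths /u01/app/11.2.0.1/grid/log/rac1/ohasd/ohasd.log
```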
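Analysis: the errors say the OLR cannot be opened, so go and check whether the file exists (I deleted it by hand, so of course it doesn't):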
[grid@rac1 cdata]$ ll
total 12
drwxrwxr-x 2 grid oinstall 4096 Apr 18 07:51 liming-cluster
drwxr-xr-x 2 grid oinstall 4096 Apr 18 07:49 localhost
drwxr-xr-x 2 grid oinstall 4096 Apr 18 08:11 rac1
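The OLR is gone: cdata holds only the three subdirectories and no rac1.olr.
3. Check whether an OLR backup exists (in cdata/rac1)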
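You can also ask the node where its OLR is supposed to live instead of relying on the path in the log. The sketch below assumes that, as on 11.2 Linux, `/etc/oracle/olr.loc` holds an `olrconfig_loc=<path>` line. `olr_path` and `check_olr` are hypothetical names.

```shell
#!/bin/sh
# Report whether the OLR file this node is configured to use exists.
# Assumption: the pointer file (default /etc/oracle/olr.loc, 11.2 Linux
# layout) contains an "olrconfig_loc=<path>" line; pass another file as $1.
olr_path() {
  sed -n 's/^olrconfig_loc=//p' "${1:-/etc/oracle/olr.loc}"
}

# Exit status: 0 = present, 1 = missing, 2 = no olrconfig_loc entry found.
check_olr() {
  path=$(olr_path "$1")
  if [ -z "$path" ]; then
    echo "no olrconfig_loc entry"
    return 2
  elif [ -f "$path" ]; then
    echo "present: $path"
  else
    echo "missing: $path"
    return 1
  fi
}

# Usage:
#   check_olr                    # reads /etc/oracle/olr.loc
```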
[grid@rac1 rac1]$ ll
total 6644
-rw------- 1 root root 6803456 Apr 18 08:11 backup_20160418_081108.olr
Good, there is one.
4. Restore the OLR
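Here there is only one backup. When there are several, the newest can be picked by name. This sketch assumes the `backup_YYYYMMDD_HHMMSS.olr` naming seen above. `latest_olr_backup` is a hypothetical helper.

```shell
#!/bin/sh
# Print the newest backup_YYYYMMDD_HHMMSS.olr file in a directory.
# Sorting uses the timestamp embedded in the file name, not the file mtime,
# because that format sorts lexicographically in chronological order.
latest_olr_backup() {
  dir=${1:?usage: latest_olr_backup <dir>}
  ls -1 "$dir"/backup_*.olr 2>/dev/null | sort | tail -n 1
}

# Usage:
#   latest_olr_backup /u01/app/11.2.0.1/grid/cdata/rac1
```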
[root@rac1 bin]# ./ocrconfig -local -restore /u01/app/11.2.0.1/grid/cdata/rac1/backup_20160418_081108.olr
PROTL-35: The configured OLR location is not accessible.
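Here comes the step the book doesn't mention! ocrconfig would not restore to a missing OLR file (PROTL-35), so first create an empty file at the configured location, then rerun the restore: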
[grid@rac1 cdata]$ touch rac1.olr
[root@rac1 bin]# ./ocrconfig -local -restore /u01/app/11.2.0.1/grid/cdata/rac1/backup_20160418_081108.olr
[root@rac1 bin]#
[grid@rac1 cdata]$ ll
total 6660
drwxrwxr-x 2 grid oinstall 4096 Apr 18 07:51 liming-cluster
drwxr-xr-x 2 grid oinstall 4096 Apr 18 07:49 localhost
drwxr-xr-x 2 grid oinstall 4096 Apr 18 08:11 rac1
-rw-r--r-- 1 grid oinstall 272756736 Apr 18 13:02 rac1.olr
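Taken together, step 4 looks roughly like the function below. This is a sketch, not an official procedure. `run`, `restore_olr` and the `DRY_RUN` switch are my own, and the paths in the usage comment come from this walkthrough. Unlike the steps above, where `touch` ran as grid, this function creates the placeholder as root. The restored `rac1.olr` above ended up grid:oinstall with mode 644, so compare its owner and mode with the OLR on the other node.

```shell
#!/bin/sh
# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Restore the OLR from a backup file (run as root).
restore_olr() {
  grid_home=$1  # e.g. /u01/app/11.2.0.1/grid
  olr_file=$2   # configured OLR location, e.g. $grid_home/cdata/rac1.olr
  backup=$3     # e.g. $grid_home/cdata/rac1/backup_20160418_081108.olr
  # Without an existing file here, the restore above failed with PROTL-35.
  [ -e "$olr_file" ] || run touch "$olr_file" || return 1
  run "$grid_home/bin/ocrconfig" -local -restore "$backup"
}

# Usage (as root; prefix with DRY_RUN=1 to preview):
#   restore_olr /u01/app/11.2.0.1/grid \
#     /u01/app/11.2.0.1/grid/cdata/rac1.olr \
#     /u01/app/11.2.0.1/grid/cdata/rac1/backup_20160418_081108.olr
```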
5. Start GI; it comes up normally
[root@rac1 bin]# ./crsctl start crs
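The stack takes a while to come up after `crsctl start crs`. A small polling loop saves you from rerunning the status command by hand. `wait_for_output` is a hypothetical helper. The CRS-4638 pattern in the usage line is an assumption about the 11.2 `crsctl check has` message, the counterpart of the CRS-4639 seen in step 1.

```shell
#!/bin/sh
# Poll a command until its output (stdout+stderr) contains a pattern.
# Args: timeout_seconds interval_seconds pattern command [args...]
# Returns 0 on a match, 1 if the timeout expires first.
wait_for_output() {
  timeout=$1 interval=$2 pattern=$3
  shift 3
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Usage (assumption: CRS-4638 means OHASD is online on 11.2):
#   wait_for_output 600 10 CRS-4638 /u01/app/11.2.0.1/grid/bin/crsctl check has
```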