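Environment: Ubuntu 64-bit
Tools: MegaCli, Python 2.7
1. Installing MegaCli on Ubuntu
The MegaCli zip package can be downloaded from https://www.broadcom.com/site-search?q=megacli (pick the version you need).
Unpacking the zip yields an RPM package. On Red Hat family distributions it can be installed directly with yum. On Ubuntu it has to be unpacked by hand with rpm2cpio and cpio (install them with apt-get install rpm2cpio cpio if they are missing):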
rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio -dimv
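When the command finishes, a new opt directory appears in the current directory. The executable inside it can be run directly. Here is what the output looks like:
root@salt-minion2:~# sudo ./opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll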
Adapter #0
Versions
Product Name : PERC H700 Adapter
Serial No : 1C601NX
FW Package Build: 12.10.1-0001
Mfg. Data
================
Mfg. Date : 12/07/11
Rework Date : 12/07/11
Revision No : A05
Battery FRU : N/A
Image Versions in Flash:
================
BIOS Version : 3.18.00_4.09.05.00_0x0416A000
FW Version : 2.100.03-1062
Preboot CLI Version: 04.04-010:#%00008
Ctrl-R Version : 2.02-0025
NVDATA Version : 2.07.03-0003
Boot Block Version : 2.02.00.00-0000
BOOT Version : 01.250.04.219
Pending Images in Flash
================
None
......
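Because the binary ends up wherever the RPM was unpacked, a script that calls it may first want to locate it. The following is a minimal sketch and not part of the original setup: the function name `find_megacli` and the candidate list are assumptions for illustration (the two paths are the ones that appear in this post).

```python
import os

# Candidate locations (assumed for illustration): the extraction
# directory used above and the path hard-coded in the script later on.
CANDIDATES = (
    './opt/MegaRAID/MegaCli/MegaCli64',
    '/root/MegaRAID/MegaCli/MegaCli64',
)


def find_megacli(candidates=CANDIDATES):
    """Return the absolute path of the first executable candidate, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return os.path.abspath(path)
    return None


if __name__ == '__main__':
    print(find_megacli())
```

If the function returns None, the package has probably not been unpacked in any of the listed locations.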
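2. Collecting and organizing the required information
MegaCli has many features; run sudo ./opt/MegaRAID/MegaCli/MegaCli64 -h for the full list. Here it is used mainly to monitor RAID health, which takes the following two commands.
MegaCli64 -AdpAllInfo -aAll # show RAID controller information
MegaCli64 -PDList -aALL # show physical disk information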
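Both commands can be called from Python in the same way. The sketch below is one hedged way to wrap them: `run_megacli` and its `binary` parameter are hypothetical names, and the default path is simply the extraction location used in this post. It returns the command output as text, or None when the binary cannot be started or exits with a non-zero status.

```python
import subprocess

# Assumed location: where the RPM was unpacked earlier in this post.
MEGACLI = './opt/MegaRAID/MegaCli/MegaCli64'


def run_megacli(args, binary=MEGACLI):
    """Run the binary with the given argument list and return its output.

    Returns None if the binary is missing, not executable, or fails.
    """
    try:
        # universal_newlines=True yields text instead of bytes on Python 3.
        return subprocess.check_output([binary] + list(args),
                                       universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None


if __name__ == '__main__':
    adapter_text = run_megacli(['-AdpAllInfo', '-aAll'])
    disk_text = run_megacli(['-PDList', '-aALL'])
    print('adapter info available: %s' % (adapter_text is not None))
    print('disk info available: %s' % (disk_text is not None))
```

Passing the arguments as a list avoids going through a shell, so no quoting is needed.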
The fields that need to be collected are shown below:
root@zhi:~# ./opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll
Adapter #0
Versions
================
Product Name : PERC H700 Adapter
Serial No : 1C601NX
FW Package Build: 12.10.1-0001
...
Device Present
================
Virtual Drives : 3
Degraded : 0
Offline : 0
Physical Devices : 4
Disks : 3
Critical Disks : 0
Failed Disks : 0
...
root@zhi:~# ./opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Drives position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 0
WWN: 5000C5001D5F0CA0
Sequence Number: 2
Media Error Count: 0 # sector problems such as bad blocks; the higher the value, the more serious
Other Error Count: 0 # the disk may be loosely seated
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
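In the sample above, a disk record starts with an "Enclosure Device ID" line, so the text can also be split into one record per disk. That makes it possible to tell which slot reports errors, whereas the script later in this post only counts affected disks. This is a sketch under that assumption about the layout, based only on the sample shown here; `parse_pdlist` and `disks_with_errors` are hypothetical helper names, and the second disk in the sample text is made up for illustration.

```python
def parse_pdlist(text):
    """Split -PDList output into a list of dicts, one per physical disk.

    Assumes every disk record begins with an 'Enclosure Device ID' line.
    """
    disks = []
    current = None
    for line in text.splitlines():
        if ':' not in line:
            continue  # skips blank lines and headers such as 'Adapter #0'
        key, value = line.split(':', 1)
        key, value = key.strip(), value.strip()
        if key == 'Enclosure Device ID':
            current = {}
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks


def disks_with_errors(disks):
    """Return (slot, media_errors, other_errors) for disks with non-zero counts."""
    bad = []
    for disk in disks:
        media = int(disk.get('Media Error Count', 0))
        other = int(disk.get('Other Error Count', 0))
        if media or other:
            bad.append((disk.get('Slot Number'), media, other))
    return bad


# Hypothetical sample: the second disk and its error count are invented.
sample = """Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Enclosure Device ID: 32
Slot Number: 1
Media Error Count: 5
Other Error Count: 0
"""
print(disks_with_errors(parse_pdlist(sample)))  # [('1', 5, 0)]
```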
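Those are the main fields to collect from the command output. The Python script below parses the text and reports the result; alerting can then be done by email or shown on a web page.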
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# ---------------------------------------------
# Filename:
# Author: zhi
# Mail: 937042882@qq.com
# Version: 0.0.1
# LastChange:
# ---------------------------------------------
import re
import subprocess
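
def read_raid_info():
    # Run -AdpAllInfo and return its non-empty blocks (separated by blank lines).
    # Adjust the path to wherever MegaCli64 was extracted.
    try:
        allInfo = subprocess.check_output(
            ['/root/MegaRAID/MegaCli/MegaCli64', '-AdpAllInfo', '-aALL'],
            universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        # Binary missing, not executable, or exited with an error.
        return None
    return (i for i in allInfo.split('\n\n') if i)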

def get_raid_info(info):
    # Extract the product name and every "key : value" pair
    # of the "Device Present" block.
    detail_info = {}
    version = re.compile(r'Versions')
    name = re.compile(r'Product Name')
    device = re.compile(r'Device Present')
    colon = re.compile(r':')
    for i in info:
        if version.search(i):
            for eachLine in i.split('\n'):
                if name.search(eachLine):
                    detail_info['Product Name'] = eachLine.split(':')[1].strip()
        if device.search(i):
            for eachLine in i.split('\n'):
                if colon.search(eachLine):
                    detail_info[eachLine.split(':')[0].strip()] = eachLine.split(':')[1].strip()
    return detail_info
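
def read_disk_info():
    # Run -PDList and return its non-empty lines.
    try:
        allInfo = subprocess.check_output(
            ['/root/MegaRAID/MegaCli/MegaCli64', '-PDList', '-aALL'],
            universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return (i for i in allInfo.split('\n') if i)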

def get_disk_info(info):
    # Count how many disks report a non-zero media or other error count.
    media = re.compile(r'Media Error Count')
    other = re.compile(r'Other Error Count')
    media_count = 0
    other_count = 0
    for i in info:
        if media.match(i):
            if i.split(':')[1].strip() != '0':
                media_count += 1
        if other.match(i):
            if i.split(':')[1].strip() != '0':
                other_count += 1
    detail_info = {'MediaErrorCount': str(media_count), 'OtherErrorCount': str(other_count)}
    return detail_info
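
def main():
    # Merge controller and disk information into a single dictionary.
    all_info = {}
    raid_info = read_raid_info()
    disk_info = read_disk_info()
    if raid_info:
        all_info.update(get_raid_info(raid_info))
    else:
        print('No megacli client.')
        return None
    if disk_info:
        # Disk lines must go through get_disk_info, not get_raid_info.
        all_info.update(get_disk_info(disk_info))
    return all_info

if __name__ == '__main__':
    print(main())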
The result looks like this:
{'Failed Disks': '0', 'Physical Devices': '3', 'Disks': '3', 'MediaErrorCount': '1', 'Virtual Drives': '3', 'Product Name': 'PERC H310 Mini', 'Critical Disks': '0', 'Offline': '0', 'OtherErrorCount': '0', 'Degraded': '0'}
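A dictionary like this can then be turned into an alert. As one possible sketch of the email route mentioned earlier, the code below flags any non-zero counter and builds (but does not send) a message with the standard library. The list of watched keys, the function names and the addresses in the usage note are assumptions for illustration.

```python
from email.mime.text import MIMEText

# Keys of the result dictionary that should normally be '0' (assumed list).
WATCHED_KEYS = ['Degraded', 'Offline', 'Critical Disks', 'Failed Disks',
                'MediaErrorCount', 'OtherErrorCount']


def find_problems(info):
    """Return sorted (key, value) pairs whose value is not '0'."""
    return sorted((k, info[k]) for k in WATCHED_KEYS
                  if info.get(k, '0') != '0')


def build_alert(info, sender, recipient):
    """Return a MIMEText alert, or None when every watched key is '0'."""
    problems = find_problems(info)
    if not problems:
        return None
    body = '\n'.join('%s = %s' % pair for pair in problems)
    msg = MIMEText(body)
    msg['Subject'] = 'RAID alert: %s' % info.get('Product Name', 'unknown')
    msg['From'] = sender
    msg['To'] = recipient
    return msg


# The result dictionary shown above.
result = {'Failed Disks': '0', 'Physical Devices': '3', 'Disks': '3',
          'MediaErrorCount': '1', 'Virtual Drives': '3',
          'Product Name': 'PERC H310 Mini', 'Critical Disks': '0',
          'Offline': '0', 'OtherErrorCount': '0', 'Degraded': '0'}
print(find_problems(result))  # [('MediaErrorCount', '1')]
```

The message returned by build_alert(result, 'raid@example.com', 'ops@example.com') could be handed to smtplib.SMTP(...).sendmail(sender, [recipient], msg.as_string()); the SMTP host and credentials are site-specific and are left out here.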