Wednesday, January 23, 2013

Disabling BBU Auto Learn with megacli

We are currently facing performance problems with MySQL, and I remembered reading that RAID BBU learn cycles can cause huge drops in write performance. So I wanted to check whether the RAID controller on our master database server had auto learn enabled.

Step#1: Find out the brand/model of the RAID controller installed on the server

I could of course ask accounting to pull up the delivery receipt to find the brand/model, but where is the fun in that? Googling led me to the following commands:

sudo lspci | grep -i raid
04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 04)

sudo lshw -class storage
       description: RAID bus controller
       product: MegaRAID SAS 2108 [Liberator]
       vendor: LSI Logic / Symbios Logic
       physical id: 0
       bus info: pci@0000:04:00.0
       logical name: scsi4
       version: 04
       width: 64 bits
       clock: 33MHz
       capabilities: storage pm pciexpress vpd msi msix bus_master cap_list rom
       configuration: driver=megaraid_sas latency=0
       resources: irq:26 ioport:d800(size=256) memory:fae7c000-fae7ffff memory:faec0000-faefffff memory:fae80000-faebffff

Step#2: Install the megacli package to be able to query the RAID card for its status

I downloaded the megacli package from the LSI website, but they only provide RPMs, so I had to convert it with:

sudo alien -k MegaCli-8.07.06-1.noarch.rpm

Then installed with:

sudo dpkg -i megacli_8.07.06-1_all.deb

To find out where the files got installed do:

sudo dpkg -c megacli_8.07.06-1_all.deb

Step#3: Use megacli to probe for BBU status and information

Running the command gave an unexpected result:

./MegaCli64 -adpCount
Controller Count: 0.
Exit Code: 0x00

megacli can't find the RAID adapter. It seems that megacli has an issue with kernels >= 3.0, which can be worked around with:

sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpCount
Controller Count: 1.
Exit Code: 0x01
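
Typing that prefix for every command gets old quickly, so a small wrapper helps. This is only a sketch: needs_uname26 and megacli are names I made up, the MEGACLI_BIN variable is my own addition, and the rule "kernel major version 3 or higher needs the workaround" is just my reading of the behaviour above, not something from LSI documentation.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the --uname-2.6 workaround applies.
# Assumption: only kernels with a major version of 3 or higher need it.
needs_uname26() {
    # $1 is a kernel release string such as "3.2.0-35-generic"
    major=${1%%.*}
    # A non-numeric major makes the test fail, which counts as "not needed"
    [ "$major" -ge 3 ] 2>/dev/null
}

# Path taken from the commands in this post; override MEGACLI_BIN if yours differs.
MEGACLI_BIN=${MEGACLI_BIN:-/opt/MegaRAID/MegaCli/MegaCli64}

# Hypothetical wrapper so the setarch prefix is written only once.
megacli() {
    if needs_uname26 "$(uname -r)"; then
        sudo setarch x86_64 --uname-2.6 "$MEGACLI_BIN" "$@"
    else
        sudo "$MEGACLI_BIN" "$@"
    fi
}
```

After sourcing the file, `megacli -adpCount` should behave like the long command above.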

So to find out about the BBU:

sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL

BBU status for Adapter: 0

BatteryType: iBBU
Voltage: 3972 mV
Current: 0 mA
Temperature: 24 C
Battery State: Optimal
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested                  : No
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Periodic Learn Required                 : No
  Transparent Learn                       : No
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No

  Fully Discharged        : No
  Fully Charged           : Yes
  Discharging             : Yes
  Initialized             : Yes
  Remaining Time Alarm    : No
  Discharge Terminated    : No
  Over Temperature        : No
  Charging Terminated     : No
  Over Charged            : No
  Relative State of Charge: 97 %
  Charger System State: 49168
  Charger System Ctrl: 0
  Charging current: 0 mA
  Absolute state of charge: 53 %
  Max Error: 2 %

Exit Code: 0x00
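
That is a lot of output to eyeball, so here is a rough sketch for pulling single fields out of it, for example from a monitoring script. The function names bbu_field and bbu_is_healthy are hypothetical, and treating "Battery State: Optimal" plus "Battery Replacement required: No" as healthy is my own assumption. If -aALL reports several adapters, only the first match of each label is looked at.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "Label : value" line
# on stdin whose label equals $1 (surrounding whitespace is ignored).
bbu_field() {
    awk -v key="$1" '
        {
            i = index($0, ":")
            if (i == 0) next
            label = substr($0, 1, i - 1)
            value = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (label == key && !found) { print value; found = 1 }
        }'
}

# Hypothetical check: succeeds when the status text on stdin looks healthy.
bbu_is_healthy() {
    status=$(cat)
    state=$(printf '%s\n' "$status" | bbu_field "Battery State")
    replace=$(printf '%s\n' "$status" | bbu_field "Battery Replacement required")
    [ "$state" = "Optimal" ] && [ "$replace" = "No" ]
}
```

Usage would be to pipe the -GetBbuStatus command above into it, e.g. `... -AdpBbuCmd -GetBbuStatus -aALL | bbu_is_healthy && echo "BBU OK"`.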

sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuCapacityInfo -aALL

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 97 %
  Absolute State of charge: 53 %
  Remaining Capacity: 641 mAh
  Full Charge Capacity: 664 mAh
  Run time to empty: Battery is not being discharged.  
  Average time to empty: Battery is not being discharged.  
  Estimated Time to full recharge: Battery is not being charged.  
  Cycle Count: 37
Max Error = 2 %
Remaining Capacity Alarm = 120 mAh
Remining Time Alarm = 10 Min

Exit Code: 0x00

sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -aALL

BBU Properties for Adapter: 0
  Auto Learn Period: 30 Days
  Next Learn time: Sun Feb 17 19:37:09 2013
  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Enabled
Exit Code: 0x00
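
If you want to check this from a script rather than by eye, something like the following sketch could work. auto_learn_state is a made-up name, and I am assuming the disabled case is printed as "Auto-Learn Mode: Disabled"; anything else is reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: read -GetBbuProperties output on stdin and print
# "enabled", "disabled" or "unknown" for the first adapter listed.
auto_learn_state() {
    mode=$(sed -n 's/^[[:space:]]*Auto-Learn Mode:[[:space:]]*//p' | head -n 1)
    case "$mode" in
        Enabled*)  echo enabled ;;
        Disabled*) echo disabled ;;   # assumed wording, not shown in this post
        *)         echo unknown ;;
    esac
}
```

Pipe the -GetBbuProperties command above into auto_learn_state to get a single word back.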

Auto learn mode should be disabled, and the learn cycle should instead be run manually during off-peak hours.

# autoLearnMode: 0 = enabled, 1 = disabled
TMPFILE=$(mktemp -p /tmp bbu.relearn.XXXXXXXXXX) || exit 1
echo "autoLearnMode=1" > "$TMPFILE"
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -SetBbuProperties -f "$TMPFILE" -aALL
rm -f "$TMPFILE"
# Confirm the new Auto-Learn Mode
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -aALL
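
For the "scheduled during off-peak time" part, a learn cycle can be started by hand and put into cron. The sketch below assumes the -BbuLearn option of -AdpBbuCmd is available in your MegaCli version (check the built-in help first). The function name start_learn_cycle, the DRY_RUN switch and the --run argument are my own inventions, and the script is meant to run as root from cron, so there is no sudo.

```shell
#!/bin/sh
# Sketch of a manual relearn trigger meant for an off-peak cron job.
MEGACLI_BIN=${MEGACLI_BIN:-/opt/MegaRAID/MegaCli/MegaCli64}

# Start a learn cycle on all adapters.
# With DRY_RUN=1 the command is only printed, not executed.
start_learn_cycle() {
    set -- setarch x86_64 --uname-2.6 "$MEGACLI_BIN" -AdpBbuCmd -BbuLearn -aALL
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Only act when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--run" ]; then
    start_learn_cycle
fi
```

A crontab line such as `30 3 1 * * /usr/local/sbin/bbu-relearn.sh --run` (path and schedule are placeholders) would then run it at 03:30 on the first day of each month.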

Script for safe write back (the write cache is turned off if the BBU is bad):

sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp ADRA -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -Cached -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp DisDskCache -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NoCachedBadBBU -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WB -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL

Script to force write back without BBU protection:

sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp ADRA -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -Cached -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp DisDskCache -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp CachedBadBBU -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WB -Lall -aALL
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
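
The final -LDInfo call is there to confirm what the controller is actually doing. Here is a sketch for checking that from a script. It assumes -LDInfo prints one line per logical drive of the form "Current Cache Policy: WriteBack, ReadAdaptive, ..." (that line format is from memory and is not shown in this post), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the current write policy of every logical
# drive found in -LDInfo output on stdin, one per line.
# The "Default Cache Policy" lines are deliberately not matched.
current_write_policies() {
    sed -n 's/^[[:space:]]*Current Cache Policy[[:space:]]*:[[:space:]]*\([A-Za-z]*\).*/\1/p'
}

# Succeeds only if at least one policy line was found and all of them
# say WriteBack.
all_write_back() {
    policies=$(current_write_policies)
    [ -n "$policies" ] && ! printf '%s\n' "$policies" | grep -qv '^WriteBack$'
}
```

Piping the -LDInfo command into all_write_back gives an exit status that a monitoring check could alert on, for example when a drive has dropped to WriteThrough.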

