HP MSA SAN – Basic Troubleshooting – Controller Faulty / Down (amber lights)

 

In this post we are going to look at basic troubleshooting steps a System Administrator can carry out to identify and potentially resolve a faulty storage controller in an HP MSA 2040.

This post focuses on dual-controller setups, with both a Controller A and a Controller B. In my case the faulty controller was brought to my attention by the amber health light on the front of the SAN. Using the methods described in this article, I was able to gather information and ultimately bring the controller back up.

The first step is to log into the SMU (Storage Management Utility).

In the SMU, the event logs are a good first place to look when identifying the problem.

In my case I was getting the error below:

Error: Critical Error Fault Type: NMI p1: 0x037454E, p2: 0x0000000 ….. No Cur Thread


Overall there wasn’t a lot of information to go off aside from confirmation that a serious error had occurred on Controller A and as a result Controller B had initiated the failover process.

Controller A was still powered on and pingable, and I could reach its SMU. However, when I tried to log in, it would either report an unsuccessful login or state that the controller was ‘initializing’.
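A quick way to compare what each controller is answering on is a small TCP port check. The sketch below is only an illustration: the port list (SSH 22, Telnet 23, HTTP 80, HTTPS 443) and the function names are my assumptions, not something taken from the MSA itself. As the symptoms above show, an open port only proves the management interface is reachable, not that the controller is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def management_report(host: str, ports=None) -> dict:
    """Check a set of management ports on one controller.

    The default port list is an assumption about which services are enabled.
    """
    if ports is None:
        ports = {"ssh": 22, "telnet": 23, "http": 80, "https": 443}
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling management_report() with the IP address of Controller A and then Controller B gives a side-by-side view of which services respond on each.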

Get more information using CLI

A good way to gather more information is to Telnet or SSH into the working controller’s command line interface (CLI). An easy way to do this is with a piece of software called PuTTY.

Just enter the controller’s IP address and click Open.


When prompted, enter your credentials (the same ones you would use to log into the SMU).

If a PuTTY connection over your network isn’t working, try the method below to connect via USB.

Connecting to MSA CLI via Serial Connection using USB to Mini USB

-Install the driver for the USB connection. (http://h20564.www2.hp.com/hpsc/swd/public/detail?swItemId=MTX_8de10954d645450f9e3c0d015d)

-Check which COM port has been assigned in Windows Device Manager.


-Connect the USB cable to the Mini USB port of the MSA.

-Open PuTTY or HyperTerminal and connect to that COM port.


-Hit enter to connect to the MSA.

Useful CLI Commands

Get information about both controllers

show controllers

Show recent controller events (you can copy the output into Notepad)

show events
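Once the event output has been pasted into a text file, the interesting lines can be pulled out with a few lines of Python. This is a minimal sketch that treats the saved output as plain text. The keywords ("critical" and "nmi") come from the error quoted earlier in this post; the function name and the idea of one event per line are my own assumptions about the saved file.

```python
def filter_events(text: str, keywords=("critical", "nmi")) -> list:
    """Return the lines of saved 'show events' output that mention any keyword.

    Matching is a simple case-insensitive substring test, so short keywords
    may also match inside longer words.
    """
    hits = []
    for line in text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line.strip())
    return hits
```

Reading the saved file with open("events.txt").read() (a hypothetical file name) and passing the result to filter_events() leaves only the lines worth a closer look.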

In my case the show controllers command displayed a health note advising me to restart the problem controller A.

Health Reason: The controller is not Healthy

Recommendation: – Restart the Storage Controller in this controller module, unless it is performing an operation where it is normal for it to be shut down, such as firmware update.
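If you save the show controllers output to a text file, the health fields can be extracted for a ticket or a quick comparison between controllers. The sketch below only looks for the two labels quoted above ("Health Reason" and "Recommendation") and assumes each one sits on a single "Label: value" line. The real output may wrap or indent differently, so treat this as a starting point.

```python
def parse_health_fields(text: str) -> dict:
    """Collect 'Health Reason' and 'Recommendation' values from saved CLI output.

    Returns a dict mapping each label to a list of values, because the output
    covers more than one controller and a label can appear several times.
    """
    wanted = ("Health Reason", "Recommendation")  # labels quoted in this post
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in wanted:
            # Drop surrounding spaces and any leading dash used as a bullet.
            cleaned = value.strip().lstrip("\u2013- ")
            fields.setdefault(label, []).append(cleaned)
    return fields
```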

For all CLI commands, you can refer to HP’s CLI Reference Guide:

https://h50146.www5.hpe.com/lib/products/storage/manual/array/723979-001.pdf

Re-Seating the Problem Controller

Ultimately I chose to reseat the problem controller in an attempt to prompt a clean restart. Essentially what this involves is unscrewing the two pins holding the problem controller, followed by pulling it slightly out of the controller slot and putting it back in.

After doing this, the controller showed no health issues when I re-ran the show controllers command on the working controller (B), but I still had no connectivity to the problem controller (A).

Progress! At least at this point I didn’t have any errors in my working controller’s (B) SMU and nothing was showing as unhealthy in the CLI.

Restarting the Controller

All I had to do at this point was restart the problem controller (A) via the CLI. It came back up healthy with network connectivity.

restart mc a

Additional info on restarting via the CLI:
Syntax: restart sc|mc a|b|both
Parameters:
sc|mc
The controller to restart:
– sc: Storage Controller
– mc: Management Controller
a|b|both
The controller module containing the controller to restart (A, B, or both).
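If you script your maintenance steps, it is worth validating the two parameters before anything is sent to the array. The helper below is a hypothetical illustration that only builds the command string from the syntax shown above. It does not connect to the MSA or run anything.

```python
def build_restart_command(processor: str, module: str) -> str:
    """Build a 'restart sc|mc a|b|both' command string after validating input."""
    processors = {"sc", "mc"}     # Storage Controller / Management Controller
    modules = {"a", "b", "both"}  # controller module(s) to restart
    chosen_processor = processor.strip().lower()
    chosen_module = module.strip().lower()
    if chosen_processor not in processors:
        raise ValueError("processor must be 'sc' or 'mc'")
    if chosen_module not in modules:
        raise ValueError("module must be 'a', 'b' or 'both'")
    return f"restart {chosen_processor} {chosen_module}"
```

For example, build_restart_command("mc", "A") produces the same restart mc a command used in this post.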

I could now connect to Controller A using PuTTY, I could log into its SMU, and the amber lights had cleared from the physical panel on the SAN. Happy days!

If a simple reseat, restart, or cold reboot of the problem controller still fails to resolve your issue, or you want to gather more information about the events leading up to the error, you can generate detailed logs via the CLI.

Get more detailed logs via CLI

  • Enter the command “show protocols” in the CLI.
    • If FTP is not enabled:
    • Enter the command “set protocols ftp enabled”.
    • Exit the Telnet or PuTTY session.
    • Once FTP is enabled:
      • Go to Start – Command Prompt and enter “cd ../..” (to go to the C:/ drive, or change to the location where you want to save the logs).
      • Type “ftp” followed by the controller’s IP address.
      • Enter the user name and password of the controller to authenticate.
      • Use the command “get logs filename.zip”.
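The FTP steps above can also be scripted with Python's standard ftplib module. This is a minimal sketch, assuming FTP is already enabled on the controller. The host, credentials and destination file name are placeholders you supply, and the ftp_factory parameter is a hypothetical hook I added so the function can be exercised without a real array. In a command-line FTP client, "get logs filename.zip" asks the server for a file called logs and saves it locally, which is what RETR logs does here.

```python
from ftplib import FTP


def download_logs(host, user, password, dest="logs.zip",
                  ftp_factory=FTP, timeout=600):
    """Fetch the 'logs' file from the controller over FTP and save it to dest.

    The generous timeout is an assumption: the controller may need some time
    to gather the logs before the transfer starts.
    """
    with ftp_factory(host, timeout=timeout) as ftp:
        ftp.login(user=user, passwd=password)
        with open(dest, "wb") as out:
            # Equivalent of 'get logs <dest>' in an interactive FTP client.
            ftp.retrbinary("RETR logs", out.write)
    return dest
```

Note that plain FTP sends credentials unencrypted, so you may want to disable the protocol again once the logs have been collected.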

Once you have your logs, you can use Trace32 to view them. (You can use Notepad, but the logs are very difficult to read there due to the lack of formatting.)

Trace32 is included in the Microsoft Config Manager tools. Download them from the link below and choose to install only Common Tools in the wizard.

https://www.microsoft.com/en-us/download/details.aspx?id=9257


Hope this was helpful!

 

