4.8 Basic troubleshooting tips for Fibre Channel (FC) SAN issues


Misconfiguration can creep into a SAN in many places and cause a wide range of issues. Troubleshooting storage problems requires a thorough understanding of the SAN configuration, because small configuration differences can lead to serious data loss and disrupt the whole organisation. Use the following tips as first steps before moving on to advanced troubleshooting. Other tools exist, but these basics can save you time.

1) Always back up Switch Configurations

Back up switch configurations at regular intervals so that you can revert to a previous configuration if you are unable to troubleshoot an issue. Such backup files tend to be human-readable flat files, which makes them extremely useful for comparing a broken configuration with a previously known working one. Another option is to create a new zone configuration each time you make a change and keep the previous versions, so that you can roll back if problems appear after committing the change.
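Because the backups are flat text files, comparing a broken configuration with a known working one can be done with any diff tool. The following is a minimal Python sketch of that comparison using the standard difflib module. The zone name and the host WWPNs in the sample text are invented for illustration and do not follow any particular vendor's backup format.

```python
import difflib


def diff_configs(known_good: str, current: str) -> list[str]:
    """Return unified-diff lines between two configuration dumps."""
    return list(
        difflib.unified_diff(
            known_good.splitlines(),
            current.splitlines(),
            fromfile="known_good",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical config excerpts, invented for illustration only.
known_good = (
    "zone: host1_array1\n"
    "  10:00:00:00:c9:aa:bb:01\n"
    "  50:01:43:80:05:6c:22:ae\n"
)
current = (
    "zone: host1_array1\n"
    "  10:00:00:00:c9:aa:bb:02\n"
    "  50:01:43:80:05:6c:22:ae\n"
)

changes = diff_configs(known_good, current)
for line in changes:
    print(line)
```

In practice the two strings would be read from the saved backup files. Lines prefixed with "-" exist only in the known good configuration and lines prefixed with "+" exist only in the current one.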

2) Troubleshooting Connectivity Issues

Many of the day-to-day issues that you see are connectivity issues, such as hosts not being able to see a new LUN, or not being able to see storage or tape devices on the SAN. Such issues are often due to misconfigured zoning. Each vendor provides different tools to configure and troubleshoot zoning, but the following common CLI commands can prove very helpful.

fcping

fcping is an FC version of the popular IP ping tool. fcping allows you to test the following:

Whether a device (N_Port) is alive and responding to FC frames

End-to-end connectivity between two N_Ports

Latency

Zoning between two devices

fcping is available on most switch platforms, and also as a CLI tool for most operating systems and some HBAs. It works by sending Extended Link Service (ELS) echo request frames to a destination, which responds with ELS echo response frames. For example:

# fcping 50:01:43:80:05:6c:22:ae

fctrace

Another tool modeled on a popular IP networking tool is fctrace, which traces the route/path to an N_Port. The following is an example fctrace command:

# fctrace fcid 0xef0010 vsan 1

3) Things to check while troubleshooting Zoning

Are your aliases correct?

If using port zoning, have your switch domain IDs changed?

If using WWPN zoning, have any of the HBA/WWPNs been changed?

Is your zone in the active zone set?
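The WWPN and active zone set checks above can be sketched in Python. This is only an illustration under stated assumptions: the active zone set is assumed to have already been exported into a dictionary that maps zone names to member WWPNs, and the function names, zone names and host WWPNs are hypothetical. It is not a vendor API.

```python
def normalize_wwpn(wwpn: str) -> str:
    """Lower-case a WWPN and format it as colon-separated byte pairs."""
    digits = wwpn.replace(":", "").replace("-", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a valid 64-bit WWPN: {wwpn!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def shared_zones(active_zoneset: dict, initiator: str, target: str) -> list:
    """Return the names of active zones that contain both WWPNs."""
    wanted = {normalize_wwpn(initiator), normalize_wwpn(target)}
    return sorted(
        name
        for name, members in active_zoneset.items()
        if wanted <= {normalize_wwpn(m) for m in members}
    )


# Hypothetical active zone set, invented for illustration only.
active = {
    "host1_array1": {"10:00:00:00:C9:AA:BB:01", "50:01:43:80:05:6c:22:ae"},
    "host2_array1": {"10:00:00:00:c9:aa:bb:02", "50:01:43:80:05:6c:22:ae"},
}

print(shared_zones(active, "10000000c9aabb01", "50:01:43:80:05:6C:22:AE"))
print(shared_zones(active, "10:00:00:00:c9:aa:bb:03", "50:01:43:80:05:6c:22:ae"))
```

An empty result for a host and target pair suggests that no active zone contains both WWPNs, for example because an HBA was replaced and its new WWPN was never added to the zone.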

4) Rescan the SCSI Bus if required

After making zoning changes, LUN masking changes or any other change that affects how a LUN/volume is presented to a host, you may need to rescan the SCSI bus on that host in order to detect the new device. The following commands show how to rescan the SCSI bus on a Windows server using the diskpart tool:

DISKPART> list disk

DISKPART> rescan

If you know that your LUN masking and zoning are correct but the server still does not see the device, it may be necessary to reboot the host.
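If the rescan has to be repeated on several hosts, diskpart can also run non-interactively from a script file passed with its /s option. The following Python sketch wraps that call. It assumes a Windows host and an elevated prompt, the function names are hypothetical, and here the rescan is issued before the disk listing so that the listing reflects any newly detected disks.

```python
import os
import subprocess
import tempfile

# Commands to run inside diskpart, in order.
DISKPART_COMMANDS = ["rescan", "list disk"]


def build_diskpart_script(commands=DISKPART_COMMANDS) -> str:
    """Build the text of a diskpart script, one command per line."""
    return "\n".join(commands) + "\n"


def rescan_disks() -> str:
    """Run diskpart with a temporary script file and return its output.

    Assumption: Windows only, and the caller has administrator rights.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(build_diskpart_script())
        script_path = f.name
    try:
        result = subprocess.run(
            ["diskpart", "/s", script_path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.remove(script_path)
```

Calling rescan_disks() on the affected server returns the diskpart output as text, which can then be checked for the expected new disk.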

5) Understanding Switch Configuration Dumps

Each switch vendor tends to have a built-in command/script that gathers configs and logs to be sent to the vendor's tech support group for analysis. The output of these commands/scripts can also be useful to you as a storage administrator. Each vendor has its own version of these commands/scripts:

Cisco – show tech-support

Brocade – supportshow or supportsave

QLogic – create support

6) Use Port Error Counters

Switch-based port error counters are an excellent way to identify physical connectivity issues such as:

Bad cables (bent, kinked, or otherwise damaged cables)

Bad connectors (dust on the connectors, loose connectors)

The following command shows the error counters for a physical port on a switch:

admin> portshow 4/15

These port counters can sometimes be misleading. It is perfectly normal to see high counts against some of the values, and it is common to see values increase when a server is rebooted and when similar changes occur. If you are not sure what to look for, check your switch documentation, but also compare the counters to some of your known good ports.

If some counters are increasing on a given port that you are concerned with, but they are not increasing on some known good ports, then you know that you have a problem on that port.
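The comparison against known good ports can be sketched as follows. This is a simple heuristic in Python, not a vendor tool: it assumes you have captured two counter snapshots per port some time apart and stored them as dictionaries. The counter keys and all values are hypothetical, and counter resets between snapshots are not handled.

```python
def counter_deltas(before: dict, after: dict) -> dict:
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


def suspicious_counters(suspect_delta: dict, good_deltas: list) -> list:
    """Counters that grew on the suspect port but on no known good port."""
    return sorted(
        name
        for name, delta in suspect_delta.items()
        if delta > 0 and all(g.get(name, 0) == 0 for g in good_deltas)
    )


# Hypothetical snapshots, invented for illustration only.
suspect = counter_deltas(
    {"enc_out": 10, "disc_c3": 2, "crc_err": 0},
    {"enc_out": 250, "disc_c3": 40, "crc_err": 0},
)
known_good = [
    counter_deltas(
        {"enc_out": 5, "disc_c3": 0, "crc_err": 0},
        {"enc_out": 5, "disc_c3": 0, "crc_err": 0},
    )
]

print(suspicious_counters(suspect, known_good))
```

Counters that also grow on the known good ports are filtered out, which helps ignore increases caused by fabric-wide events such as reboots.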

Other commands show similar error counters as well as port throughput. The following porterrshow command can reveal errors such as encoding out (enc out) and class 3 discard (disc c3) on a port, for example port 0. Such errors may indicate a bad cable, a bad port, or another hardware problem.

admin> porterrshow
