From Wiki-UX.info

How to correct Ultra320 SCSI Bus Reset EMS errors


Description

There is a known issue with the QAS (Quick Arbitration and Selection) feature of the MPT driver that appears when the Ultra320 SCSI controller is used with external enclosures.

A very common affected configuration is an rp3440 server with the A7173A Ultra320 SCSI adapter attached to an MSA30 external storage enclosure.

The error usually shows up on systems that have been online for more than 400 days without a reboot.

The following error conditions can be found on a system running an affected MPT driver.


EMS error message

The EMS error event message (100091) appears several times. A detailed example follows:

speclive:LIVE $  /opt/resmon/bin/resdata -R 206045256 -r /storage/events/disks/default/0_3_1_1.5.0 -n 206045336 -a 
 
CURRENT MONITOR DATA:
 
Event Time..........: Thu Feb 14 03:43:27 2008
Severity............: MAJORWARNING
Monitor.............: disk_em
Event #.............: 100091              
System..............: cahp.cahpcu.org
 
Summary:
     Disk at hardware path 0/3/1/1.5.0 : Software configuration error
 
 
Description of Error:
 
     The device is in a condition where it requires action on the part of the
     device driver or a human operator. 
 
Probable Cause / Recommended Action:
 
     The device has been reset by a Bus Device Reset message, a hard reset
     condition, or a power-on reset.
 
     If this is the case, no action is necessary.
 
     Alternatively, a removable medium has been loaded or replaced.
 
     If this is the case, no action is necessary.
 
     Alternatively, the mode parameters, microcode, or inquiry data for the
     device have been changed.
 
     If this is the case, no action is necessary.
 
     Alternatively, the installed version of the device driver does not match
     that of the installed version of HP-UX. Install the correct version of the
     driver.
 
Additional Event Data: 
     System IP Address...: 192.168.165.100
     Event Id............: 0x47b4295f00000000
     Monitor Version.....: B.01.01
     Event Class.........: I/O
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_disk_em.clcfg 
     Client Configuration File Version...: A.01.00 
          Qualification criteria met.
               Number of events..: 1 
     Associated OS error log entry id(s): 
          0x47b4295f00000000
     Additional System Data:
          System Model Number.............: 9000/800/rp3440#1 
          OS Version......................: B.11.11 
          STM Version.....................: A.49.00 
          EMS Version.....................: A.04.20 
     Latest information on this event:
          http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100091
 
v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v
 
 
 
Component Data: 
     Physical Device Path...: 0/3/1/1.5.0
     Device Class...........: Disk
     Inquiry Vendor ID......: COMPAQ  
     Inquiry Product ID.....: BF0368A4CA      
     Firmware Version.......: HPB5
     Serial Number..........: 3KQ2AQCD00009731L7LT
 
Product/Device Identification Information:
 
     Logger ID.........: sdisk
     Product Identifier: SCSI Disk
     Product Qualifier.: COMPAQBF0368A4CA
     SCSI Target ID....: 0x05
     SCSI LUN..........: 0x00
 
I/O Log Event Data: 
 
     Driver Status Code..................: 0x0000000B 
     Length of Logged Hardware Status....: 22 bytes. 
     Offset to Logged Manager Information: 24 bytes. 
     Length of Logged Manager Information: 34 bytes. 
 
Hardware Status: 
 
     Raw H/W Status:
          0x0000: 00 00 00 02   70 00 06 00   00 00 00 0A   00 00 00 00
          0x0010: 29 02 02 00   00 00
 
     SCSI Status...: CHECK CONDITION (0x02) 
          Indicates that a contingent allegiance condition has occurred.  Any
          error, exception, or abnormal condition that causes sense data to be
          set will produce the CHECK CONDITION status.
 
SCSI Sense Data: 
 
     Undecoded Sense Data:
          0x0000: 70 00 06 00   00 00 00 0A   00 00 00 00   29 02 02 00
          0x0010: 00 00
 
     SCSI Sense Data Fields:
          Error Code                      : 0x70
          Segment Number                  : 0x00
          Bit Fields:      
               Filemark                   : 0
               End-of-Medium              : 0
               Incorrect Length Indicator : 0
          Sense Key                       : 0x06
          Information Field Valid         : FALSE               
          Information Field               : 0x00000000
          Additional Sense Length         : 10
          Command Specific                : 0x00000000
          Additional Sense Code           : 0x29
          Additional Sense Qualifier      : 0x02
          Field Replaceable Unit          : 0x02
          Sense Key Specific Data Valid   : FALSE               
          Sense Key Specific Data         : 0x00 0x00 0x00
 
          Sense Key 0x06, UNIT ATTENTION, indicates that the target has been
          reset by a BUS DEVICE RESET message, a hard reset condition, or by a
          power-on reset. If not a reset, then one of the following may have
          occurred.
             1. A removable medium may have been changed.
             2. The mode parameters in effect for this initiator have been
             changed by another initiator.
             3. The version or level of microcode has been changed.
             4. Tagged commands queued for this initiator were cleared by
             another initiator.
             5. INQUIRY data has been changed.
             6. The mode parameters in effect for this initiator have been
             restored from non-volatile memory.
             7. A change in the condition of a synchronized spindle.
             8. Any other event that requires the attention of the initiator.
 
          The combination of Additional Sense Code and Sense Qualifier (0x2902)
          indicates: SCSI bus reset occurred.
 
SCSI Command Data Block:
 
     Command Data Block Contents:
          0x0000: 2A 00 00 DE   98 10 00 00   10 00
 
     Command Data Block Fields (10-byte fmt):
          Command Operation Code...(0x2A)..: WRITE
          Logical Unit Number..............: 0
          DPO Bit..........................: 0
          FUA Bit..........................: 0
          Relative Address Bit.............: 0
          Logical Block Address............: 14587920 (0x00DE9810)
          Transfer Length..................: 16 (0x0010)
 
Manager-Specific Data Fields: 
     Request ID.............: 0x07669FBC 
     Data Residue...........: 0x00002000 
     CDB status.............: 0x00000002 
     Sense Status...........: 0x00000000 
     Bus ID.................: 0x07 
     Target ID..............: 0x05 
     LUN ID.................: 0x00 
     Sense Data Length......: 0x12 
     Q Tag..................: 0xF9 
     Retry Count............: 0
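
The key fields in the dump above can be cross-checked by hand. The following is a minimal Python sketch, assuming the standard SCSI fixed-format sense layout (sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13). The function name `decode_fixed_sense` is hypothetical and is not part of any HP-UX tool.

```python
def decode_fixed_sense(hex_bytes: str) -> dict:
    """Decode SCSI fixed-format sense data given as space-separated hex bytes."""
    data = bytes.fromhex(hex_bytes)
    if len(data) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = data[0] & 0x7F           # bit 7 is the "valid" flag
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "response_code": response_code,
        "sense_key": data[2] & 0x0F,          # low nibble of byte 2
        "information": int.from_bytes(data[3:7], "big"),
        "additional_length": data[7],
        "asc": data[12],                      # Additional Sense Code
        "ascq": data[13],                     # Additional Sense Code Qualifier
    }


# Sense bytes copied from the "Undecoded Sense Data" dump above.
sense = decode_fixed_sense("70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00")
print("sense key 0x%02X, ASC/ASCQ 0x%02X%02X"
      % (sense["sense_key"], sense["asc"], sense["ascq"]))
```

Sense key 0x06 with ASC/ASCQ 0x2902 is the UNIT ATTENTION "SCSI bus reset occurred" combination that the monitor reports.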


Syslog error message

Although EMS labels this event a "Software configuration error", it is in fact reporting a SCSI bus reset. The MPT driver logs the I/O timeout and the resulting bus reset in syslog:

Jun 18 23:08:55 cahp vmunix: SCSI Ultra320 0/3/1/1 instance 7:       IO Type : SCSI IO has timed-out.       Target ID: 0, LUN ID: 0.       Write Command - CDB: 2a 00 00 50 7d 98 00 00 08 00 
Jun 18 23:09:01 cahp vmunix: SCSI Ultra320 0/3/1/0 instance 6:       An IO timeout condition was detected.       Condition cleared, no intervention required. 
Jun 18 23:09:05 cahp vmunix: SCSI Ultra320 0/3/1/0 instance 6:       Controller reset has been successfully       completed. 
Jun 18 23:09:05 cahp vmunix: SCSI Ultra320 0/3/1/0 instance 6:       The driver is now online 
Jun 18 23:09:05 cahp vmunix: SCSI Ultra320 0/3/1/0 instance 6:       Driver initiating SCSI bus reset.       Condition cleared, no intervention required. 
Jun 18 23:09:06 cahp vmunix: SCSI Ultra320 0/3/1/1 instance 7:       The driver is now online 
Jun 18 23:09:06 cahp vmunix: SCSI Ultra320 0/3/1/1 instance 7:       Driver initiating SCSI bus reset.
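
The syslog entry prints only the raw CDB of the timed-out command. As a rough sketch, assuming the standard 10-byte READ/WRITE CDB layout (opcode in byte 0, logical block address in bytes 2-5, transfer length in bytes 7-8), it can be decoded in Python as follows. The helper name `decode_cdb10` is hypothetical.

```python
def decode_cdb10(hex_bytes: str) -> dict:
    """Decode a 10-byte SCSI READ/WRITE CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "opcode": cdb[0],
        "name": names.get(cdb[0], "unknown"),
        "lba": int.from_bytes(cdb[2:6], "big"),              # bytes 2-5
        "transfer_length": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8, in blocks
    }


# CDB copied from the "SCSI IO has timed-out" syslog line above.
cdb = decode_cdb10("2a 00 00 50 7d 98 00 00 08 00")
print("%s lba=%d blocks=%d" % (cdb["name"], cdb["lba"], cdb["transfer_length"]))
```

The same function applied to the CDB in the EMS event (`2A 00 00 DE 98 10 00 00 10 00`) should reproduce the logical block address and transfer length that the monitor decoded.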

A second condition can also be found in the syslog.log file:

Aug 15 10:15:00 thusnw2 syslogd: restart
Aug 15 10:15:00 thusnw2 vmunix: Timeout called with negative time.
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6D0FD,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6D0FC,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6D0FB,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6D0FA,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6D0F9,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6D0F8,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6CCD6,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6CCD5,  flags == 0x0
Aug 15 10:15:00 thusnw2 vmunix: function == 0x8647F8, arg == 0x41EC1C00, ticks == 0xF7E6CCD4,  flags == 0x0
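
Every ticks value in these messages has its top bit set. The sketch below assumes that the kernel treats the value as a signed 32-bit integer, which the source does not state but which would be consistent with the "negative time" wording. Under that assumption, it shows how the logged values read as negative numbers.

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# A few of the ticks values copied from the syslog lines above.
ticks_logged = [0xF7E6D0FD, 0xF7E6D0FC, 0xF7E6CCD4]
for raw in ticks_logged:
    print("0x%08X -> %d" % (raw, to_signed32(raw)))
```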


Solution

Install the latest MPT driver, which can be downloaded from software.hp.com. Enter the SCSI controller part number in the Search box, for example "A7173A".

Or check the HP-UX Ultra320 SCSI MPT Driver Release Notes: http://docs.hp.com/en/J6373-90015/index.html

The release notes for mpt driver version B.11.11.0701 state: "SR 8606472711: Enhanced I/O handling has been added to the Ultra320 SCSI mpt driver which will prevent SCSI I/O's from timing out prematurely on systems which have been online longer than 400-days without a system reboot."
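
To decide whether an installed driver already contains this fix, its version string can be compared against B.11.11.0701. The Python sketch below assumes that versions have the form letter.major.minor.build and that the last field sorts numerically within one OS release. Both are assumptions, and the function names and the sample version B.11.11.0512 are hypothetical. The installed version string has to be obtained separately from the system's software inventory.

```python
def parse_version(version: str) -> tuple:
    """Split a string such as 'B.11.11.0701' into ('B', 11, 11, 701)."""
    letter, major, minor, build = version.strip().rstrip(".").split(".")
    return letter, int(major), int(minor), int(build)


def includes_fix(installed: str, fixed: str = "B.11.11.0701") -> bool:
    """Return True if 'installed' is at or above 'fixed' for the same OS release."""
    inst, fix = parse_version(installed), parse_version(fixed)
    if inst[:3] != fix[:3]:
        # Other OS releases have their own driver versions and release notes.
        raise ValueError("different OS release, check that release's notes")
    return inst[3] >= fix[3]


# Hypothetical installed version, used for illustration only.
print(includes_fix("B.11.11.0512"))
```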

The current driver can be downloaded from:

Select the appropriate driver for your specific OS version.


Reference

Special Thanks To

Thanks to John McAlexander from HP Services’ Global Competency Center (GCC) - Storage for his additional insight on the situation.


Authors

This page was last modified on 26 April 2010, at 12:52.