From Wiki-UX.info


How to replace a LVM mirror boot disk

Abstract

This document addresses the considerations and steps necessary to replace a failed LVM disk that forms part of a MirrorDisk/UX mirror, with special attention to LVM boot disks. It evaluates the different scenarios you may encounter and offers considerations to help you select the best and safest strategy for replacing the failed disk(s).

Background

Replacing a mirror disk under LVM is a critical procedure. MirrorDisk/UX is used to protect critical data volumes, such as the boot devices of an HP-UX 11i system or other highly sensitive data. There are two common scenarios in which this operation is performed.

  1. The mirror disk is inside the system or in an enclosure from which it cannot be removed while the system is live. In this case, you must shut down the system, replace the failed disk and, if you are replacing a mirror boot disk, boot from the remaining disk. You can repair the mirror after booting the system.
  2. The mirror disk is in a hot-swappable bay or an external array that allows the disk or LUN to be removed live, but only if the system bus (SCSI/FC/SAS) is not sending I/O requests to the disk. To comply with this requirement, you must disable use of the disk. To do so, you may use the OLR (Online Replacement) feature, disable the disk in the volume group, or completely disable the volume group.

Completely disabling volume group vg00 is not an option since this volume group contains several critical filesystems (/, /var, /usr).

Using Hot-Swappable Disks

The hot-swap feature provides the ability to remove or add an inactive hard disk drive module while electrical power is on and the SCSI bus is still active (but not sending I/O requests to the disk that needs to be exchanged). In other words, you can replace or remove a hot-swappable disk without turning off the power of the entire system and/or the external enclosure.

Consult your system hardware manuals for information about which disks in your system are hot-swappable. Specifications for other hard disks are available in their installation manuals at http://docs.hp.com.

Verify the required patches

The LVM Online Replacement (OLR) feature is delivered in two patches for HP-UX 11.11 and 11.23: one for the kernel and another for the pvchange command. These patches introduce the "-a" flag of the "pvchange" command, which disables I/O to specific physical volumes.

Required patches

The original patches were for HP-UX 11i version 1 and for HP-UX 11i version 2. The following are the minimal and recommended patch releases for both platforms.

HP-UX 11.11

Minimal:

  • s700_800 11.11 LVM Cumulative Patch; LVM OLR; SLVM 16 Node
    PHKL_31216 [1]
  • s700_800 11.11 LVM commands cumulative patch; LVM OLR
    PHCO_30698 [2]

Recommended:

  • s700_800 11.11 LVM Cumulative Patch; LVM OLR; SLVM 16 Node
    PHKL_35970 [3]
  • s700_800 11.11 LVM commands cumulative patch
    PHCO_35955 [4]

HP-UX 11.23

Minimal:

  • s700_800 11.23 LVM Cumulative Patch
    PHKL_32095 [5]
  • s700_800 11.23 LVM commands patch
    PHCO_31709 [6]

Recommended:

  • s700_800 11.23 LVM Cumulative Patch
    PHKL_36745 [7]
  • s700_800 11.23 LVM commands patch
    PHCO_36744 [8]

Notes:

  • Apply these patches and any required dependencies.
  • These patches, as with any patch, may be superseded.
  • Please check for the latest patches at HP's IT Resource Center (ITRC) at itrc.hp.com
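
As a quick cross-check of the patch lists above, the following sketch reports which of a given set of patch IDs do not appear in the output of "swlist -l patch". The function name missing_patches is hypothetical and not part of HP-UX. The check is a plain text match, so it knows nothing about supersession: an ID reported as missing may simply be covered by a newer superseding patch.

```shell
# missing_patches ID...: read `swlist -l patch` output on stdin and print
# every patch ID given as an argument that does not appear in it.
# Hypothetical helper; plain substring match, no supersession logic.
missing_patches() {
    installed=$(cat)
    for id in "$@"; do
        printf '%s\n' "$installed" | grep -q "$id" || printf '%s\n' "$id"
    done
}

# Example on HP-UX 11.11, using the minimal OLR patch levels listed above:
#   swlist -l patch | missing_patches PHKL_31216 PHCO_30698
```

An empty result means every requested ID was found in the swlist output.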

Test installed patches

The following script can be used to test the availability of the OLR features on a specific HP-UX 11i instance:

#!/usr/bin/sh
# \
# Copyright (c) 2007 by Hewlett-Packard
# All rights reserved.
# 2007.07.18, Marin, Alejandro
# last update: 2008.08.24
# alejandro.marin-badilla@hp.com

cat <<EOF
This script tests the availability of the Online Replacement feature under
HP-UX 11i. Even if the system has the minimal patches required, it is
strongly recommended to update to the latest superseding patches
when time is available.
Always verify the latest patch information available at itrc.hp.com.
EOF

case `uname -r` in
'B.11.11')
   echo "HP-UX 11i v1 current LVM/OLR patches"
   swlist -l patch | grep -e "LVM Cumulative Patch" -e "LVM commands cumulative patch"
   ;;
'B.11.23')
   echo "HP-UX 11i v2 current LVM/OLR patches"
   swlist -l patch | grep -e "LVM Cumulative Patch" -e "LVM commands patch"
   ;;
'B.11.31')
   echo "HP-UX 11i v3"
   echo "This version of HP-UX includes the necessary software on the base release."
   ;;
'B.11.00')
   echo "HP-UX 11.0"
   echo "This version of HP-UX does not support the OLR feature."
   echo "Use alternative procedures to disable I/O to the disk or schedule downtime"
   echo "to replace the failed disk and boot with the quorum restriction disabled."
   echo "ISL> hpux -lq"
   ;;

*)
   echo "This version of HP-UX is not supported by the script!"
esac
exit 0

Collect current volume group configuration

Before using any replacement procedure, it is good policy to create a detailed volume group state report. The report helps you detect special volume group configurations. The following script creates a detailed volume group / logical volume configuration report.

vgdisplay -v /dev/<vgname> > /tmp/vgstatus.<vgname>

for lv in `vgdisplay -v /dev/<vgname> | awk '$0 ~ /LV Name/ {print $3}'`
do
lvdisplay -v $lv | head -n 30
echo
done >> /tmp/vgstatus.<vgname>

Be alert to the following points:

  • Physical volumes (PV) in the volume group configured as alternate paths.
  • Mirrored logical volume distribution (number of mirrors).
  • Logical volume sizes versus physical volume sizes.
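
To spot alternate paths quickly in the saved report, a small filter such as the one below may help. It is only a sketch: the name list_pvs is hypothetical, and it assumes that vgdisplay -v prints one "PV Name" line per path and marks alternate paths with the words "Alternate Link" on that line.

```shell
# list_pvs: read `vgdisplay -v` output on stdin and print each physical
# volume path followed by "primary" or "alternate".
# Assumes alternate paths carry the text "Alternate Link" on the PV Name line.
list_pvs() {
    awk '$1 == "PV" && $2 == "Name" {
        role = ($0 ~ /Alternate Link/) ? "alternate" : "primary"
        print $3, role
    }'
}

# Example, using the report file created above:
#   list_pvs < /tmp/vgstatus.<vgname>
```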

Disable I/O requests to the physical volume

Using "vgchange" command

Sometimes, even if the OLR feature is not enabled on the system, it is still possible to disable I/O requests to the disk that needs to be exchanged, allowing you to exchange the drive.

You may also find situations in which the failed disk has been hot replaced without causing the system to hang, but leaving the volume group in a state where the "vgcfgrestore" command cannot be performed because the kernel still believes the disk is an active part of the volume group.

Turning off the system and booting without quorum restrictions corrects this situation, but requires a maintenance window that may be impractical. The basic concept is to make the kernel aware that the physical volume that forms part of the volume group has failed.

If the disk is completely down (hardware UNCLAIMED, "diskinfo" cannot query the disk, or the PVRA/VGRA is damaged), you can try to re-enable the active volume group, allowing the kernel to realize that the disk is no longer available.

# vgchange -a y -q n /dev/vgXX

The command queries the physical volumes and reports those that cannot be reattached to the volume group. That leaves the physical volume in a state where it can be hot replaced or restored with "vgcfgrestore". After that, follow the normal restore procedure.

If the command vgcfgrestore -n /dev/vgXX /dev/rdsk/c#t#d# gives an error message reporting that the disk is still enabled in the volume group, you will need to reboot the system, replace the disk and boot with the quorum restriction disabled.

Reducing the logical volume mirrors

It was common practice, before the appearance of the OLR feature in 11i v1, to reduce the logical volume mirrors from the failed disk and remove the physical volume from the volume group to ensure that no I/O requests go to the failed disk.

This procedure has several drawbacks:

  1. The disk must be accessible and the PVRA/VGRA must be working.
  2. This approach usually does more harm than good. It is not uncommon for the system to hang during these tasks.
  3. The procedure is prone to errors. You need to lvreduce the logical volume mirrors, vgreduce the volume group, replace the physical volume, pvcreate the physical volume, vgextend the volume group and lvextend the logical volume mirrors. This is particularly complex on Integrity systems, where the disk must be partitioned with the idisk command.

The best practice is to preserve the system logical volume mirror structure and try one of the procedures described in this document. Never reduce the logical volume mirrors or remove the disk from the volume group if the OLR feature is available or if the disk reports heavy damage with the diskinfo or ioscan commands.

Be aware, nonetheless, that on HP-UX 11.0 and earlier, this is the only way to disable I/O requests to the disks. Be aware also that this method may hang the system, in which case you will have to boot into single-user mode / maintenance mode with the quorum restriction disabled to bring the system back to a proper state.

SAS Controllers (Serial Attached SCSI)

Serial Attached SCSI controllers add another layer of complexity to LVM mirror disk replacement. Every SAS-attached disk creates a new disk instance on the system. To accomplish the disk replacement, it is necessary to "redirect" the hardware path to the old disk instance.

Legacy Device Special Files

1. Determine the current state of the legacy device special files and the status of the SAS controller.

# ioscan -fnC disk
Class     I  H/W Path      Driver         S/W State   H/W Type     Description
===============================================================================
disk      3  0/4/1/0.0.0.0.0  sdisk            NO_HW       DEVICE       HP      DG072A9BB7
                          /dev/dsk/c0t0d0     /dev/rdsk/c0t0d0
disk      2  0/4/1/0.0.0.1.0  sdisk            CLAIMED     DEVICE       HP      DG072A9BB7
                          /dev/dsk/c0t1d0     /dev/rdsk/c0t1d0
disk      6  0/4/1/0.0.0.2.0  sdisk            CLAIMED     DEVICE       HP      DG072A9BB7
                          /dev/dsk/c0t2d0   /dev/rdsk/c0t2d0
disk      8  0/4/1/0.0.0.3.0  sdisk            CLAIMED     DEVICE       HP      DG072A9BB7
                          /dev/dsk/c0t3d0   /dev/rdsk/c0t3d0
disk     10  0/4/1/0.0.0.4.0  sdisk            CLAIMED     DEVICE       HP      DG072A9BB7
                          /dev/dsk/c0t4d0   /dev/rdsk/c0t4d0


# sasmgr get_info -D /dev/sasd0 -q lun=all -q lun_locate
/dev/rdsk/c0t1d0          0/4/1/0.0.0.1.0           1     8     OFF 
/dev/rdsk/c0t2d0          0/4/1/0.0.0.2.0           1     4     OFF 
/dev/rdsk/c0t3d0          0/4/1/0.0.0.3.0           1     3     OFF 
/dev/rdsk/c0t4d0          0/4/1/0.0.0.4.0           1     7     OFF

The 0/4/1/0.0.0.0.0 hardware path corresponds to the failed drive. The new DSF /dev/rdsk/c0t4d0 was created after installing the new drive.
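
On systems with many disks, the failed path can be picked out of the ioscan listing with a short filter. The sketch below assumes the column layout shown in the listing above (class, instance, hardware path, driver, software state); the function name failed_disk_paths is hypothetical.

```shell
# failed_disk_paths: read `ioscan -fnC disk` output on stdin and print the
# hardware path of every disk whose software state is NO_HW.
# Assumes the column order: Class, I, H/W Path, Driver, S/W State.
failed_disk_paths() {
    awk '$1 == "disk" && $5 == "NO_HW" { print $3 }'
}

# Example:
#   ioscan -fnC disk | failed_disk_paths
```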

2. Redirect the new DSF to the original DSF, so that the SAS controller identifies the new disk with the previous address.

# sasmgr replace_tgt -D /dev/sasd0 -q old_dev=/dev/dsk/c0t0d0  -q new_tgt_hwpath=0/4/1/0.0.0.4.0

Persistent Device Special files (HP-UX 11.31)

The approach is similar to that used with legacy device special files on HP-UX 11.23 / 11.31. The main difference is that the use of persistent device special files (agile view) requires the "io_redirect_dsf" command instead of "sasmgr". The example uses an Integrity system, which also displays the DSFs for the different EFI partitions.

1. Determine the current state of the persistent device special files and the status of the SAS controller.

# ioscan -N -fnC disk
Class     I  H/W Path  Driver         S/W State   H/W Type     Description
===========================================================================
disk      3  64000/0xfa00/0x1  esdisk           CLAIMED     DEVICE       HP      DG072A9BB7
                      /dev/disk/disk3      /dev/disk/disk3_p2   /dev/rdisk/disk3     /dev/rdisk/disk3_p2
                      /dev/disk/disk3_p1   /dev/disk/disk3_p3   /dev/rdisk/disk3_p1  /dev/rdisk/disk3_p3
disk      4  64000/0xfa00/0x2  esdisk           NO_HW       DEVICE       HP      DG072A9BB7
                      /dev/disk/disk4      /dev/disk/disk4_p2   /dev/rdisk/disk4     /dev/rdisk/disk4_p2
                      /dev/disk/disk4_p1   /dev/disk/disk4_p3   /dev/rdisk/disk4_p1  /dev/rdisk/disk4_p3
disk      5  64000/0xfa00/0x3  esdisk           CLAIMED     DEVICE       HP      DG072A9BB7
                      /dev/disk/disk5      /dev/rdisk/disk5
disk      7  64000/0xfa00/0x8  esdisk           CLAIMED     DEVICE       HP      DG072A9BB7
                      /dev/disk/disk7   /dev/rdisk/disk7
disk      9  64000/0xfa00/0x9  esdisk           CLAIMED     DEVICE       HP      DG072A9BB7
                      /dev/disk/disk9   /dev/rdisk/disk9
disk     11  64000/0xfa00/0xa  esdisk           CLAIMED     DEVICE       HP      DG072A9BB7
                      /dev/disk/disk11   /dev/rdisk/disk11

2. Redirect the new DSF (/dev/disk/disk5) to the previous DSF, allowing the SAS controller to replace one disk with the other.

# io_redirect_dsf -d /dev/disk/disk4 -n /dev/disk/disk5

Replacement procedures

HP 9000 (PA-RISC) - Required reboot

1. Shut down the system.

# shutdown -hy 0

2. Replace the damaged disk.

If the damaged disk is not one of the boot disk mirrors, boot the system normally and skip to step 6.

3. Boot up the system again.

4. Interrupt the PDC boot sequence.

[Escape]

5. Boot from the good mirror disk, with quorum disabled.

SEA IPL
Choose the boot disk:
P#
Interact with IPL? yes
ISL> hpux -lq

6. Rescan the hardware and create the new device special files:

# ioscan -C disk
# insf -C disk

7. Restore the LVM reserved areas (PVRA/VGRA):

# vgcfgrestore -n vgXX /dev/rdsk/cXtXdX

If the damaged disk is not one of the boot disk mirrors, skip to step 10.

8. Repopulate the LIF area:

# mkboot /dev/rdsk/cXtXdX

9. Change the AUTO file contents, choosing the best policy for the boot path:

A) Primary boot disk.

# mkboot -a "hpux" /dev/rdsk/cXtXdX

B) Alternate boot disk.

# mkboot -a "hpux -lq" /dev/rdsk/cXtXdX

10. Reactivate the volume group to attach the physical volume.

# vgchange -a y vgXX

Note: If the volume group does not start to synchronize the logical volumes automatically, you can force synchronization with:

# vgsync vgXX

11. Use lvlnboot to ensure that the LVM logical volumes are prepared to be root, primary swap or dump volume.

# lvlnboot -R
# lvlnboot -v
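
Because the restore commands in steps 7 to 11 differ only in the volume group, the disk and the boot role, it can help to print them for review before typing them. The following dry-run sketch only echoes the commands and never executes them; the function name parisc_restore_plan and its three role names are hypothetical.

```shell
# parisc_restore_plan VG DISK ROLE
#   VG   : volume group name, for example vg00
#   DISK : legacy disk name without directory, for example c2t1d0
#   ROLE : "primary", "alternate", or "data" (not a boot disk)
# Prints the restore commands from steps 7 to 11; executes nothing.
parisc_restore_plan() {
    vg=$1
    disk=$2
    role=$3
    case $role in
        primary)   auto="hpux" ;;
        alternate) auto="hpux -lq" ;;
        data)      auto="" ;;
        *)         echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "vgcfgrestore -n $vg /dev/rdsk/$disk"
    if [ -n "$auto" ]; then
        # Boot disks also need the LIF area and the AUTO file.
        echo "mkboot /dev/rdsk/$disk"
        echo "mkboot -a \"$auto\" /dev/rdsk/$disk"
    fi
    echo "vgchange -a y $vg"
    echo "lvlnboot -R"
    echo "lvlnboot -v"
}

# Example:
#   parisc_restore_plan vg00 c2t1d0 alternate
```

Printing instead of executing keeps the operator in control of each step, which matters when a wrong device name can overwrite a healthy disk.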

HP 9000 (PA-RISC) - Online Hot Plug

1. Detach the physical volume from the volume group:

# pvchange -a n /dev/dsk/cXtXdX

2. Hot swap the disk.

3. Restore the LVM reserved areas (PVRA/VGRA):

# vgcfgrestore -n vgXX /dev/rdsk/cXtXdX

If the disk is not bootable, skip to step 6.

4. Repopulate the LIF area:

# mkboot /dev/rdsk/cXtXdX

5. Change the AUTO file contents to the proper mode:

A) Primary boot disk.

# mkboot -a "hpux" /dev/rdsk/cXtXdX

B) Alternate boot disk.

# mkboot -a "hpux -lq" /dev/rdsk/cXtXdX

6. Reattach the new disk:

# pvchange -a y /dev/dsk/cXtXdX

7. Reactivate the volume group to attach the physical volume.

# vgchange -a y vgXX

Note: If the volume group does not start to synchronize the logical volumes automatically, you can force synchronization with:

# vgsync vgXX

8. Use lvlnboot to ensure that the LVM logical volumes are prepared to be root, primary swap or dump volume.

# lvlnboot -R
# lvlnboot -v
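
To judge whether the mirrors have finished synchronizing after the reattach, one option is to count the stale extents reported by lvdisplay -v. The sketch below assumes that the logical extent table of lvdisplay -v starts each row with the extent number and shows the word "stale" in the status columns; the function name count_stale_extents is hypothetical.

```shell
# count_stale_extents: read `lvdisplay -v` output on stdin and print the
# number of logical extent rows that still contain a stale copy.
# Assumes extent rows start with a numeric LE index.
count_stale_extents() {
    awk '$1 ~ /^[0-9]+$/ && /stale/ { n++ } END { print n + 0 }'
}

# Example:
#   lvdisplay -v /dev/vg00/lvol3 | count_stale_extents
```

A result of 0 for every logical volume suggests that synchronization has completed.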

HP 9000 (PA-RISC) 11.31 - Persistent DSF

1. Save the hardware path information of the disk (printout or file). It is very important to save this information, since some of these details won't be available after the scsimgr command.

# ioscan -fnkNC disk
Class     I  H/W Path  Driver S/W State   H/W Type     Description
===================================================================
disk      8  64000/0xfa00/0x2  esdisk   NO_HW     DEVICE       HP 36.4GST336754LC
                      /dev/disk/disk8   /dev/rdisk/disk8

# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health   Description
=======================================================================
disk      8  64000/0xfa00/0x2   esdisk  NO_HW       DEVICE       offline  HP 36.4GST336754LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk8   /dev/rdisk/disk8

# ioscan -fnkNC lunpath
Class     I  H/W Path  Driver S/W State   H/W Type     Description
==================================================================
lunpath   2  0/1/1/0.0xa.0x0    eslpt   NO_HW       LUN_PATH     LUN path for disk8

Note: If the server is rebooted to execute the change, only the new LUN instance will be displayed. The old LUN will disappear from the ioscan output. Keep the output of these commands in a secure place.

2. Detach the physical volume from the volume group.

# pvchange -a N /dev/disk/disk8

3. Physically replace the disk.

4. After replacing the disk, running ioscan again won't report the disk as CLAIMED yet. Checking the lunpath(s), you should see the AUTH_FAILED state. This is a security mechanism implemented on HP-UX 11.31 to prevent the bad disk from being replaced unless you explicitly authorize it from the OS.

# scsimgr get_info -C lunpath -I 2

        STATUS INFORMATION FOR LUN PATH : lunpath2

Generic Status Information

SCSI services internal state                  = UNOPEN
Open close state                              = AUTH_FAILED

5. Notify the mass storage subsystem that the disk has been replaced (authorize the replacement). Make sure you have saved the output specified in step 1, because the lunpath HW path can't be read from the original disk after this command.

# scsimgr -f replace_wwid -D /dev/rdisk/disk8
scsimgr: Successfully validated binding of LUN paths with new LUN.

Note: This command allows the storage subsystem to replace the old disk's LUN World-Wide-Identifier (WWID) with the new disk's LUN WWID. The storage subsystem will create a new LUN instance and new device special files for the new disk. This command is not required if you reboot the server, because the system automatically authorizes the replacement after the reboot and no lunpath will be assigned to the old /dev/rdisk/disk8.

6. Determine the new persistent device special file (agile view) of the disk. The lunpath HW path (0/1/1/0.0xa.0x0) was originally assigned to disk8; it is now temporarily assigned to disk3 in this example. Using the lunpath HW path, you ensure that disk3 is the correct new disk that replaces disk8.

# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health   Description
=======================================================================
disk      8  64000/0xfa00/0x2   esdisk  NO_HW       DEVICE       offline  HP 36.4GST336754LC
                      /dev/disk/disk8   /dev/rdisk/disk8
disk      3  64000/0xfa00/0x3   esdisk  CLAIMED     DEVICE       online   HP 36.4GST336753LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk3   /dev/rdisk/disk3

7. Assign the old instance number to the replacement disk. This command restores disk8 as the valid device file to access the new disk and removes the disk3 device files.

# io_redirect_dsf -d /dev/disk/disk8 -n /dev/disk/disk3

# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health  Description
======================================================================
disk      8  64000/0xfa00/0x3   esdisk  CLAIMED     DEVICE       online  HP 36.4GST336753LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk8   /dev/rdisk/disk8

# ioscan -fnkNC disk
Class     I  H/W Path  Driver S/W State   H/W Type     Description
===================================================================
disk      8  64000/0xfa00/0x3  esdisk   CLAIMED     DEVICE       HP 36.4GST336753LC
                      /dev/disk/disk8   /dev/rdisk/disk8

8. Repopulate the LIF area:

# mkboot /dev/disk/disk8

9. Change the AUTO file contents to the proper mode:

A) Primary boot disk.

# mkboot -a "hpux" /dev/disk/disk8

B) Alternate boot disk.

# mkboot -a "hpux -lq" /dev/disk/disk8

10. Reattach the new disk:

# pvchange -a y /dev/disk/disk8

11. Use lvlnboot to ensure that the LVM logical volumes are prepared to be root, primary swap or dump volume.

# lvlnboot -R
# lvlnboot -v

12. Reactivate the volume group to attach the physical volume.

# vgchange -a y vgXX

Note: If the volume group does not start to synchronize the logical volumes automatically, you can force synchronization with:

# vgsync vgXX

13. Use lvlnboot to ensure that the LVM logical volumes are prepared to be root, primary swap or dump volume.

# lvlnboot -R
# lvlnboot -v

14. Update /stand/bootconf to reflect your current boot disks. The format is the lowercase letter "l" (as in "larry") followed by a space and the disk name, for example:

# cat /stand/bootconf
l /dev/disk/disk8
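
The lookup in step 6 above (finding which disk instance now owns the saved lunpath hardware path) can be sketched as a filter over the "ioscan -m lun" output. The function name disk_for_lunpath is hypothetical, and the sketch assumes the layout shown in this procedure, where each lunpath hardware path appears alone on a line below its disk entry.

```shell
# disk_for_lunpath HWPATH: read `ioscan -m lun` output on stdin and print
# the instance number of the disk that currently owns lunpath HWPATH.
# Assumes each lunpath hardware path sits alone on its own line.
disk_for_lunpath() {
    awk -v want="$1" '
        $1 == "disk"          { inst = $2 }    # remember the current disk entry
        NF == 1 && $1 == want { print inst }   # lunpath line: only the path
    '
}

# Example, with the lunpath saved in step 1:
#   ioscan -m lun | disk_for_lunpath 0/1/1/0.0xa.0x0
```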

Integrity (Itanium) - Required reboot

1. Initiate the boot sequence:

# shutdown -ry 0

2. Replace the damaged disk.

If the damaged disk is not one of the boot disk mirrors, skip to step 11.

3. Interrupt the EFI boot manager autoboot.

EFI Boot Manager ver 1.10 [14.60] Firmware ver 1.61 [4241]
[Escape]

4. Select the surviving mirror disk from the boot manager selection menu. It can be the primary or the alternate boot option, depending on which disk you replaced.

EFI Boot Manager ver 1.10 [14.60] Firmware ver 1.61 [4241]
Please select a boot option
   HP-UX Primary Boot
   HP-UX Alternate Boot
   EFI Shell [Built-in]

5. Verify which disk/kernel you booted from:

# grep "Boot device's HP-UX HW path" /var/adm/syslog.log
 vmunix: Boot device's HP-UX HW path is: 0.0.0.0.1.0

6. At the HP-UX system prompt, recreate the device files for the EFI and OS partitions on the new disk:

# mksf -H 0/1/1/0.1.0 -s 1
# mksf -H 0/1/1/0.1.0 -s 2
# mksf -H 0/1/1/0.1.0 -s 3
# mksf -H 0/1/1/0.1.0 -r -s 1
# mksf -H 0/1/1/0.1.0 -r -s 2
# mksf -H 0/1/1/0.1.0 -r -s 3

7. Create the EFI and OS partitions using an IPF partition description file.

# cat > /tmp/idf << EOF
3
EFI 500MB
HPUX 100%
HPSP 400MB
EOF

8. Use idisk to set up the disk partitioning using the file created above:

# idisk -wf /tmp/idf /dev/rdsk/cXtXdX

Note: A prompt warns that the operation may be destructive and asks whether to continue. Be sure to answer 'yes' for the operation to succeed. If you answer with 'y' only, an error is returned along with the message "user aborting".

9. Use mkboot to format and populate the newly created EFI partition:

# mkboot -e -l /dev/dsk/cXtXdX

10. Change the AUTO file contents to the proper mode:

A) Primary boot disk.

# cat > /tmp/auto << EOF
boot vmunix
EOF
# efi_cp -d /dev/rdsk/cXtXdXs1 /tmp/auto /efi/hpux/auto

B) Alternate boot disk.

# cat > /tmp/auto << EOF
boot vmunix -lq
EOF
# efi_cp -d /dev/rdsk/cXtXdXs1 /tmp/auto /efi/hpux/auto

11. Restore the LVM reserved areas (PVRA/VGRA):

# vgcfgrestore -n vg00 /dev/rdsk/cXtXdXs2

12. Reactivate the volume group to attach the physical volume.

# vgchange -a y vgXX

Note: If the volume group does not start to synchronize the logical volumes automatically, you can force synchronization with:

# vgsync vgXX

13. Use lvlnboot to ensure that the LVM logical volumes are prepared to be root, primary swap or dump volume.

# lvlnboot -R
# lvlnboot -v
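
The partition description file used in step 7 can also be generated by a small function, which avoids leftovers from an earlier run ending up in /tmp/idf. This is a sketch only: the function name make_idf is hypothetical, and the default sizes are simply the ones used in this procedure.

```shell
# make_idf [EFI_SIZE] [HPSP_SIZE]: print a three-partition description
# (EFI, HPUX, HPSP) in the format used for idisk in this procedure.
# Defaults are the sizes shown in step 7.
make_idf() {
    efi=${1:-500MB}
    hpsp=${2:-400MB}
    printf '3\nEFI %s\nHPUX 100%%\nHPSP %s\n' "$efi" "$hpsp"
}

# Example (overwrites the file instead of appending to it):
#   make_idf > /tmp/idf
```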

Integrity (Itanium) - Online Hot Plug

1. Detach the physical volume from the volume group:

# pvchange -a n /dev/dsk/cXtXdXs2

2. Hot swap the disk.

If the disk is not bootable, skip to step 7.

3. Create a description file by doing the following:

# cat > /tmp/idf << EOF
3
EFI 500MB
HPUX 100%
HPSP 400MB
EOF

4. Use idisk to set up the disk partitioning using the file created above:

# idisk -wf /tmp/idf /dev/rdsk/cXtXdX

Note: A prompt warns that the operation may be destructive and asks whether to continue. Be sure to answer 'yes' for the operation to succeed. If you answer with 'y' only, an error is returned along with the message "user aborting".

5. Use mkboot to format and populate the newly created EFI partition:

# mkboot -e -l /dev/dsk/cXtXdX

6. Change the AUTO file contents to the proper mode:

A) Primary boot disk.

# cat > /tmp/auto << EOF
boot vmunix
EOF
# efi_cp -d /dev/rdsk/cXtXdXs1 /tmp/auto /efi/hpux/auto

B) Alternate boot disk.

# cat > /tmp/auto << EOF
boot vmunix -lq
EOF
# efi_cp -d /dev/rdsk/cXtXdXs1 /tmp/auto /efi/hpux/auto

7. Restore the LVM reserved areas (PVRA/VGRA):

# vgcfgrestore -n vg00 /dev/rdsk/cXtXdXs2

8. Reattach the new disk:

# pvchange -a y /dev/dsk/cXtXdXs2

9. Reactivate the volume group to attach the physical volume.

# vgchange -a y vgXX

Note: If the volume group does not start to synchronize the logical volumes automatically, you can force synchronization with:

# vgsync vgXX

10. Use lvlnboot to ensure that the LVM logical volumes are prepared to be root, primary swap or dump volume.

# lvlnboot -R
# lvlnboot -v
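
The AUTO file contents in step 6 depend only on whether the disk is the primary or the alternate boot disk, so the choice can be captured in a tiny helper. The function name auto_contents is hypothetical; the two strings are the ones used in this procedure.

```shell
# auto_contents ROLE: print the AUTO file line for an Integrity boot disk.
#   primary   -> normal boot
#   alternate -> boot with the quorum restriction disabled (-lq)
auto_contents() {
    case $1 in
        primary)   echo "boot vmunix" ;;
        alternate) echo "boot vmunix -lq" ;;
        *)         echo "usage: auto_contents primary|alternate" >&2; return 1 ;;
    esac
}

# Example:
#   auto_contents alternate > /tmp/auto
#   efi_cp -d /dev/rdsk/cXtXdXs1 /tmp/auto /efi/hpux/auto
```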

Integrity (Itanium) 11.31 - Persistent DSF

1. Save the hardware path information of the disk (printout or file). It is very important to save this information, since some of these details won't be available after the scsimgr command.

# ioscan -fnkNC disk
Class     I  H/W Path  Driver S/W State   H/W Type     Description
===================================================================
disk      8  64000/0xfa00/0x2  esdisk   NO_HW     DEVICE       HP 36.4GST336754LC
                      /dev/disk/disk8      /dev/disk/disk8_p2   /dev/rdisk/disk8     /dev/rdisk/disk8_p2
                      /dev/disk/disk8_p1   /dev/disk/disk8_p3   /dev/rdisk/disk8_p1  /dev/rdisk/disk8_p3


# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health   Description
=======================================================================
disk      8  64000/0xfa00/0x2   esdisk  NO_HW       DEVICE       offline  HP 36.4GST336754LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk8      /dev/disk/disk8_p2   /dev/rdisk/disk8     /dev/rdisk/disk8_p2
                      /dev/disk/disk8_p1   /dev/disk/disk8_p3   /dev/rdisk/disk8_p1  /dev/rdisk/disk8_p3


# ioscan -fnkNC lunpath
Class     I  H/W Path  Driver S/W State   H/W Type     Description
==================================================================
lunpath   2  0/1/1/0.0xa.0x0    eslpt   NO_HW       LUN_PATH     LUN path for disk8

Note: If the server is rebooted to execute the change, only the new LUN instance will be displayed. The old LUN will disappear from the ioscan output. Keep the output of these commands in a secure place.

2. Detach the physical volume from the volume group.

# pvchange -a N /dev/disk/disk8_p2

3. Physically replace the disk.

4. After replacing the disk, running ioscan again won't report the disk as CLAIMED yet. Checking the lunpath(s), you should see the AUTH_FAILED state. This is a security mechanism implemented on HP-UX 11.31 to prevent the bad disk from being replaced unless you explicitly authorize it from the OS.

# scsimgr get_info -C lunpath -I 2

        STATUS INFORMATION FOR LUN PATH : lunpath2

Generic Status Information

SCSI services internal state                  = UNOPEN
Open close state                              = AUTH_FAILED

5. Notify the mass storage subsystem that the disk has been replaced (authorize the replacement). Make sure you have saved the output specified in step 1, because the lunpath HW path can't be read from the original disk after this command.

# scsimgr -f replace_wwid -D /dev/rdisk/disk8
scsimgr: Successfully validated binding of LUN paths with new LUN.

Note: This command allows the storage subsystem to replace the old disk's LUN World-Wide-Identifier (WWID) with the new disk's LUN WWID. The storage subsystem will create a new LUN instance and new device special files for the new disk. This command is not required if you reboot the server, because the system automatically authorizes the replacement after the reboot and no lunpath will be assigned to the old /dev/rdisk/disk8.

6. Determine the new persistent device special file (agile view) of the disk. The lunpath HW path (0/1/1/0.0xa.0x0) was originally assigned to disk8; it is now temporarily assigned to disk3 in this example. Using the lunpath HW path, you ensure that disk3 is the correct new disk that replaces disk8.

# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health   Description
=======================================================================
disk      8  64000/0xfa00/0x2   esdisk  NO_HW       DEVICE       offline  HP 36.4GST336754LC
                      /dev/disk/disk8      /dev/disk/disk8_p2   /dev/rdisk/disk8     /dev/rdisk/disk8_p2
                      /dev/disk/disk8_p1   /dev/disk/disk8_p3   /dev/rdisk/disk8_p1  /dev/rdisk/disk8_p3
disk      3  64000/0xfa00/0x3   esdisk  CLAIMED     DEVICE       online   HP 36.4GST336753LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk3   /dev/rdisk/disk3

7. Create a description file for the EFI partitions. Use the following command:

# cat > /tmp/idf << EOF
3
EFI 500MB
HPUX 100%
HPSP 400MB
EOF

8. Use idisk to set up the disk partitioning using the file created above, and create the persistent device special files.

# idisk -wf /tmp/idf /dev/rdisk/disk#

Note: A prompt warns that the operation may be destructive and asks whether to continue. Be sure to answer 'yes' for the operation to succeed. If you answer with 'y' only, an error is returned along with the message "user aborting".

# insf -e -H 64000/0xfa00/0x#
insf: Installing special files for esdisk instance # address 64000/0xfa00/0x#

9. Verify the state of the mass storage subsystem after creating the EFI partitions.

# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health   Description
=======================================================================
disk      8  64000/0xfa00/0x2   esdisk  NO_HW       DEVICE       offline  HP 36.4GST336754LC
                      /dev/disk/disk8      /dev/disk/disk8_p2   /dev/rdisk/disk8     /dev/rdisk/disk8_p2
                      /dev/disk/disk8_p1   /dev/disk/disk8_p3   /dev/rdisk/disk8_p1  /dev/rdisk/disk8_p3
disk      3  64000/0xfa00/0x3   esdisk  CLAIMED     DEVICE       online   HP 36.4GST336753LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk3      /dev/disk/disk3_p2   /dev/rdisk/disk3     /dev/rdisk/disk3_p2
                      /dev/disk/disk3_p1   /dev/disk/disk3_p3   /dev/rdisk/disk3_p1  /dev/rdisk/disk3_p3

10. Assign the old instance number to the replacement disk. This command restores disk8 as the valid device file to access the new disk and removes the disk3 device files.

# io_redirect_dsf -d /dev/disk/disk8 -n /dev/disk/disk3

Note: If you forget to create the EFI partitions before using io_redirect_dsf, the command fails gracefully with the following error message:
# io_redirect_dsf -d /dev/disk/disk8 -n /dev/disk/disk3
Number of old DSFs=8.
Number of new DSFs=2.
The number of old and new DSFs must be the same.

11. Verify that io_redirect_dsf has properly attached the disk to the previous persistent DSF and that the disk state is CLAIMED.

# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health  Description
======================================================================
disk      8  64000/0xfa00/0x3   esdisk  CLAIMED     DEVICE       online  HP 36.4GST336753LC
             0/1/1/0.0xa.0x0
                      /dev/disk/disk8      /dev/disk/disk8_p2   /dev/rdisk/disk8     /dev/rdisk/disk8_p2
                      /dev/disk/disk8_p1   /dev/disk/disk8_p3   /dev/rdisk/disk8_p1  /dev/rdisk/disk8_p3

# ioscan -fnkNC disk
Class     I  H/W Path  Driver S/W State   H/W Type     Description
===================================================================
disk      8  64000/0xfa00/0x3  esdisk   CLAIMED     DEVICE       HP 36.4GST336753LC
                      /dev/disk/disk8      /dev/disk/disk8_p2   /dev/rdisk/disk8     /dev/rdisk/disk8_p2
                      /dev/disk/disk8_p1   /dev/disk/disk8_p3   /dev/rdisk/disk8_p1  /dev/rdisk/disk8_p3

12. Use mkboot to format and populate the newly created EFI partition:

# mkboot -e -l /dev/rdisk/disk8

13. Change the AUTO file contents to the proper mode:

A) Primary boot disk.

# cat > /tmp/auto << EOF
boot vmunix
EOF
# efi_cp -d /dev/rdisk/disk8_p1 /tmp/auto /efi/hpux/auto

B) Alternate boot disk.

# cat > /tmp/auto << EOF
boot vmunix -lq
EOF
# efi_cp -d /dev/rdisk/disk8_p1 /tmp/auto /efi/hpux/auto
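
The choice between the two AUTO file variants can be sketched as a small function. This is an illustration only: auto_line and the temporary file name are hypothetical, and the efi_cp call is left as a comment because it only exists on the HP-UX host. The sketch writes the file with a truncating redirect so that a leftover file from an earlier run cannot end up with two boot lines.

```shell
# auto_line: hypothetical helper that prints the AUTO file line for a boot role.
#   primary   -> normal boot
#   alternate -> boot with LVM quorum checking disabled (-lq)
auto_line() {
    case "$1" in
        primary)   echo "boot vmunix" ;;
        alternate) echo "boot vmunix -lq" ;;
        *)         echo "usage: auto_line primary|alternate" >&2; return 1 ;;
    esac
}

# Write the file with '>' (truncate) rather than '>>' (append).
tmp_auto="${TMPDIR:-/tmp}/auto.$$"
auto_line alternate > "$tmp_auto"
cat "$tmp_auto"

# On the HP-UX host the file would then be copied as in the step above:
#   efi_cp -d /dev/rdisk/disk8_p1 "$tmp_auto" /efi/hpux/auto
rm -f "$tmp_auto"
```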

14. Restore the LVM reserved areas (PVRA/VGRA).

# vgcfgrestore -n vg00 /dev/rdisk/disk8_p2

15. Reattach the new disk:

# pvchange -a y /dev/disk/disk8_p2

16. Reactivate the volume group to attach the physical volume.

# vgchange -a y vgXX

Note: If the volume group does not start synchronizing the logical volumes automatically, you can force synchronization with:

# vgsync vgXX

17. Use lvlnboot to ensure that the LVM logical volumes are prepared as root, primary swap, or dump volumes.

# lvlnboot -R
# lvlnboot -v

18. Update /stand/bootconf to reflect your current boot disks. Each line is the letter "l" (as in "larry"), followed by a space and the disk name, for example:

# cat /stand/bootconf
l /dev/disk/disk8
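
Because the letter "l" is easily confused with the digit "1", a quick format check can help before relying on the file. The sketch below is a hypothetical helper (check_bootconf is not an HP-UX command) that only validates the line shape described in this step, assuming one space and a path under /dev/; it knows nothing else about bootconf.

```shell
# check_bootconf: hypothetical helper. Reads bootconf-style text on stdin and
# succeeds only if every non-empty line is the letter "l", one space, and a
# device path under /dev/.
check_bootconf() {
    awk 'NF && $0 !~ /^l \/dev\/[^ ]+$/ { bad = 1 } END { exit bad + 0 }'
}

# The example line from this step passes the check.
printf 'l /dev/disk/disk8\n' | check_bootconf && echo "format ok"

# A digit one instead of the letter l is rejected.
printf '1 /dev/disk/disk8\n' | check_bootconf || echo "format rejected"
```

On the system itself the function would read the real file, for example `check_bootconf < /stand/bootconf`.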

Integrity (Itanium) - SAS Disk Replacement

Follow these instructions. In this example the failed drive is c1t4d0 and its corresponding SAS controller is sasd0:

1. Check current configuration state:

# ioscan -fnH 0/4/1/0

Class        I  H/W Path        Driver    S/W State   H/W Type     Description
===============================================================================
escsi_ctlr   0  0/4/1/0         sasd      CLAIMED     INTERFACE    HP  PCI/PCI-X SAS MPT Adapter
                               /dev/sasd0
ext_bus      1  0/4/1/0.0.0     sasd_vbus CLAIMED     INTERFACE    SAS Device Interface
target       4  0/4/1/0.0.0.4   tgt       NO_HW       DEVICE
disk         5  0/4/1/0.0.0.4.0   sdisk     NO_HW       DEVICE       HP      DG036A8B5B
                               /dev/dsk/c1t4d0     /dev/rdsk/c1t4d0
                               /dev/dsk/c1t4d0s1   /dev/rdsk/c1t4d0s1
                               /dev/dsk/c1t4d0s2   /dev/rdsk/c1t4d0s2
                               /dev/dsk/c1t4d0s3   /dev/rdsk/c1t4d0s3
target       5  0/4/1/0.0.0.7   tgt       CLAIMED     DEVICE
disk         8  0/4/1/0.0.0.7.0   sdisk     CLAIMED     DEVICE       HP      DG036A9BB6
                               /dev/dsk/c1t7d0     /dev/rdsk/c1t7d0
                               /dev/dsk/c1t7d0s1   /dev/rdsk/c1t7d0s1
                               /dev/dsk/c1t7d0s2   /dev/rdsk/c1t7d0s2
                               /dev/dsk/c1t7d0s3   /dev/rdsk/c1t7d0s3

Note: It is also important to save the output of "sasmgr get_info -D /dev/sasd# -q raid=all". This lets you compare the original disk with the replacement disk by bay number, since the target ID always changes when a SAS disk is replaced and cannot be used for comparison.

2. If the physical volume is part of an existing volume group, temporarily disable LVM I/O to the drive:

# pvchange -a n /dev/dsk/c1t4d0

3. Turn on the disk's locator LED to make sure that you remove the correct disk from the SAS bay.

# sasmgr set_attr -D /dev/sasd0 -q lun=/dev/rdsk/c1t4d0 -q locate_led=on

Verify that only the failed drive's locate LED is set to ON.

# sasmgr get_info -D /dev/sasd0 -q lun=all -q lun_locate
/dev/rdsk/c1t2d0          0/4/1/0.0.0.2.0           1     3     OFF
/dev/rdsk/c1t3d0          0/4/1/0.0.0.3.0           1     4     OFF
/dev/rdsk/c1t4d0          0/4/1/0.0.0.4.0           1     5     ON
/dev/rdsk/c1t7d0          0/4/1/0.0.0.7.0           1     8     OFF

RAID VOL ID is 4 :
/dev/rdsk/c1t10d0         0/4/1/0.0.0.10.0

Physical disks in volume are :
        1     1     OFF           HP            DG072A9BB7         HPD0
        1     2     OFF           HP            DG072A9BB7         HPD0
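
The LED check can also be done mechanically. The following sketch is illustrative only: leds_on is a hypothetical helper, and the column layout (device file first, LED state last) is assumed from the sample output above rather than taken from sasmgr documentation.

```shell
# leds_on: hypothetical helper. Reads "sasmgr get_info ... -q lun_locate"
# style text on stdin and prints the device file of every LUN row whose
# last column is ON.
leds_on() {
    awk '$1 ~ /^\/dev\/rdsk\// && $NF == "ON" { print $1 }'
}

# Sample: the LUN rows from the output shown in this step.
sample='/dev/rdsk/c1t2d0          0/4/1/0.0.0.2.0           1     3     OFF
/dev/rdsk/c1t3d0          0/4/1/0.0.0.3.0           1     4     OFF
/dev/rdsk/c1t4d0          0/4/1/0.0.0.4.0           1     5     ON
/dev/rdsk/c1t7d0          0/4/1/0.0.0.7.0           1     8     OFF'

lit=$(printf '%s\n' "$sample" | leds_on)
if [ "$lit" = "/dev/rdsk/c1t4d0" ]; then
    echo "only the failed drive is lit: $lit"
else
    echo "unexpected locate LEDs: $lit"
fi
```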

4. At this point the drive in bay 5 can be physically removed and replaced with the new drive.

5. Running ioscan again shows the HW path of the new disk, 0/4/1/0.0.0.11.0 in this example. The failed drive still shows NO_HW. This behavior is normal:

# ioscan -fnH 0/4/1/0

Class        I  H/W Path        Driver    S/W State   H/W Type     Description
===============================================================================
escsi_ctlr   0  0/4/1/0         sasd      CLAIMED     INTERFACE    HP  PCI/PCI-X SAS MPT Adapter
                               /dev/sasd0
ext_bus      1  0/4/1/0.0.0     sasd_vbus CLAIMED     INTERFACE    SAS Device Interface
target       4  0/4/1/0.0.0.4   tgt       NO_HW       DEVICE
disk         5  0/4/1/0.0.0.4.0   sdisk     NO_HW       DEVICE       HP      DG036A8B5B
                               /dev/dsk/c1t4d0     /dev/rdsk/c1t4d0
                               /dev/dsk/c1t4d0s1   /dev/rdsk/c1t4d0s1
                               /dev/dsk/c1t4d0s2   /dev/rdsk/c1t4d0s2
                               /dev/dsk/c1t4d0s3   /dev/rdsk/c1t4d0s3
target       5  0/4/1/0.0.0.7   tgt       CLAIMED     DEVICE
disk         8  0/4/1/0.0.0.7.0   sdisk     CLAIMED     DEVICE       HP      DG036A9BB6
                               /dev/dsk/c1t7d0     /dev/rdsk/c1t7d0
                               /dev/dsk/c1t7d0s1   /dev/rdsk/c1t7d0s1
                               /dev/dsk/c1t7d0s2   /dev/rdsk/c1t7d0s2
                               /dev/dsk/c1t7d0s3   /dev/rdsk/c1t7d0s3
target       7  0/4/1/0.0.0.11  tgt       CLAIMED     DEVICE
disk        12  0/4/1/0.0.0.11.0  sdisk     CLAIMED     DEVICE       HP      DG036A8B5B
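
The new hardware path can be picked out of this listing automatically. The sketch below is a rough illustration under one assumption drawn from the sample above: the replacement disk is the only CLAIMED disk entry that has no device special files listed under it yet. new_disk_paths is a hypothetical helper, not an HP-UX command.

```shell
# new_disk_paths: hypothetical helper. Reads "ioscan -fnH" style text on
# stdin and prints the H/W path of every CLAIMED disk entry that is not
# followed by any /dev/... line.
new_disk_paths() {
    awk '
    $1 == "disk" { if (path != "") print path
                   path = ($5 == "CLAIMED") ? $3 : ""; next }
    /\/dev\//    { path = ""; next }      # entry has DSFs: not a new disk
    NF           { if (path != "") print path; path = "" }
    END          { if (path != "") print path }
    '
}

# Sample: the target and disk rows from the output shown in this step.
sample='target       4  0/4/1/0.0.0.4   tgt       NO_HW       DEVICE
disk         5  0/4/1/0.0.0.4.0   sdisk     NO_HW       DEVICE       HP      DG036A8B5B
                               /dev/dsk/c1t4d0     /dev/rdsk/c1t4d0
target       5  0/4/1/0.0.0.7   tgt       CLAIMED     DEVICE
disk         8  0/4/1/0.0.0.7.0   sdisk     CLAIMED     DEVICE       HP      DG036A9BB6
                               /dev/dsk/c1t7d0     /dev/rdsk/c1t7d0
target       7  0/4/1/0.0.0.11  tgt       CLAIMED     DEVICE
disk        12  0/4/1/0.0.0.11.0  sdisk     CLAIMED     DEVICE       HP      DG036A8B5B'

new_path=$(printf '%s\n' "$sample" | new_disk_paths)
echo "new_tgt_hwpath candidate: $new_path"
```

Always confirm the result against the bay number before using it in a replace operation, since the heuristic only looks at the listing.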

6. The new disk uses a different SAS address (similar to a WWN on Fibre Channel connections). The old device special file name must be redirected to the new HW path. Issue the following command to update the configuration:

# sasmgr replace_tgt -D /dev/sasd0 -q old_dev=/dev/dsk/c1t4d0 -q new_tgt_hwpath=0/4/1/0.0.0.11.0

WARNING: This is a DESTRUCTIVE operation.
This might result in failure of current I/O requests.
Do you want to continue ?(y/n) [n]...
LUN has been replaced with new Target.

7. Verify the system state with ioscan:

# ioscan -fnH 0/4/1/0

Class        I  H/W Path        Driver    S/W State   H/W Type     Description
===============================================================================
escsi_ctlr   0  0/4/1/0         sasd      CLAIMED     INTERFACE    HP  PCI/PCI-X SAS MPT Adapter
                               /dev/sasd0
ext_bus      1  0/4/1/0.0.0     sasd_vbus CLAIMED     INTERFACE    SAS Device Interface
target       4  0/4/1/0.0.0.4   tgt       CLAIMED     DEVICE
disk         5  0/4/1/0.0.0.4.0   sdisk     CLAIMED     DEVICE       HP      DG036A8B5B
                               /dev/dsk/c1t4d0     /dev/rdsk/c1t4d0
                               /dev/dsk/c1t4d0s1   /dev/rdsk/c1t4d0s1
                               /dev/dsk/c1t4d0s2   /dev/rdsk/c1t4d0s2
                               /dev/dsk/c1t4d0s3   /dev/rdsk/c1t4d0s3
target       5  0/4/1/0.0.0.7   tgt       CLAIMED     DEVICE
disk         8  0/4/1/0.0.0.7.0   sdisk     CLAIMED     DEVICE       HP      DG036A9BB6
                               /dev/dsk/c1t7d0     /dev/rdsk/c1t7d0
                               /dev/dsk/c1t7d0s1   /dev/rdsk/c1t7d0s1
                               /dev/dsk/c1t7d0s2   /dev/rdsk/c1t7d0s2
                               /dev/dsk/c1t7d0s3   /dev/rdsk/c1t7d0s3
target       7  0/4/1/0.0.0.11  tgt       NO_HW       DEVICE
disk        12  0/4/1/0.0.0.11.0  sdisk     NO_HW       DEVICE       HP      DG036A8B5B
  • Note: The S/W State of H/W Path 0/4/1/0.0.0.4.0 changed to CLAIMED, and the S/W State of H/W Path 0/4/1/0.0.0.11.0 changed to NO_HW. The hardware path 0/4/1/0.0.0.11.0 remains NO_HW in the ioscan output until the next system reboot.

8. Now you must restore LVM mirroring to the new disk. If this is a bootable volume group, this involves creating the EFI partitions and formatting partition 1, changing the AUTO file if this disk was the mirror (not necessary for the primary disk), and restoring the LVM information to EFI partition 2.

9. (Only necessary for vg00.) Rewrite the boot information to the EFI Boot Menu so that the system can boot from the new path.

# setboot
Primary bootpath : 0/4/1/0.0.0.7.0
HA Alternate bootpath : <none>
Alternate bootpath : <none>
Autoboot is ON (enabled)

# setboot -h 0/4/1/0.0.0.4.0

# setboot
Primary bootpath : 0/4/1/0.0.0.7.0
HA Alternate bootpath : 0/4/1/0.0.0.4.0
Alternate bootpath : 0/0/2/0
Autoboot is ON (enabled)
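
Whether the replaced disk is registered as a boot path can be checked from the setboot output. The sketch below is illustrative: has_bootpath is a hypothetical helper that only parses text in the "name : path" shape shown above.

```shell
# has_bootpath: hypothetical helper. Reads setboot-style text on stdin and
# succeeds if the given hardware path is the value of any
# "... bootpath : <path>" line.
has_bootpath() {
    awk -v want="$1" -F' : ' \
        '/bootpath : / && $2 == want { found = 1 } END { exit !found }'
}

# Sample: the second setboot output shown in this step.
sample='Primary bootpath : 0/4/1/0.0.0.7.0
HA Alternate bootpath : 0/4/1/0.0.0.4.0
Alternate bootpath : 0/0/2/0
Autoboot is ON (enabled)'

if printf '%s\n' "$sample" | has_bootpath 0/4/1/0.0.0.4.0; then
    echo "replaced disk is registered as a boot path"
else
    echo "replaced disk is missing from the boot paths"
fi
```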

Monitor volume group synchronization

If you need to monitor the progress of the volume group synchronization, you can use this script to quickly count the extents that are still "stale". The count printed on each iteration should decrease until it reaches zero.

while true
do
   for lv in $(vgdisplay -v <vgname> | grep "LV Name" | awk '{print $3}')
   do
      lvdisplay -v $lv
   done | grep -i stale | wc -l
   sleep 10
done

Example:

while true
do
   for lv in $(vgdisplay -v vg00 | grep "LV Name" | awk '{print $3}')
   do
      lvdisplay -v $lv
   done | grep -i stale | wc -l
   sleep 10
done
5
0
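
The counting step of the loop can be isolated into a function, which also makes it possible to stop the loop once nothing is stale. This is a sketch under stated assumptions: count_stale is a hypothetical helper, and the sample rows are made-up placeholder text in the general shape of lvdisplay -v extent lines, not output captured from a real system.

```shell
# count_stale: hypothetical helper. Reads lvdisplay -v style text on stdin
# and prints the number of lines that mention "stale" (case-insensitive),
# the same figure the loop above obtains with grep -i stale | wc -l.
count_stale() {
    awk 'tolower($0) ~ /stale/ { n++ } END { print n + 0 }'
}

# Made-up placeholder rows, for illustration only.
sample='00000 /dev/dsk/c1t7d0s2 00000 current /dev/dsk/c1t4d0s2 00000 current
00001 /dev/dsk/c1t7d0s2 00001 current /dev/dsk/c1t4d0s2 00001 stale
00002 /dev/dsk/c1t7d0s2 00002 current /dev/dsk/c1t4d0s2 00002 stale'

remaining=$(printf '%s\n' "$sample" | count_stale)
echo "lines still stale: $remaining"

# On the HP-UX host the loop could end by itself when the count reaches zero
# (the logical volume name is only an example):
#   while [ "$(lvdisplay -v /dev/vg00/lvol1 | count_stale)" -gt 0 ]; do
#       sleep 10
#   done
```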

This page was last modified on 3 August 2011, at 19:48.