Replacing Faulty Disk in ROOTVG
Analyzing Disk Fault
The first signs that a hard disk is failing are temporary errors in the error log (errpt). Occasional, isolated temporary errors do not indicate an immediate problem, but a cluster of temporary errors means the disk will need replacing. The worst-case scenario is a permanent error against the disk together with stale partitions.
Check how many errors have been logged, and whether they are permanent or temporary, with:
errpt |more
1581762B 0727203502 T H hdisk0 DISK OPERATION ERROR
1581762B 0727203502 P H hdisk0 DISK OPERATION ERROR
The first entry shows a temporary error (T in the third column) on hdisk0, whilst the second shows a permanent error (P) on the same disk. The procedures for replacing hdisk0 & hdisk1 <part of rootvg> are slightly different. See the steps below.
To check for stale partitions, run the command: lsvg -l rootvg
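If the error log is long, counting the entries per disk is easier than reading them. The following is only a sketch: the function name count_disk_errors is my own, and it assumes the errpt summary layout shown above (error type in column 3, resource name in column 5).

```shell
# Count temporary (T) and permanent (P) errpt entries for one disk.
# Reads errpt summary lines on stdin. Assumed column layout:
#   IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
count_disk_errors() {
    disk=$1
    awk -v d="$disk" '
        $5 == d && $3 == "T" { t++ }
        $5 == d && $3 == "P" { p++ }
        END { printf "temporary=%d permanent=%d\n", t, p }
    '
}
```

On a live system this would be used as: errpt | count_disk_errors hdisk0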
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 64 128 2 open/syncd N/A
hd8 jfslog 1 2 2 open/stale N/A
hd4 jfs 4 8 2 open/stale /
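To pick out only the stale logical volumes from a listing like the one above, a small filter can help. This is a sketch under the assumption that LV STATE is the sixth field of each data row, as it is in the sample; stale_lvs is a name I have made up.

```shell
# Print the name of every logical volume whose LV STATE contains "stale".
# Reads `lsvg -l <vg>` output on stdin; LV STATE is assumed to be field 6
# of each data row (name, type, LPs, PPs, PVs, state, mount point).
stale_lvs() {
    awk '$6 ~ /stale/ { print $1 }'
}
```

On a live system this would be used as: lsvg -l rootvg | stale_lvs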
Steps for replacing faulty disks in other volume groups are much simpler than replacing disks in rootvg. I have written a procedure for this below also.
For procedures on replacing faulty SSA disk, refer to the link
Replacing hdisk0 in rootvg
Change bootlist
bosboot -a -d hdisk1 Make sure hdisk1 has a boot image
bootlist -m normal hdisk1 hdisk0 Change the bootlist so the system will use hdisk1 before hdisk0
Removing Primary Dump Device
sysdumpdev -l In this setup the primary dump device is on hdisk0, so it needs to be changed before the disk is removed
primary /dev/pdumplv
secondary /dev/sdumplv
copy directory /var/adm/dump
forced copy flag FALSE
always allow dump TRUE
dump compression ON
sysdumpdev -Pp /dev/hd6 Changes primary dump device
primary /dev/hd6
secondary /dev/sdumplv
copy directory /var/adm/dump
forced copy flag FALSE
always allow dump TRUE
dump compression ON
rmlv pdumplv Remove the logical volume pdumplv, the primary dump device
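Before removing anything it is worth confirming which device is really the primary dump device. A minimal sketch, assuming the sysdumpdev -l layout shown above (label in the first field, device in the second); primary_dump_device is a hypothetical helper name.

```shell
# Print the primary dump device from `sysdumpdev -l` output read on stdin.
# Assumes the line of interest starts with the word "primary".
primary_dump_device() {
    awk '$1 == "primary" { print $2 }'
}
```

On a live system this would be used as: sysdumpdev -l | primary_dump_device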
Un-Mirroring Hard Disk from VG
Now you need to un-mirror the volume group so the disk can be removed. There are two ways to do this: at the disk level, or one logical volume at a time. The outcome is the same with both, but the second method gives you more control.
Method One
unmirrorvg rootvg hdisk0 Unmirrors the disk.
NB: Sometimes this is unstable, especially if you have stale partitions. I have also noticed that if pdumplv is mirrored <shouldn't be by default>, this command will fail. In this instance, unmirror the logical volume and then run the unmirrorvg command, alternatively follow the method below.
Method Two
lsvg -l rootvg Lists all logical volumes in rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 64 128 2 open/syncd N/A
hd8 jfslog 1 2 2 open/syncd N/A
hd4 jfs 4 8 2 open/syncd /
rmlvcopy LVNAME 1 hdisk0 Run this command for each logical volume
e.g: rmlvcopy hd5 1 hdisk0
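Typing the command once per logical volume gets tedious, so the list can be generated from the lsvg output. The sketch below only prints the commands so they can be reviewed before being run. The function name print_rmlvcopy_cmds is my own, and it treats a logical volume as mirrored when PPs is greater than LPs, which is an assumption about how the listing is read rather than anything stated by the tools.

```shell
# Print (do not run) one rmlvcopy command per mirrored logical volume.
# Reads `lsvg -l rootvg` output on stdin; argument 1 is the disk to drop.
# Data rows are assumed to be: name type LPs PPs PVs state mountpoint.
print_rmlvcopy_cmds() {
    disk=$1
    # NR > 2 skips the "rootvg:" line and the column header.
    # PPs ($4) greater than LPs ($3) is taken to mean more than one copy.
    awk -v d="$disk" 'NR > 2 && $4 > $3 { print "rmlvcopy " $1 " 1 " d }'
}
```

On a live system this would be used as: lsvg -l rootvg | print_rmlvcopy_cmds hdisk0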
Check the disk has been unmirrored by: lsvg -l rootvg. For each LV, the PVs column will have 1
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 1 1 closed/syncd N/A
hd6 paging 64 64 1 open/syncd N/A
hd8 jfslog 1 1 1 open/syncd N/A
hd4 jfs 4 4 1 open/syncd /
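The PVs check can also be done mechanically rather than by eye. A minimal sketch, assuming PVs is the fifth field of each data row; check_pv_count is a hypothetical name. It prints any logical volume that does not match and returns a non-zero status in that case.

```shell
# Succeed only if every logical volume sits on exactly the expected
# number of physical volumes. Reads `lsvg -l <vg>` output on stdin;
# argument 1 is the expected PVs value. PVs is assumed to be field 5.
check_pv_count() {
    want=$1
    awk -v w="$want" '
        NR > 2 && $5 != w { print $1 " has " $5 " PVs"; bad = 1 }
        END { exit bad + 0 }
    '
}
```

On a live system this would be used as: lsvg -l rootvg | check_pv_count 1 && echo "unmirrored"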
Make a note of the SCSI ID and serial number, which will make the CE's life easier when the disk has to be removed. In the example below the SCSI ID is 8 <from the location code 10-88-00-8,0> and the serial number is 4DFJY156. The command you need to run is: lscfg -vl hdisk0
DEVICE LOCATION DESCRIPTION
hdisk0 10-88-00-8,0 16 Bit LVD SCSI Disk Drive <9100 MB>
Manufacturer............................IBM
Machine Type and Model......DDYS-T09170M
FRU Number...........................00P1517
ROS Level and ID...................53394841
Serial Number.........................4DFJY156
EC Level...................................F79924
Part Number............................07N3852
Device Specific.<Z0>...............000003029F00013A
Device Specific.<Z1>...............07N4925
Device Specific.<Z2>...............0933
Device Specific.<Z3>...............00315
Device Specific.<Z4>...............0001
Device Specific.<Z5>...............22
Device Specific.<Z6>...............F79924
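The two values the CE needs can be pulled out of the lscfg output with a short filter. This is a sketch only: disk_ids is my own name, and it assumes the location code has the form shown above, with the SCSI ID being the number between the last "-" and the ",".

```shell
# Print the SCSI ID and serial number from `lscfg -vl hdiskN` output
# read on stdin.
disk_ids() {
    awk '
        # Device line, e.g. "hdisk0 10-88-00-8,0 ...": the SCSI ID is
        # assumed to be the number between the last "-" and the ",".
        $1 ~ /^hdisk/ && $2 ~ /,/ {
            loc = $2
            sub(/,.*/, "", loc)
            sub(/.*-/, "", loc)
            scsi = loc
        }
        # "Serial Number.....4DFJY156": keep what follows the last dot.
        /Serial Number/ {
            serial = $0
            sub(/.*\./, "", serial)
        }
        END { print "scsi_id=" scsi " serial=" serial }
    '
}
```

On a live system this would be used as: lscfg -vl hdisk0 | disk_ids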
Remove the Disk from VG
reducevg rootvg hdisk0 Remove hdisk0 from the volume group
rmdev -l hdisk0 -d Remove the definition of hdisk0 from the system
lsvg rootvg Ensure disk is removed
lspv hdisk0 Ensure disk is removed
Now physically remove the faulty disk and fit the new disk.
Add the New Disk to the System
cfgmgr Run Configuration Manager to add the new disk to the system
diag Then go into diagnostics to update the system log so the system is aware that hdisk0 has been replaced
Task Selection ->
Log Repair Action ->
hdisk0
Esc 0 To exit diagnostics after Log Repair Action has completed.
errpt | more Check Log Repair Action has taken place. You should see an entry like :-
2F3E09A4 0819110902 I H hdisk2 REPAIR ACTION
diag Go back into diagnostics and certify this disk. This will indicate whether the new disk is ok
Task Selection ->
Certify the disk ->
hdisk0 Commit the changes and exit by pressing F3
Esc 0 To exit diagnostics after Certifying the new disk
Add disk into the Volume Group
extendvg rootvg hdisk0 Add disk into the volume group rootvg
Now you need to re-mirror the disk. Again, you can mirror at the disk level or at the logical volume level.
Re-Mirroring Hard Disk
Method One
mirrorvg rootvg hdisk0 Mirrors the disk
syncvg -v rootvg Synchronizes the volume group and the data contained within it
NB: This method will mirror the logical volume pdumplv. Unmirror the logical volume by:
rmlvcopy pdumplv 1 hdisk1
Method Two
lsvg -l rootvg Lists all the logical volumes to re-mirror
mklvcopy -k LVNAME 2 hdisk0 Run this command for each logical volume. This will also synchronize the data <-k>
e.g: mklvcopy -k hd5 2 hdisk0
NB: Do not mirror the logical volume pdumplv
syncvg -v rootvg Synchronizes the volume group and the data contained within it
lsvg -l rootvg Check rootvg has been mirrored and status is open/syncd
Check the volume group has been completely re-mirrored by: lsvg -l rootvg. The PVs column should have 2 for each LVNAME apart from pdumplv & sdumplv
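For Method Two, the mklvcopy commands can be generated from the listing while leaving the dump devices out. As before this is a sketch that only prints the commands. The name print_mklvcopy_cmds is hypothetical, and it assumes a logical volume has a single copy when PPs equals LPs.

```shell
# Print (do not run) one mklvcopy command per unmirrored logical volume,
# skipping the dump logical volumes pdumplv and sdumplv.
# Reads `lsvg -l rootvg` output on stdin; argument 1 is the new disk.
print_mklvcopy_cmds() {
    disk=$1
    awk -v d="$disk" '
        NR <= 2 { next }                                # "rootvg:" and header
        $1 == "pdumplv" || $1 == "sdumplv" { next }     # dump LVs stay unmirrored
        $4 == $3 { print "mklvcopy -k " $1 " 2 " d }    # PPs equals LPs: one copy
    '
}
```

On a live system this would be used as: lsvg -l rootvg | print_mklvcopy_cmds hdisk0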
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 64 128 2 open/syncd N/A
hd8 jfslog 1 2 2 open/syncd N/A
hd4 jfs 4 8 2 open/syncd /
mklv -y 'pdumplv' rootvg 40 hdisk0 Re-create the logical volume for your primary dump device
sysdumpdev -Pp /dev/pdumplv Re-allocate your primary dump device.
primary /dev/pdumplv
secondary /dev/sdumplv
copy directory /var/adm/dump
forced copy flag FALSE
always allow dump TRUE
dump compression ON
bosboot -a -d hdisk0 Update the boot image on hdisk0
bootlist -m normal hdisk0 hdisk1 Change your boot list back.