Friday, March 23, 2012

Replace Faulty Disk in a SCSI Mirrored VG in AIX

I found a faulty SCSI disk in a volume group on one of the LPARs on a p595. The volume group has 40 mirrored SCSI disks and one logical volume, which is marked stale.
These are the steps taken to replace the faulty disk.

Error Report Details

# errpt |more

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
EAA3D429   0321085312 U S LVDD           PHYSICAL PARTITION MARKED STALE
EAA3D429   0321084712 U S LVDD           PHYSICAL PARTITION MARKED STALE

16F35C72   0321044812 P H hdisk78        DISK OPERATION ERROR
16F35C72   0321023812 P H hdisk78        DISK OPERATION ERROR
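
When the error log is long, it can help to pull out only the devices that logged permanent hardware errors (T = P, C = H in the summary). The following is a minimal sketch, not part of the original procedure; the function name perm_hw_errors is made up for illustration, and the sample lines are copied from the report above. On AIX the input would come from `errpt | perm_hw_errors`.

```shell
# Sketch: list resources with permanent (T=P) hardware (C=H) errors.
# Reads errpt summary lines on stdin and prints each resource name once.
perm_hw_errors() {
    awk '$3 == "P" && $4 == "H" { print $5 }' | sort -u
}

# Sample input copied from the errpt summary in this post.
perm_hw_errors <<'EOF'
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
EAA3D429   0321085312 U S LVDD           PHYSICAL PARTITION MARKED STALE
16F35C72   0321044812 P H hdisk78        DISK OPERATION ERROR
16F35C72   0321023812 P H hdisk78        DISK OPERATION ERROR
EOF
```

With the sample above, the only resource reported is hdisk78.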

$ errpt -a -j F7DDA124 |more
---------------------------------------------------------------------------
LABEL:          LVM_SA_PVMISS
IDENTIFIER:     F7DDA124

Date/Time:       Wed Mar 21 04:48:29 EDT 2012
Sequence Number: 99327
Machine Id:      00CFEFAF4C00
Node Id:         test01t
Class:           H
Type:            UNKN
WPAR:            Global
Resource Name:   LVDD
Resource Class:  NONE
Resource Type:   NONE
Location:

Description
PHYSICAL VOLUME DECLARED MISSING

Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE

Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0013 0000 0051
SENSE DATA
00CF EFAF 0000 4C00 0000 0113 D553 FECC 00CF EFAF 9018 3EC4 0000 0000 0000 0000

$ errpt -a -j 16F35C72 |more
---------------------------------------------------------------------------
LABEL:          DISK_ERR2
IDENTIFIER:     16F35C72

Date/Time:       Wed Mar 21 04:48:29 EDT 2012
Sequence Number: 99325
Machine Id:      00CFEFAF4C00
Node Id:         test01t
Class:           H
Type:            PERM
WPAR:            Global
Resource Name:   hdisk78
Resource Class:
Resource Type:
Location:
VPD:
        Manufacturer................IBM
        Machine Type and Model......ST373454LC
        FRU Number..................00P2685
        ROS Level and ID............43373137
        Serial Number...............0005D90D
        EC Level....................H13092
        Part Number.................26K5280
        Device Specific.(Z0)........000004129F00013E
        Device Specific.(Z1)........0721C717
        Device Specific.(Z2)........0002
        Device Specific.(Z3)........05179
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........H13092

Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS

$ lscfg -vl hdisk78
  hdisk78          U5791.001.9920546-P2-T6-L10-L0  16 Bit LVD SCSI Disk Drive (73400 MB)

        Manufacturer................IBM
        Machine Type and Model......ST373454LC
        FRU Number..................00P2685
        ROS Level and ID............43373137
        Serial Number...............0005D90D
        EC Level....................H13092
        Part Number.................26K5280
        Device Specific.(Z0)........000004129F00013E
        Device Specific.(Z1)........0721C717
        Device Specific.(Z2)........0002
        Device Specific.(Z3)........05179
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........H13092


LSPV LVM Details

$ lspv -l hdisk78
hdisk78:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
testlv                 545     545     110..109..109..109..108 /testview

#lsvg -l testview2vg
testview2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
testlv               jfs2       10890   21780   40   open/stale    /testview
loglv13             jfs2log    1       2       2    open/syncd    N/A
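
To spot stale logical volumes without reading the whole listing, the LV STATE column can be filtered. This is a small sketch under the assumption that the output has the column layout shown above; stale_lvs is a hypothetical helper name. On AIX it would be fed with `lsvg -l testview2vg | stale_lvs`.

```shell
# Sketch: print the names of logical volumes whose state contains "stale".
# Reads `lsvg -l <vg>` output on stdin; field 6 is the LV STATE column.
stale_lvs() {
    awk '$6 ~ /stale/ { print $1 }'
}

# Sample input copied from the lsvg -l output in this post.
stale_lvs <<'EOF'
testview2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
testlv               jfs2       10890   21780   40   open/stale    /testview
loglv13             jfs2log    1       2       2    open/syncd    N/A
EOF
```

For this sample, only testlv is listed.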


#lslv -m testlv  ----- This command shows which disks hold each mirror copy (the paired disks)

testlv:/testview
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0111 hdisk1            0111 hdisk75
0002  0111 hdisk2            0111 hdisk77
0003  0111 hdisk5            0111 hdisk78
0004  0111 hdisk6            0111 hdisk79
0005  0111 hdisk7            0111 hdisk76
0006  0111 hdisk8            0110 hdisk52
0007  0111 hdisk9            0110 hdisk53
0008  0111 hdisk11           0110 hdisk68
0009  0111 hdisk12           0110 hdisk69
0010  0111 hdisk13           0110 hdisk70
0011  0111 hdisk16           0110 hdisk71
0012  0111 hdisk18           0110 hdisk72
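
The partner of a given disk can be picked out of this map mechanically. The sketch below assumes a two-copy mirror (it ignores the PP3/PV3 columns), and mirror_partners is a made-up helper name. On AIX the input would be `lslv -m testlv | mirror_partners hdisk78`.

```shell
# Sketch: print the disk(s) that hold the other copy of any logical
# partition stored on the given disk. Assumes at most two copies.
# $1 = disk name, stdin = `lslv -m <lv>` output.
mirror_partners() {
    awk -v d="$1" '
        $3 == d && $5 != "" { print $5 }   # disk holds copy 1, partner holds copy 2
        $5 == d             { print $3 }   # disk holds copy 2, partner holds copy 1
    ' | sort -u
}

# Sample input copied from the lslv -m output in this post.
mirror_partners hdisk78 <<'EOF'
LP    PP1  PV1               PP2  PV2               PP3  PV3
0002  0111 hdisk2            0111 hdisk77
0003  0111 hdisk5            0111 hdisk78
0004  0111 hdisk6            0111 hdisk79
EOF
```

In the sample rows, hdisk78 is paired with hdisk5, so the good copy of those partitions lives on hdisk5.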

#/usr/sbin/unmirrorvg testview2vg hdisk78  ------This removes the mirror copies that reside on the failed disk
#/usr/sbin/reducevg testview2vg hdisk78 ---Remove the failed disk from the volume group

#lspv -l hdisk78   ----Verify that the disk is not used
0516-320 : Physical volume 00cfefaf90183ec40000000000000000 is not assigned to
        a volume group.
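
If this check is scripted, the 0516-320 message shown above can be used as the signal that the disk is out of the volume group. This is only a sketch: pv_unassigned is a hypothetical helper, and it assumes the message number is present in the captured output. On AIX it would be used as `lspv -l hdisk78 2>&1 | pv_unassigned`.

```shell
# Sketch: succeed (exit 0) if the captured `lspv -l <disk>` output
# contains message 0516-320, i.e. the disk is not assigned to a VG.
pv_unassigned() {
    grep -q '0516-320'
}

# Sample text copied from the output in this post.
sample='0516-320 : Physical volume 00cfefaf90183ec40000000000000000 is not assigned to
        a volume group.'

if printf '%s\n' "$sample" | pv_unassigned; then
    echo "disk is not in any volume group"
fi
```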

Identify & Replace Procedure

#diag
  -> Task Selection
  -> Hot Plug Task
  -> SCSI and SCSI RAID Hot Plug Manager
  -> Identify a Device Attached to a SCSI Hot Swap Enclosure Device
  -> select the device

Once the device is identified:
  -> Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device
  -> select the device (this marks it in replace mode)
  -> have the CE replace the device
  -> press Enter on the screen (this marks the device as populated)
  -> Configure Added/Replaced Devices (this runs cfgmgr)

Assign the Drive to the Volume Group and Remirror

#/usr/sbin/extendvg 'testview2vg' 'hdisk78'
#/usr/sbin/mirrorvg testview2vg hdisk46 hdisk47 hdisk48  hdisk49  hdisk50  hdisk51  hdisk52  hdisk53  hdisk68  hdisk69  hdisk70  hdisk71  hdisk72  hdisk73  hdisk74  hdisk75  hdisk76  hdisk77  hdisk78  hdisk79
#nohup /usr/sbin/syncvg -l testlv &  ---This syncs the stale partitions of the logical volume; it can take a few hours depending on the size









