Difference between revisions of "Replacing failed disk on Server"

From DISI
(On shin / resh)
 
(2 intermediate revisions by one user not shown)
[[ Category: Ben ]] [[ Category : Sysadmin ]]
 

Latest revision as of 14:52, 23 July 2020

How to check if a disk has failed

Check the light on the disk

Solid Yellow => Fail

Blinking Yellow => Predictive Failure (going to fail soon)

Green => Normal

Disk replacement instructions

  • Determine which machine the disk belongs to.
  • Press the red button on the disk to turn it off.
  • Gently pull the disk out a little (NOT all the way) and wait about 10 seconds until it stops spinning before pulling it all the way out.
  • Find a replacement disk with the same specs.
  • Carefully unscrew the failed disk from its disk holder (you can skip this if the replacement already comes with the same kind of disk holder).

How to check whether a disk has failed or is installed correctly

On Cluster 0 machines

1. Log into gimel as root

$ ssh root@sgehead1.bkslab.org

2. Log in as root to the machine you identified earlier

$ ssh root@<machine_name>
Example: RAID 3, 6, and 7 belong to nfshead2

3. Run this command

$ /opt/compaq/hpacucli/bld/hpacucli ctrl all show config
Output Example:
Smart Array P800 in Slot 1                (sn: PAFGF0N9SXQ0MX)
  array A (SATA, Unused Space: 0 MB)
     logicaldrive 1 (5.5 TB, RAID 1+0, OK)
     physicaldrive 1E:1:1 (port 1E:box 1:bay 1, SATA, 1 TB, OK)
     physicaldrive 1E:1:2 (port 1E:box 1:bay 2, SATA, 1 TB, OK)
     physicaldrive 1E:1:3 (port 1E:box 1:bay 3, SATA, 1 TB, OK)
     physicaldrive 1E:1:4 (port 1E:box 1:bay 4, SATA, 1 TB, OK)
     physicaldrive 1E:1:5 (port 1E:box 1:bay 5, SATA, 1 TB, OK)
     physicaldrive 1E:1:6 (port 1E:box 1:bay 6, SATA, 1 TB, OK)
     physicaldrive 1E:1:7 (port 1E:box 1:bay 7, SATA, 1 TB, OK)
     physicaldrive 1E:1:8 (port 1E:box 1:bay 8, SATA, 1 TB, OK)
     physicaldrive 1E:1:9 (port 1E:box 1:bay 9, SATA, 1 TB, OK)
     physicaldrive 1E:1:10 (port 1E:box 1:bay 10, SATA, 1 TB, OK)
     physicaldrive 1E:1:11 (port 1E:box 1:bay 11, SATA, 1 TB, OK)
     physicaldrive 1E:1:12 (port 1E:box 1:bay 12, SATA, 1 TB, OK)
  array B (SATA, Unused Space: 0 MB)
     logicaldrive 2 (5.5 TB, RAID 1+0, OK)
     physicaldrive 2E:1:1 (port 2E:box 1:bay 1, SATA, 1 TB, OK)
     physicaldrive 2E:1:2 (port 2E:box 1:bay 2, SATA, 1 TB, Predictive Failure)
     physicaldrive 2E:1:3 (port 2E:box 1:bay 3, SATA, 1 TB, OK)
     physicaldrive 2E:1:4 (port 2E:box 1:bay 4, SATA, 1 TB, OK)
     physicaldrive 2E:1:5 (port 2E:box 1:bay 5, SATA, 1 TB, OK)
     physicaldrive 2E:1:6 (port 2E:box 1:bay 6, SATA, 1 TB, OK)
     physicaldrive 2E:1:7 (port 2E:box 1:bay 7, SATA, 1 TB, OK)
     physicaldrive 2E:1:8 (port 2E:box 1:bay 8, SATA, 1 TB, OK)
     physicaldrive 2E:1:9 (port 2E:box 1:bay 9, SATA, 1 TB, OK)
     physicaldrive 2E:1:10 (port 2E:box 1:bay 10, SATA, 1 TB, OK)
     physicaldrive 2E:1:11 (port 2E:box 1:bay 11, SATA, 1 TB, OK)
     physicaldrive 2E:1:12 (port 2E:box 1:bay 12, SATA, 1 TB, OK)
  array C (SATA, Unused Space: 0 MB)
     logicaldrive 3 (5.5 TB, RAID 1+0, Ready for Rebuild)
     physicaldrive 2E:2:1 (port 2E:box 2:bay 1, SATA, 1 TB, OK)
     physicaldrive 2E:2:2 (port 2E:box 2:bay 2, SATA, 1 TB, OK)
     physicaldrive 2E:2:3 (port 2E:box 2:bay 3, SATA, 1 TB, OK)
     physicaldrive 2E:2:4 (port 2E:box 2:bay 4, SATA, 1 TB, OK)
     physicaldrive 2E:2:5 (port 2E:box 2:bay 5, SATA, 1 TB, OK)
     physicaldrive 2E:2:6 (port 2E:box 2:bay 6, SATA, 1 TB, OK)
     physicaldrive 2E:2:7 (port 2E:box 2:bay 7, SATA, 1 TB, OK)
     physicaldrive 2E:2:8 (port 2E:box 2:bay 8, SATA, 1 TB, OK)
     physicaldrive 2E:2:9 (port 2E:box 2:bay 9, SATA, 1 TB, OK)
     physicaldrive 2E:2:10 (port 2E:box 2:bay 10, SATA, 1 TB, OK)
     physicaldrive 2E:2:11 (port 2E:box 2:bay 11, SATA, 1 TB, OK)
     physicaldrive 2E:2:12 (port 2E:box 2:bay 12, SATA, 1 TB, OK)
  Expander 243 (WWID: 50014380031A4B00, Port: 1E, Box: 1)
  Expander 245 (WWID: 5001438005396E00, Port: 2E, Box: 2)
  Expander 246 (WWID: 500143800460A600, Port: 2E, Box: 1)
  Expander 248 (WWID: 50014380055E913F)
  Enclosure SEP (Vendor ID HP, Model MSA60) 241 (WWID: 50014380031A4B25, Port: 1E, Box: 1)
  Enclosure SEP (Vendor ID HP, Model MSA60) 242 (WWID: 5001438005396E25, Port: 2E, Box: 2)
  Enclosure SEP (Vendor ID HP, Model MSA60) 244 (WWID: 500143800460A625, Port: 2E, Box: 1)
  SEP (Vendor ID HP, Model P800) 247 (WWID: 50014380055E913E)
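
With three arrays of twelve drives each, the one drive that is not OK is easy to miss when reading the output by eye. The sketch below filters the listing down to the logical and physical drives whose status is anything other than OK. The function name non_ok_drives is made up for this example, and it assumes the status is always the last comma-separated field inside the parentheses, as in the output example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of hpacucli): reads the output of
# "hpacucli ctrl all show config" on stdin and prints every logical or
# physical drive whose status is not OK.
non_ok_drives() {
    # Drive lines look like:
    #   physicaldrive 2E:1:2 (port 2E:box 1:bay 2, SATA, 1 TB, Predictive Failure)
    #   logicaldrive 3 (5.5 TB, RAID 1+0, Ready for Rebuild)
    # Assumption: the status is the last comma-separated field before ")".
    awk '$1 == "physicaldrive" || $1 == "logicaldrive" {
        status = $0
        sub(/^.*, /, "", status)   # keep only the text after the last ", "
        sub(/\).*$/, "", status)   # drop the closing parenthesis
        if (status != "OK") print $1 " " $2 ": " status
    }'
}
```

On the RAID host it could be used as: /opt/compaq/hpacucli/bld/hpacucli ctrl all show config | non_ok_drives. For the output example above, this would list physicaldrive 2E:1:2 (Predictive Failure) and logicaldrive 3 (Ready for Rebuild).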

On shin

/opt/MegaRAID/storcli/storcli64 /c0 /eall /sall show all
Drive /c0/e8/s18 :
================

-----------------------------------------------------------------------------
EID:Slt DID State  DG     Size Intf Med SED PI SeSz Model            Sp Type 
-----------------------------------------------------------------------------
8:18     24 Failed  0 3.637 TB SAS  HDD N   N  512B ST4000NM0023     U  -    
-----------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Drive /c0/e8/s18 - Detailed Information :
=======================================

Drive /c0/e8/s18 State :
======================
Shield Counter = 0
Media Error Count = 0
Other Error Count = 16
BBM Error Count = 0
Drive Temperature =  32C (89.60 F)
Predictive Failure Count = 0
S.M.A.R.T alert flagged by drive = No


Drive /c0/e8/s18 Device attributes :
==================================
SN = Z1Z2S2TL0000C4216E9V
Manufacturer Id = SEAGATE 
Model Number = ST4000NM0023    
NAND Vendor = NA
WWN = 5000C50057DB2A28
Firmware Revision = 0003
Firmware Release Number = 03290003
Raw size = 3.638 TB [0x1d1c0beb0 Sectors]
Coerced size = 3.637 TB [0x1d1b00000 Sectors]
Non Coerced size = 3.637 TB [0x1d1b0beb0 Sectors]
Device Speed = 6.0Gb/s
Link Speed = 6.0Gb/s
Write cache = N/A
Logical Sector Size = 512B
Physical Sector Size = 512B
Connector Name = Port 0 - 3 & Port 4 - 7 
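
The "show all" output prints a full detail block for every drive, so it helps to reduce it to just the table rows that need attention. The sketch below does that by reading the storcli output on stdin. The function name bad_drives is made up for this example, and the choice of states to report (Failed, Offln, UBad) is an assumption based on the legend above; adjust it to match what counts as a problem on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of storcli): reads storcli drive output on
# stdin and prints "EID:Slt State" for drives that look unhealthy.
bad_drives() {
    # Table rows start with an "enclosure:slot" pair such as 8:18, and the
    # third column is the drive state.
    # Assumption: Failed, Offln and UBad are the states worth reporting.
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && ($3 == "Failed" || $3 == "Offln" || $3 == "UBad") {
        print $1, $3
    }'
}
```

On shin it could be used as: /opt/MegaRAID/storcli/storcli64 /c0 /eall /sall show all | bad_drives. The enclosure and slot it prints (for example 8:18) correspond to the drive path /c0/e8/s18 shown in the output above.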

On ZFS machines

$ zpool status

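For instructions on how to identify and replace a failed disk on a ZFS system, read the "Example: Fixing degraded pool, replacing faulted disk" section of the Zfs page: http://wiki.docking.org/index.php/Zfs#Example:_Fixing_degraded_pool.2C_replacing_faulted_disk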
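
On a machine with several pools, the zpool status output can be long. As a minimal sketch, the function below reads that output on stdin and prints only the pools whose state is not ONLINE. The function name unhealthy_pools is made up for this example, and it relies on the "pool:" and "state:" lines that zpool status prints for each pool.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads "zpool status" output on
# stdin and prints "name: STATE" for every pool that is not ONLINE.
unhealthy_pools() {
    awk '$1 == "pool:"  { pool = $2 }                       # remember the current pool name
         $1 == "state:" && $2 != "ONLINE" { print pool ": " $2 }'
}
```

It could be used as: zpool status | unhealthy_pools. No output means every pool reported ONLINE.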