Linux device management

From DISI

DEVICE MANAGEMENT/CHECKING

To see if there was a disk failure on a machine, type:

cat /proc/mdstat

You’ll see something that looks like this:

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      128384 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      147773824 blocks [2/2] [UU]

If there is a disk failure (pretend sdb1 failed), you would see something like this:

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      128384 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      147773824 blocks [2/2] [UU]

So, there would be a “(F)” after sdb1[1], the [2/2] would drop to [2/1], and the [UU] would change to [U_]: the underscore marks the position of the failed member.
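Reading the file by eye gets tedious with many arrays, so the check can be scripted. The following is a minimal sketch, assuming that a degraded array always shows an underscore in its bracketed status string on the "blocks" line. The function name degraded_md_arrays and its optional file argument are made up for this example.

```shell
#!/bin/sh
# Print the name of every md array whose status string contains "_",
# which marks a failed or missing member.
# Illustrative helper: the name and the optional file argument are assumptions.
degraded_md_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { array = $1 }                   # remember the current array name
        /blocks/ && /\[[U_]*_[U_]*\]/ { print array }  # status string has an underscore
    ' "$mdstat_file"
}
```

Calling degraded_md_arrays with no argument reads /proc/mdstat; it prints nothing when every array is healthy.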

Another way to check the state of a device is with smartctl. It can only check one device at a time, but it gives very extensive information. For example:

smartctl -a /dev/sda

[root@server ~]# smartctl -a /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-371.1.2.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: 	GB0160CAABV
Serial Number:	5RXAAWFP
Firmware Version: HPG1
User Capacity:	160,041,885,696 bytes [160 GB]
Sector Size:  	512 bytes logical/physical
Device is:    	Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:	Wed Feb 12 10:38:15 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
   				 was completed without error.
   				 Auto Offline Data Collection: Enabled.
Self-test execution status:  	(   0)    The previous self-test routine completed
   				 without error or no self-test has ever
   				 been run.
Total time to complete Offline
data collection:    	 (  433) seconds.
Offline data collection
capabilities:    		  (0x5b) SMART execute Offline immediate.
   				 Auto Offline data collection on/off support.
   				 Suspend Offline collection upon new
   				 command.
   				 Offline surface scan supported.
   				 Self-test supported.
   				 No Conveyance Self-test supported.
   				 Selective Self-test supported.
SMART capabilities:        	(0x0003)    Saves SMART data before entering
   				 power-saving mode.
   				 Supports SMART auto save timer.
Error logging capability:    	(0x01)    Error logging supported.
   				 General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  54) minutes.
SCT capabilities:        	(0x003d)    SCT Status supported.
   				 SCT Error Recovery Control supported.
   				 SCT Feature Control supported.
   				 SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME      	FLAG 	VALUE WORST THRESH TYPE  	UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 	0x000f   100   253   006	Pre-fail  Always   	-   	0
  3 Spin_Up_Time        	0x0002   098   097   000	Old_age   Always   	-   	0
  4 Start_Stop_Count    	0x0033   100   100   020	Pre-fail  Always   	-   	54
  5 Reallocated_Sector_Ct       0x0033   100   100   036	Pre-fail  Always   	-   	0
  7 Seek_Error_Rate     	0x000f   081   060   030	Pre-fail  Always   	-   	164061635
  9 Power_On_Hours      	0x0032   049   049   000	Old_age   Always   	-   	44875
 10 Spin_Retry_Count    	0x0013   100   100   097	Pre-fail  Always   	-   	0
 12 Power_Cycle_Count   	0x0033   100   100   020	Pre-fail  Always   	-   	54
184 End-to-End_Error    	0x0032   100   253   000	Old_age   Always   	-   	0
187 Reported_Uncorrect  	0x003a   100   100   051	Old_age   Always   	-   	0
189 High_Fly_Writes     	0x0022   100   100   000	Old_age   Always   	-   	0
190 Airflow_Temperature_Cel     0x001a   074   064   000	Old_age   Always   	-   	26 (Min/Max 23/29)
194 Temperature_Celsius 	0x0000   026   040   000	Old_age   Offline  	-   	26 (0 16 0 0 0)
195 Hardware_ECC_Recovered      0x0032   074   063   000	Old_age   Always   	-   	158710899
197 Current_Pending_Sector      0x0000   100   100   000	Old_age   Offline  	-   	0
198 Offline_Uncorrectable       0x0000   100   100   000	Old_age   Offline  	-   	0
199 UDMA_CRC_Error_Count	0x0000   200   200   000	Old_age   Offline  	-   	0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1    	0    	0  Not_testing
	2    	0    	0  Not_testing
	3    	0    	0  Not_testing
	4    	0    	0  Not_testing
	5    	0    	0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
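The full report is long, so it can help to pull out just the attributes that usually matter for a failing disk. This is a rough sketch, assuming the attribute table has the ten-column layout shown above. Treating attributes 5, 197 and 198 as the ones to watch is a common rule of thumb, not something smartctl itself enforces. The function name smart_sector_warnings is made up for this example.

```shell
#!/bin/sh
# Print sector-health attributes (IDs 5, 197, 198) whose RAW_VALUE is not 0.
# Reads "smartctl -A" style output on stdin.
# Illustrative helper: the name and the choice of attributes are assumptions.
smart_sector_warnings() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) {
             if ($10 + 0 != 0) print $2 "=" $10
         }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_sector_warnings
```

No output means all three raw counters are zero. For a quick pass/fail check on one disk, smartctl -H /dev/sda prints only the overall health result.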

If you type ls /dev and there are no devices that start with sd (for example, on an HP machine whose disks sit behind a Smart Array controller), you can check the status of the drives this way:

hpacucli
=> ctrl all show config
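The same command can also be run non-interactively as hpacucli ctrl all show config, which makes it easy to filter. Below is a small sketch that keeps only the drives that are not reported as OK. It assumes each logicaldrive and physicaldrive line ends with a status field such as ", OK)", which may differ between hpacucli versions. The function name hp_drives_not_ok is made up for this example.

```shell
#!/bin/sh
# List logicaldrive/physicaldrive lines whose last field is not "OK".
# Reads "hpacucli ctrl all show config" output on stdin.
# Assumption: drive lines end with ", <status>)"; the function name is illustrative.
hp_drives_not_ok() {
    grep -E '^[[:space:]]*(logicaldrive|physicaldrive) ' | grep -v ', OK)$' || true
}

# Usage (as root):
#   hpacucli ctrl all show config | hp_drives_not_ok
```

An empty result means every listed drive reported OK.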

Here’s a great hpacucli reference:

http://www.google.com/url?q=http%3A%2%2Fh20565.www2.hp.com%2Fportal%2Fsite%2Fhpsc%2Ftemplate.PAGE%2Fpublic%2Fpsi%2FmostViewedDisplay%3Fjavax.portlet.begCacheTok%3Dcom.vignette.cachetoken%26javax.portlet.endCacheTok%3Dcom.vignette.cachetoken%26javax.portlet.prp_efb5c0793523e51970c8fa22b053ce01%3Dwsrp-navigationalState%253DdocId%25253Demr_na-c03493210-1%25257CdocLocale%25253Den_US%26javax.portlet.tpst%3Defb5c0793523e51970c8fa22b053ce01%26sp4ts.oid%3D5177957%26ac.admitted%3D1392232572656.876444892.492883150&sa=D&sntz=1&usg=AFQjCNHzyW07LNs1-qCqC0ZUcnOhjJHjjA

To see a list of all PCI devices:

lspci -v | more

To see the make and model of the machine (serial number, etc.), type as root:

dmidecode
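Plain dmidecode prints the whole DMI table, which is a lot to scroll through. If only the make, model and serial number are needed, the -s option prints a single string per keyword. Here is a minimal sketch; the function name machine_identity is made up for this example, and some machines report empty or placeholder values for these fields.

```shell
#!/bin/sh
# Print manufacturer, model and serial number, one per line.
# Must be run as root. The function name is illustrative.
machine_identity() {
    for key in system-manufacturer system-product-name system-serial-number; do
        printf '%s: %s\n' "$key" "$(dmidecode -s "$key")"
    done
}

# Usage (as root):
#   machine_identity
```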

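Also, iostat and iotop are great for watching disk I/O: iostat reports per-device statistics and iotop shows which processes are doing the I/O.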