Linux device management

From DISI

Latest revision as of 17:29, 30 June 2016

DEVICE MANAGEMENT/CHECKING

To check whether a disk in a Linux software RAID (md) array has failed, type:

cat /proc/mdstat

You’ll see something that looks like this:

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      128384 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      147773824 blocks [2/2] [UU]

If a disk fails (say sdb1), you would see something like this:

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      128384 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      147773824 blocks [2/2] [UU]

So there would be an “(F)” after sdb1[1], the device count [2/2] would drop to [2/1] (two devices configured, one active), and [UU] would change to [U_]. Each U is a working member and each _ a failed or missing one, in slot order; sdb1 is member [1], so the underscore appears in the second position.
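Checking by eye works for one server, but the same rule is easy to script. The sketch below is a minimal example, not a standard tool: the function name md_degraded is hypothetical, and it assumes the /proc/mdstat layout shown above (an "mdN :" line followed by a line ending in a [UU]-style status). It prints every array that has a member marked (F) or an underscore in its status.

```shell
# Hypothetical helper (not part of mdadm or the kernel): reads
# /proc/mdstat-format text on stdin and prints "<array> <status>"
# for each array with a member marked (F) or a "_" in its status.
# Assumes the "mdN : ..." line is followed by a line ending in [UU]/[U_].
md_degraded() {
    awk '
        # Array line, e.g. "md0 : active raid1 sdb1[1](F) sda1[0]"
        /^md[0-9]+ :/ {
            array = $1
            failed = (index($0, "(F)") > 0)
            next
        }
        # Status line for that array ends in e.g. [UU] or [U_]
        array != "" && $NF ~ /^\[[U_]+\]$/ {
            if (failed || $NF ~ /_/) print array, $NF
            array = ""
        }
    '
}

# Usage on a live machine:
#   md_degraded < /proc/mdstat
```

An empty result means no array looks degraded. Arrays whose status line has no [UU] field (for example raid0) are never reported by this sketch.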

Another way to check the health of a disk is smartctl (from the smartmontools package):

It checks only one device at a time, but reports very detailed information. For example:

smartctl -a /dev/sda

[root@server ~]# smartctl -a /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-371.1.2.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     GB0160CAABV
Serial Number:    5RXAAWFP
Firmware Version: HPG1
User Capacity:    160,041,885,696 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Wed Feb 12 10:38:15 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  433) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  54) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0002   098   097   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0033   100   100   020    Pre-fail  Always       -       54
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       164061635
  9 Power_On_Hours          0x0032   049   049   000    Old_age   Always       -       44875
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0033   100   100   020    Pre-fail  Always       -       54
184 End-to-End_Error        0x0032   100   253   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x003a   100   100   051    Old_age   Always       -       0
189 High_Fly_Writes         0x0022   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x001a   074   064   000    Old_age   Always       -       26 (Min/Max 23/29)
194 Temperature_Celsius     0x0000   026   040   000    Old_age   Offline      -       26 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x0032   074   063   000    Old_age   Always       -       158710899
197 Current_Pending_Sector  0x0000   100   100   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0000   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0000   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
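When reading this output, a common rule of thumb is to look at the WHEN_FAILED column (anything other than - means the normalized VALUE has reached THRESH, now or in the past) and at the raw counts of sector-related attributes such as Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable. Other raw values, like Seek_Error_Rate and Hardware_ECC_Recovered above, are often vendor-encoded, so large numbers there are not by themselves a problem. The sketch below automates that check; the function name smart_warnings and the exact watch list are assumptions, and it expects the ATA attribute layout shown above.

```shell
# Hypothetical helper: reads `smartctl -a` (ATA) output on stdin and prints
# warning signs. Assumes the 10-column attribute layout shown above:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# The watch list is a rule of thumb, not a vendor specification.
smart_warnings() {
    awk '
        BEGIN {
            # Non-zero raw counts here usually mean bad or failing sectors.
            watch["Reallocated_Sector_Ct"] = 1
            watch["Reported_Uncorrect"] = 1
            watch["Current_Pending_Sector"] = 1
            watch["Offline_Uncorrectable"] = 1
        }
        # Overall verdict, e.g. "... self-assessment test result: PASSED"
        /self-assessment test result:/ && $NF != "PASSED" {
            print "overall health: " $NF
        }
        # Attribute rows start with a numeric ID and have at least 10 fields
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($9 != "-") {
                print $2 ": WHEN_FAILED=" $9
            } else if (($2 in watch) && $10 + 0 > 0) {
                print $2 ": raw value " $10
            }
        }
    '
}

# Usage (as root):
#   smartctl -a /dev/sda | smart_warnings
```

No output means none of these warning signs were found; it does not prove the disk is healthy.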

If you type ls /dev and there are no devices that start with sd, you can check the status of the drives this way:

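hpacucli
=> ctrl all show config

Here => is the hpacucli prompt; the same command can also be run non-interactively as hpacucli ctrl all show config. Missing sd devices are typical of HP servers whose Smart Array controller uses the older cciss driver, which names logical drives /dev/cciss/c0d0, /dev/cciss/c0d1, and so on.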
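Both checks in this section can be scripted. In the sketch below, has_sd_devices and hp_drives_not_ok are hypothetical helper names. The first takes a directory argument (default /dev) so it can be tried anywhere; the second assumes hpacucli prints drive lines of the form physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK), with the status as one of the comma-separated items in parentheses. Compare that against your own controller's output before relying on it.

```shell
# Hypothetical helper: succeeds (exit 0) if DIR (default /dev) has an sd* entry.
has_sd_devices() {
    for dev in "${1:-/dev}"/sd*; do
        # An unmatched glob stays literal, so check the path really exists.
        [ -e "$dev" ] && return 0
    done
    return 1
}

# Hypothetical helper: reads `hpacucli ctrl all show config` output on stdin
# and prints logicaldrive/physicaldrive lines that do not report OK.
# Assumes the status is a comma-separated item inside the parentheses.
hp_drives_not_ok() {
    awk '
        /^[ \t]*(logicaldrive|physicaldrive)[ \t]/ && $0 !~ /, OK[),]/ {
            sub(/^[ \t]+/, "")   # drop the indentation
            print
        }
    '
}

# Usage on an HP server:
#   has_sd_devices || hpacucli ctrl all show config | hp_drives_not_ok
```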

A useful hpacucli reference:

http://www.google.com/url?q=http%3A%2F%2Fh20565.www2.hp.com%2Fportal%2Fsite%2Fhpsc%2Ftemplate.PAGE%2Fpublic%2Fpsi%2FmostViewedDisplay%3Fjavax.portlet.begCacheTok%3Dcom.vignette.cachetoken%26javax.portlet.endCacheTok%3Dcom.vignette.cachetoken%26javax.portlet.prp_efb5c0793523e51970c8fa22b053ce01%3Dwsrp-navigationalState%253DdocId%25253Demr_na-c03493210-1%25257CdocLocale%25253Den_US%26javax.portlet.tpst%3Defb5c0793523e51970c8fa22b053ce01%26sp4ts.oid%3D5177957%26ac.admitted%3D1392232572656.876444892.492883150&sa=D&sntz=1&usg=AFQjCNHzyW07LNs1-qCqC0ZUcnOhjJHjjA

To see a list of all PCI devices

lspci -v | more
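The full lspci listing is long, and when chasing a disk problem usually only the storage controllers matter. This sketch filters plain lspci output (one line per device, without -v) by PCI class name. The function name storage_controllers is hypothetical, and the list of class names is an assumption based on the usual pci.ids naming.

```shell
# Hypothetical helper: keeps only storage-controller lines from plain
# `lspci` output ("<slot> <class>: <vendor> <device>").
# The class names below are assumed from the usual pci.ids names.
storage_controllers() {
    grep -E '^[0-9a-fA-F:.]+ (RAID bus controller|SATA controller|SCSI storage controller|Serial Attached SCSI controller|IDE interface|Non-Volatile memory controller|Mass storage controller):'
}

# Usage:
#   lspci | storage_controllers
```

To see the details of one controller, pass its slot address from the first column to lspci, for example lspci -v -s 03:00.0.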

To see the make and model of the machine (serial number, etc.)

As root type:

dmidecode
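dmidecode prints every DMI table, and "Serial Number" appears in several of them (system, base board, chassis). This sketch keeps only the make, model and serial number from the System Information block. The function name system_identity is hypothetical, and it assumes the tab-indented "Key: Value" layout that dmidecode -t system prints.

```shell
# Hypothetical helper: reads `dmidecode -t system` output on stdin and
# prints Manufacturer, Product Name and Serial Number from the
# "System Information" block only (other blocks reuse the same keys).
system_identity() {
    awk '
        /^System Information/ { in_sys = 1; next }
        # Any non-indented line (blank or next "Handle ...") ends the block
        in_sys && !/^[ \t]/ { in_sys = 0 }
        in_sys {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (line ~ /^(Manufacturer|Product Name|Serial Number):/) print line
        }
    '
}

# Usage (as root):
#   dmidecode -t system | system_identity
```

For a single value, dmidecode -s system-serial-number (or system-manufacturer, system-product-name) is simpler.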

The iostat (from the sysstat package) and iotop tools are also useful for watching disk I/O.
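iostat -x reports extended per-device statistics, including %util, the percentage of time the device was busy with I/O. The sketch below lists devices above a utilization threshold. The function name busy_devices and the default threshold of 80 are assumptions; it finds the %util column from the Device header because the column set differs between sysstat versions, and it assumes a "." decimal separator, hence LC_ALL=C in the usage line.

```shell
# Hypothetical helper: reads `iostat -x` output on stdin and prints
# "<device> <%util>" for devices above LIMIT (default 80, an assumption).
busy_devices() {
    awk -v limit="${1:-80}" '
        # Locate the %util column each time a Device header appears
        /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Blank lines separate reports; ignore rows until the next header
        NF == 0 { col = 0; next }
        col > 0 && $col + 0 > limit + 0 { print $1, $col }
    '
}

# Usage: three reports, 5 seconds apart
#   LC_ALL=C iostat -x 5 3 | busy_devices 90
```

The first iostat report shows averages since boot, so the later reports are more representative of current load. To see which processes are generating the I/O, run iotop as root.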