Fall 2013: File Server HDD Failure

An HDD has become unreadable again.

Time for the usual recovery routine.
I'll run fsck.

Note: Summer 2013 File Server HDD Failure

# service smb stop
Shutting down SMB services: [ OK ]
Shutting down NMB services: [ OK ]
# fsck -fy /dev/sdb
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/dev/sdb is mounted.  

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

Error reading block 1027 (Attempt to read block from filesystem resulted in short read).  Ignore error? yes

Force rewrite? yes

Resize inode not valid.  Recreate? yes

Pass 1: Checking inodes, blocks, and sizes

It produces a huge number of errors. This is bad.

The message below appears over and over,
so often that it seems to be reported for every block.

Error reading block 1028 (Attempt to read block from filesystem resulted in short read) while doing inode scan.  Ignore error? yes

Force rewrite? yes

Error reading block 1029 (Attempt to read block from filesystem resulted in short read) while doing inode scan.  Ignore error? yes

Force rewrite? yes


(omitted)

Error reading block 732495873 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  Ignore error? yes

Force rewrite? yes

Error reading block 732528640 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  Ignore error? yes

Force rewrite? yes

Error reading block 732528641 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  Ignore error? yes

Force rewrite? yes

Error reading block 732561408 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  Ignore error? yes

Force rewrite? yes

Error reading block 732561409 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  Ignore error? yes

Force rewrite? yes

Warning... fsck.ext3 for device /dev/sdb exited with signal 11.

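It took about an hour and a half to finish.
Judging by the volume of output, it felt as if nearly every block had been rewritten.
The log shows the reported block numbers are not consecutive, so it was not literally every block, but the log file still came to 1.72 GB (1,849,350,069 bytes).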
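
To see how many distinct blocks were actually reported, rather than guessing from the log size, the log can be tallied. This is a minimal sketch, not something from the original session. It assumes the fsck output was saved to a file and that each failure line starts with "Error reading block N", as in the output above. The function name is made up for this example.

```shell
# Count the distinct block numbers reported as unreadable in fsck output
# read from standard input. Matching lines are expected to look like:
#   Error reading block 1028 (Attempt to read block ...
count_unreadable_blocks() {
    awk '
        /^Error reading block [0-9]+/ { seen[$4] = 1 }   # $4 is the block number
        END {
            n = 0
            for (b in seen) n++
            print n
        }
    '
}
```

With the saved log in a file (the name fsck.log is hypothetical), it would be called as: count_unreadable_blocks < fsck.log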

It did finish, after a fashion, but let me look up what the Warning on the last line means.

fsck doesn't work on a partition (signal 11)

QUESTION
Signal 11, what does that mean?
ANSWER
Signal 11, or officially know as "segmentation fault", means that the program
accessed a memory location that was not assigned. That's usually a bug in the
program. So if you're writing your own program, that's the most likely cause.
However, this FAQ will concentrate on the possibilities besides that.

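So it seems e2fsck crashed on an invalid memory access, most likely because of a bug in e2fsck itself...

Hmm.
Digging further is a hassle, so I'll just run it again.

However, it seems to have started over from the very beginning.
So I wait.

While waiting, I came up with two questions, or rather possible improvements.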

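  • Rather than one 3 TB volume, wouldn't it be better to split it into smaller partitions?
    (given how often errors and recoveries happen...)
  • Is there any point in running the file server on Linux?
    (I stopped running the SVN server, so it is now used only as a file server)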

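The latter in particular: it costs electricity, and if I can do something rsync-like on Windows, do I even need a file server anymore?
I no longer use the MacBook at all, and only very occasionally copy data to the Nexus 7, so maybe USB is enough...

So next time I'll switch to simply attaching the HDDs to Windows; for now, I'll wait for fsck to finish.

When it finished I checked the log, and it had hit the same error at the same place.
I then realized I had been running fsck with the filesystem still mounted, so I'll unmount it and try again.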

# umount /dev/sdb
# fsck -fy /dev/sdb
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb
Could this be a zero-length partition?
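
Both earlier runs went ahead on a mounted filesystem, which e2fsck itself warns against. A small guard can refuse to proceed in that case. This is a sketch, not part of the original procedure. The function name is made up here, and it assumes a Linux-style mount table such as /proc/mounts, where the device is the first field of each line and is listed under the same name you pass in.

```shell
# Succeed (exit status 0) if DEVICE appears as a mounted source in the mount
# table. The table defaults to /proc/mounts; a different file can be given as
# the second argument.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example use (commented out so that nothing runs by accident):
# if is_mounted /dev/sdb; then
#     echo "/dev/sdb is mounted; unmount it before running fsck" >&2
# else
#     fsck -fy /dev/sdb
# fi
```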

Now things are getting really bad.
I suspect the superblock is corrupted.
Superblocks apparently have backups, so I'll try specifying one.

e2fsck

The location of the backup superblock apparently depends on the filesystem's block size.

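e2fsck - System Administration Command Descriptions - Linux Command List

The location of the backup superblock depends on the filesystem's block size. For a filesystem with a 1k block size, a backup superblock is at block 8193; for a 2k block size, at block 16384; and for a 4k block size, at block 32768.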
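
Those three numbers follow a simple pattern under mke2fs defaults: a block group spans 8 × block size blocks (one bitmap block's worth of bits), and the first backup superblock sits at the start of group 1. With 1k blocks the filesystem's first block is block 1 rather than 0, which is why that case gives 8193. Here is a sketch of the arithmetic. It assumes the default blocks-per-group, so a filesystem created with a non-default group size would differ. The function name is hypothetical.

```shell
# Print the block number of the first backup superblock for a given
# filesystem block size, assuming the default of 8 * block_size blocks
# per group.
first_backup_superblock() {
    case "$1" in
        1024) first_block=1 ;;          # 1k filesystems start at block 1
        2048 | 4096) first_block=0 ;;   # larger block sizes start at block 0
        *) echo "unsupported block size: $1" >&2; return 1 ;;
    esac
    echo $(( first_block + 8 * $1 ))
}
```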

Let's find out the block size.
The target drive is dead, so I'll check another drive that was formatted the same way.

# dumpe2fs /dev/sdc
dumpe2fs 1.39 (29-May-2006)
Filesystem volume name:   00mirror
Last mounted on:          
Filesystem UUID:          64ad8c36-92d2-41f3-914d-9eccbd11f26e
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              366297088
Block count:              732566646
Reserved block count:     36628332
Free blocks:              365925600
Free inodes:              366024015
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      849
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Mon Jan 30 21:21:06 2012
Last mount time:          Sat Nov  9 04:20:24 2013
Last write time:          Sat Nov  9 04:20:24 2013
Mount count:              48
Maximum mount count:      33
Last checked:             Mon Jan 30 21:21:06 2012
Check interval:           15552000 (6 months)
Next check after:         Sat Jul 28 21:21:06 2012
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      f9daf461-1700-45e3-b271-47903e96a632
Journal backup:           inode blocks
Journal size:             128M

It appears to be 4096.
For 4k the backup is at 32768, so I'll specify that and run it.

# fsck -fy -b 32768 /dev/sdb
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb
Could this be a zero-length partition?

No luck!!
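
Block 32768 is only the first backup. With the sparse_super feature (listed in the dumpe2fs output above), ext2/ext3 keeps backup superblocks in group 1 and in groups whose numbers are powers of 3, 5, and 7. That leaves more candidates for -b when the first one is damaged. The sketch below lists them from the "Blocks per group", "Block count", and "First block" values. The function name is hypothetical, and none of this helps unless the device itself can still be read.

```shell
# List candidate backup superblock locations for a sparse_super filesystem.
# Arguments: blocks_per_group block_count [first_block (default 0)]
backup_superblocks() {
    bpg=$1
    total=$2
    first=${3:-0}
    # Number of block groups, rounding up for a partial last group.
    groups=$(( (total - first + bpg - 1) / bpg ))
    {
        if [ "$groups" -gt 1 ]; then
            echo 1
        fi
        # Groups that are powers of 3, 5, and 7 also carry a backup.
        for base in 3 5 7; do
            g=$base
            while [ "$g" -lt "$groups" ]; do
                echo "$g"
                g=$(( g * base ))
            done
        done
    } | sort -n | while read -r g; do
        # Each backup sits at the first block of its group.
        echo $(( first + g * bpg ))
    done
}
```

For the filesystem shown above, the call would be: backup_superblocks 32768 732566646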

Maybe the drive is not being detected at all.
Let me check.

# fdisk -l

Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         242     1839442+  83  Linux
/dev/sda3             243         258      128520   82  Linux swap / Solaris

Disk /dev/sdc: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sdg: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdg doesn't contain a valid partition table

There is no sdb.

# ls /dev
MAKEDEV          full     loop1   md0                 parport1  ram1   ram4     root  sde  sg6       tty1   tty18  tty26  tty34  tty42  tty50  tty59  ttyS0           usbdev2.1_ep00  usbdev5.2_ep81  vcsa2
X0R              hidraw0  loop2   mem                 parport2  ram10  ram5     rtc   sdf  shm       tty10  tty19  tty27  tty35  tty43  tty51  tty6   ttyS1           usbdev2.1_ep81  usbdev5.2_ep82  vcsa3
bus              hidraw1  loop3   net                 parport3  ram11  ram6     sda   sdg  snapshot  tty11  tty2   tty28  tty36  tty44  tty52  tty60  ttyS2           usbdev3.1_ep00  vcs             vcsa4
console          hpet     loop4   network_latency     port      ram12  ram7     sda1  sg0  stderr    tty12  tty20  tty29  tty37  tty45  tty53  tty61  ttyS3           usbdev3.1_ep81  vcs2            vcsa5
core             initctl  loop5   network_throughput  ppp       ram13  ram8     sda2  sg1  stdin     tty13  tty21  tty3   tty38  tty46  tty54  tty62  urandom         usbdev4.1_ep00  vcs3            vcsa6
cpu              input    loop6   null                ptmx      ram14  ram9     sda3  sg2  stdout    tty14  tty22  tty30  tty39  tty47  tty55  tty63  usbdev1.1_ep81  usbdev4.1_ep81  vcs4            zero
cpu_dma_latency  kmsg     loop7   nvram               pts       ram15  ramdisk  sdb   sg3  systty    tty15  tty23  tty31  tty4   tty48  tty56  tty7   usbdev1.2_ep00  usbdev5.1_ep00  vcs5
disk             log      mapper  oldmem              ram       ram2   random   sdc   sg4  tty       tty16  tty24  tty32  tty40  tty49  tty57  tty8   usbdev1.2_ep02  usbdev5.1_ep81  vcs6
fd               loop0    mcelog  parport0            ram0      ram3   rawctl   sdd   sg5  tty0      tty17  tty25  tty33  tty41  tty5   tty58  tty9   usbdev1.2_ep81  usbdev5.2_ep00  vcsa

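Yet it exists in /dev...
I'm at a loss now, so I'll reboot.

Device-not-found message during boot

Error message saying 00main cannot be found

The drive is no longer detected at the hardware level.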

# fdisk -l

Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         242     1839442+  83  Linux
/dev/sda3             243         258      128520   82  Linux swap / Solaris

Disk /dev/sdb: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn't contain a valid partition table

What used to be sdb is gone, the remaining drives have each shifted up by one letter, and sdg no longer exists.

# cd /dev
[root@localhost dev]# ls
MAKEDEV          hpet     loop7               parport1  ram12  ramdisk  sde       stdout  tty17  tty28  tty39  tty5   tty60    usbdev1.1_ep81  usbdev5.2_ep00  vcsa4
X0R              initctl  mapper              parport2  ram13  random   sdf       systty  tty18  tty29  tty4   tty50  tty61    usbdev1.2_ep00  usbdev5.2_ep81  vcsa5
bus              input    mcelog              parport3  ram14  rawctl   sg0       tty     tty19  tty3   tty40  tty51  tty62    usbdev1.2_ep02  usbdev5.2_ep82  vcsa6
console          kmsg     md0                 port      ram15  root     sg1       tty0    tty2   tty30  tty41  tty52  tty63    usbdev1.2_ep81  vcs             zero
core             log      mem                 ppp       ram2   rtc      sg2       tty1    tty20  tty31  tty42  tty53  tty7     usbdev2.1_ep00  vcs2
cpu              loop0    net                 ptmx      ram3   sda      sg3       tty10   tty21  tty32  tty43  tty54  tty8     usbdev2.1_ep81  vcs3
cpu_dma_latency  loop1    network_latency     pts       ram4   sda1     sg4       tty11   tty22  tty33  tty44  tty55  tty9     usbdev3.1_ep00  vcs4
disk             loop2    network_throughput  ram       ram5   sda2     sg5       tty12   tty23  tty34  tty45  tty56  ttyS0    usbdev3.1_ep81  vcs5
fd               loop3    null                ram0      ram6   sda3     shm       tty13   tty24  tty35  tty46  tty57  ttyS1    usbdev4.1_ep00  vcs6
full             loop4    nvram               ram1      ram7   sdb      snapshot  tty14   tty25  tty36  tty47  tty58  ttyS2    usbdev4.1_ep81  vcsa
hidraw0          loop5    oldmem              ram10     ram8   sdc      stderr    tty15   tty26  tty37  tty48  tty59  ttyS3    usbdev5.1_ep00  vcsa2
hidraw1          loop6    parport0            ram11     ram9   sdd      stdin     tty16   tty27  tty38  tty49  tty6   urandom  usbdev5.1_ep81  vcsa3

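It's not here either.

Maybe it is not being detected even at the BIOS level.
Let me check.

Six drives detected by the BIOS

It looks like the drive is not being detected...
I let the machine boot as-is so that I could shut it down properly, and this time the drive was detected.
A mystery.
Before it disappears again, I'll run rsync right away.
Scary.

With the delete option it would be far too dangerous, so I'm running it by hand without it.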

# rsync -av /var/storage/00/main/data /var/storage/00/mirror >> /var/storage/00/main/scripts/log/sync_storage_201311101129.log
# rsync -av /var/storage/01/main/data /var/storage/01/mirror >> /var/storage/00/main/scripts/log/sync_storage_201311101134.log
# rsync -av /var/storage/02/main/data /var/storage/02/mirror >> /var/storage/00/main/scripts/log/sync_storage_201311101135.log
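
The three commands differ only in the volume number and the timestamp in the log name, so they can be generated rather than typed. The sketch below only prints the command lines for review and does not run rsync. The function name is made up here, the timestamp format is inferred from the log names above, and unlike the manual run it uses a single timestamp for all volumes.

```shell
# Print one rsync command per volume number, all sharing the same timestamp.
# --delete is deliberately left out, as in the manual commands.
# Arguments: timestamp volume...
print_sync_commands() {
    stamp=$1
    shift
    for vol in "$@"; do
        printf 'rsync -av /var/storage/%s/main/data /var/storage/%s/mirror >> /var/storage/00/main/scripts/log/sync_storage_%s.log\n' \
            "$vol" "$vol" "$stamp"
    done
}
```

A call such as print_sync_commands "$(date +%Y%m%d%H%M)" 00 01 02 prints the three lines, which can be checked by eye before being pasted into a root shell.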

That finished without trouble, so I'll run fsck for good measure.

# fsck -fy /dev/sdb
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/dev/sdb is mounted.  

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

00main: recovering journal
Pass 1: Checking inodes, blocks, and sizes
(omitted)
Pass 5: Checking group summary information

00main: ***** FILE SYSTEM WAS MODIFIED *****
00main: 273184/366297088 files (3.0% non-contiguous), 366711110/732566646 blocks

It completed normally.
Now let's check SMART.

# smartctl -t short /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Sun Nov 10 12:51:35 2013

Use smartctl -X to abort test.

Then check the result:

# smartctl -a /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA DT01ACA300
Serial Number:    53HV57EGS
LU WWN Device Id: 5 000039 ff4cbe551
Firmware Version: MX6OABB0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Nov 10 12:58:09 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (22652) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   140   140   054    Pre-fail  Offline      -       68
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       163 (Average 169)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       34
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2680
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       13
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       42
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       42
194 Temperature_Celsius     0x0002   150   150   000    Old_age   Always       -       40 (Min/Max 25/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 2671 hours (111 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 10 28 ab 01 00  Error: ICRC, ABRT at LBA = 0x0001ab28 = 109352

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 f8 aa 01 e0 08      23:57:42.093  WRITE DMA
  27 00 00 00 00 00 e0 08      23:57:42.093  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08      23:57:42.090  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      23:57:42.086  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08      23:57:42.086  READ NATIVE MAX ADDRESS EXT

Error 1 occurred at disk power-on lifetime: 2671 hours (111 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 20 18 ab 01 00  Error: ICRC, ABRT at LBA = 0x0001ab18 = 109336

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 40 f8 aa 01 e0 08      23:57:41.930  WRITE DMA
  c8 00 08 08 91 01 e0 08      23:57:41.927  READ DMA
  c8 00 08 b0 50 00 e0 08      23:57:41.908  READ DMA
  b0 d5 01 09 4f c2 00 08      23:57:38.802  SMART READ LOG
  b0 d5 01 06 4f c2 00 08      23:57:38.562  SMART READ LOG

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2679         -
# 2  Short offline       Completed without error       00%      2679         -
# 3  Short offline       Completed without error       00%      2679         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
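
When judging a drive from output like this, a handful of attributes carry most of the signal. Reallocated (5), pending (197), and offline-uncorrectable (198) sectors point at the media, while UDMA_CRC_Error_Count (199) counts transfer errors on the link between drive and host. The sketch below pulls just those raw values out of smartctl output. The function name and the choice of attributes are assumptions of this example, not the author's method.

```shell
# Read smartctl attribute output on standard input and print NAME=RAW_VALUE
# for attributes 5, 197, 198, and 199. The NF check skips other tables whose
# rows also begin with a small number (e.g. the selective self-test spans).
smart_key_attributes() {
    awk 'NF >= 10 && $1 ~ /^(5|197|198|199)$/ { print $2 "=" $10 }'
}

# Example use (commented out; requires smartmontools and root):
# smartctl -A /dev/sdb | smart_key_attributes
```

In the output above, only attribute 199 is non-zero (raw value 2), which lines up with the two ICRC errors in the error log.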

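There are a few errors logged, but nothing looks seriously wrong. On closer inspection, though, the model is TOSHIBA.
This is the information for a different drive.
It may be a consequence of the previous boot, when the drive was not detected.
While I'm at it, I'll look at the other drives too.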

# smartctl -a /dev/sdc
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-9YN166
Serial Number:    W1F06H8R
LU WWN Device Id: 5 000c50 044de7db0
Firmware Version: CC47
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Nov 10 13:03:52 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   099   006    Pre-fail  Always       -       46780064
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       120
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   046   039   030    Pre-fail  Always       -       1060868547439
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       15487
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       117
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   040   045    Old_age   Always   In_the_past 38 (20 35 39 38 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       92
193 Load_Cycle_Count        0x0032   047   047   000    Old_age   Always       -       107462
194 Temperature_Celsius     0x0022   038   060   000    Old_age   Always       -       38 (0 17 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       88772679047248
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6796113683641
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       943814641006

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     15486         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The Serial Number is different, so this does not seem to be the one either.

On to the next disk.
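
Since the sdX names shifted between boots, it is safer to identify each drive by model and serial number than by device name. The sketch below reduces smartctl identity output to one line per drive. The function name is hypothetical, and the field labels are taken from the output above.

```shell
# Read `smartctl -i` (or -a) output on standard input and print
# "model<TAB>serial" on a single line.
smart_identity() {
    awk -F': *' '
        /^Device Model:/  { model = $2 }
        /^Serial Number:/ { serial = $2 }
        END { printf "%s\t%s\n", model, serial }
    '
}

# Example use (commented out; requires smartmontools and root):
# for dev in /dev/sd?; do
#     printf '%s\t' "$dev"
#     smartctl -i "$dev" | smart_identity
# done
```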

# smartctl -a /dev/sdd
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WMAWZ0384322
LU WWN Device Id: 5 0014ee 206f9e16d
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Nov 10 13:05:51 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (49380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   200   151   021    Pre-fail  Always       -       6983
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       120
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11450
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       118
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       94
193 Load_Cycle_Count        0x0032   179   179   000    Old_age   Always       -       65417
194 Temperature_Celsius     0x0022   117   093   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sde
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WMAWZ0381888
LU WWN Device Id: 5 0014ee 25c4f0a2e
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Nov 10 13:07:17 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (51000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   202   150   021    Pre-fail  Always       -       6858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       120
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11456
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       118
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       93
193 Load_Cycle_Count        0x0032   181   181   000    Old_age   Always       -       59698
194 Temperature_Celsius     0x0022   119   087   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sdf
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2949188
LU WWN Device Id: 5 0014ee 20758c64b
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Nov 10 13:07:52 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (51000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   198   156   021    Pre-fail  Always       -       7075
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       86
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9624
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       86
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       64
193 Load_Cycle_Count        0x0032   185   185   000    Old_age   Always       -       46253
194 Temperature_Celsius     0x0022   118   097   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       6
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The three WD disks all came back very clean; the only non-zero error counter is a UDMA_CRC_Error_Count of 6 on /dev/sdf.

# smartctl -a /dev/sdg
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    Z1F10XY2
LU WWN Device Id: 5 000c50 04e207527
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Nov 10 13:08:42 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       121869464
  3 Spin_Up_Time            0x0003   097   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       137
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   064   061   030    Pre-fail  Always       -       2725211
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9638
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       80
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   049   045    Old_age   Always       -       35 (Min/Max 34/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       118
193 Load_Cycle_Count        0x0032   080   080   000    Old_age   Always       -       40017
194 Temperature_Celsius     0x0022   035   051   000    Old_age   Always       -       35 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       92505005621977
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       9201817624
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       30659811275

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

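The last one, this is the disk in question.
But SMART shows no errors here either: nothing in the error log, and the reallocated, pending, and uncorrectable sector counts are all 0.
So it apparently was not a physical failure.
It looks like the disk can stay in service as it is.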
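The check done by eye above can be sketched as a small script. This is only a minimal sketch: the function names `parse_attributes` and `suspicious` are made up for illustration, and the list of watched attributes is my own assumption about which counters usually point to media trouble, not something smartctl itself defines. The sample rows are copied from the /dev/sdg output above.

```python
import re

# Attributes whose non-zero RAW_VALUE often hints at media problems.
# This selection is an assumption of this sketch, not an official list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# One row of the "Vendor Specific SMART Attributes" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: {...}} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = {
                "id": int(m.group(1)),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                # Only the leading integer of RAW_VALUE is kept;
                # trailing notes such as "(Min/Max 34/35)" are ignored.
                "raw": int(m.group(9)),
            }
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw value is not zero."""
    return {
        name: a["raw"]
        for name, a in attrs.items()
        if name in WATCHED and a["raw"] != 0
    }


# Rows copied from the /dev/sdg output in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   049   045    Old_age   Always       -       35 (Min/Max 34/35)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    # An empty dict means none of the watched counters is non-zero.
    print(suspicious(parse_attributes(SAMPLE)))
```

In practice the text would come from saving the `smartctl -a` output to a file and reading it in. Raw_Read_Error_Rate and Seek_Error_Rate are deliberately left out of the watched list, because on this Seagate drive their raw values are large numbers even though the log shows no errors.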

That said, I am going to retire the Linux file server.
More on that next time.
