Server Nerds Thread
This may only be of interest to me and wolfer99, so I figure I should give the topic its own ghetto.
I RMA'd an old Seagate 1.5TB drive that was close to its 5 year warranty expiration, got a refurb drive and that one was flaky and kept dropping off the SATA port (tested on 3 SATA ports on 2 machines), so I had to return that as well (bah!)
Finally a few days ago, I got the replacement for the replacement and decided to give it a really tough test. So, I ran 4 passes of preclear, as I figure that should be enough. I only usually run one pass, and not many people do more than 2 or 3. It only took about 72 hours, which isn't so bad compared to a 4TB drive which would have taken over a week.
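As a rough sanity check on those numbers, here is a minimal sketch that scales the observed preclear time to a bigger drive. It assumes preclear time grows linearly with capacity and pass count, which is only an approximation (drive speed, controller and interface all matter); the function name is made up for illustration.

```python
def estimate_preclear_hours(ref_tb, ref_hours, ref_passes, target_tb, target_passes):
    """Scale an observed preclear run to another drive size.

    Assumption: time is proportional to capacity and to the number of passes.
    """
    hours_per_tb_per_pass = ref_hours / (ref_passes * ref_tb)
    return hours_per_tb_per_pass * target_tb * target_passes


# 4 passes on a 1.5TB drive took about 72 hours (figures from the post above).
hours = estimate_preclear_hours(1.5, 72, 4, 4.0, 4)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

Under that assumption, 4 passes on a 4TB drive come out at roughly 192 hours, which lines up with the "over a week" estimate.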
Anyhoo, that passed and it seems OK, and it went in Server 2 (N36L) which has the smaller drives (2TB or lower), but I've heard some noises from one of the drives, and I suspected it was the Hitachi drive (it's sleeping in the pic below), but when I actually switched to the SMART view, it turns out the Seagate drive looks to be on the way out. Bloody Seagate, I don't know why I trusted them. That one had a 3 year warranty, but it expired in July 2013 and I really don't know if it had started playing up before then, as I had not kept a good eye on the SMART reports and the warranty expiration.
Look at that thing, dying on its arse!
Now what I'm concerned about is that drive may be about to die, I don't know if I trust the 1.5TB Seagate either and the 2TB Hitachi may be making funny noises. For now, I'm moving important data off the 2TB Seagate and then I'm going to keep a very close eye on that drive to see if the reallocated sector count increases. It's already pretty high. And I'll write some data to it and do another parity check at some point. So far, the other 2 drives I have suspicions about have no reported errors, so in theory, the RAID array will be protected if that one dying drive does die, but I will monitor those drives closely as well.
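For anyone wanting to automate the "keep a very close eye on it" part, a minimal sketch follows. It assumes you have saved the attribute table printed by `smartctl -A /dev/sdX` to a string; the sample text here is hand-written for illustration and is not from the drive in question, and the helper names are hypothetical.

```python
# Hand-written sample in the usual `smartctl -A` table layout (illustrative only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   091   091   036    Pre-fail  Always       -       400
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def read_raw_value(smartctl_output, attribute):
    """Return the integer RAW_VALUE for one attribute, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the name is the second one.
        if len(fields) >= 10 and fields[1] == attribute:
            return int(fields[9])
    return None


def has_grown(previous, current):
    """True when both readings exist and the newer one is higher."""
    return previous is not None and current is not None and current > previous


last_seen = read_raw_value(SAMPLE, "Reallocated_Sector_Ct")
print("reallocated sectors:", last_seen)
```

Store the last reading somewhere (a text file is enough) and compare it against the next one with `has_grown`; any increase is the cue to pull the drive.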
Anyhoo, I figured I should keep a better eye on the SMART reports and so will install SMARTHistory on both servers. On MICRO, my primary server with the bigger drives, I went through all the drives, checked the warranty expiration and entered that into the notes section for each drive, which means I can actually sort the drives by warranty expiration!
Therefore, when the warranty is about to expire, I can do a SMART check to see if the drive needs replacing.
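The same idea can be sketched outside the GUI. This is a minimal example, assuming you keep the warranty dates yourself; the disk names and dates below are invented for illustration, and the 60-day window is an arbitrary choice.

```python
from datetime import date, timedelta

# Hypothetical drives and warranty end dates (not from either server).
WARRANTIES = {
    "disk1": date(2014, 7, 15),
    "disk2": date(2014, 2, 1),
    "disk3": date(2013, 7, 1),
}


def expiring_soon(drives, today, window_days=60):
    """List (name, expiry) pairs still under warranty but expiring within the window."""
    cutoff = today + timedelta(days=window_days)
    due = [(name, expiry) for name, expiry in drives.items() if today <= expiry <= cutoff]
    # Soonest expiry first, so the most urgent SMART check is at the top.
    return sorted(due, key=lambda item: item[1])


for name, expiry in expiring_soon(WARRANTIES, today=date(2014, 1, 10)):
    print(f"{name}: warranty ends {expiry}, run a SMART check now")
```

Drives whose warranty has already lapsed are left out on purpose, since there is nothing left to claim.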
Ooh, nice new drives!
Yep, that reallocated sector count is very suspicious. Not yet catastrophic, but def reason for concern. And the power-on hours aren't really that high yet. Man, that is such a different Seagate story than we usually hear. I mean, I get that nothing is ever 100%, but you seem to have no luck.
I would say keep an eye on the SMART reports and see if anything goes up. Apart from that, with 4 preclear runs and a currently clean SMART report, you should be good. But I know, once you lose the trust, it's tough not to be suspicious. I caught myself freaking out before because some random value changed from 0 to 1. Guess it's important to use the tools and reports in a reasonable way. As these drives get older they'll have errors, but there is a lot of design in place to catch these errors, remap to 'fresh' blocks and not let data corruption happen.
That's a good idea. I mean, theoretically you are safe as long as only one drive dies, but why risk it. From my understanding a) once it starts to reallocate sectors it will probably continue and b) you have some time to keep an eye on it because these drives have quite a number of spare blocks to reallocate to. Unfortunately the SMART notification isn't really fancy out of the box. I wish there were a nice 'weekly health report' and an 'immediate action required' report based on the SMART data that gets mailed out. You can always check the GUI, but sometimes you forget, things come up, blablabla.
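A minimal sketch of how such a two-tier report could decide what to send, assuming you already have two snapshots of raw SMART values per drive. The rule used here (any increase means act now, a stable non-zero count goes in the weekly mail) is an assumption for illustration, not something unRAID or smartmontools does, and the function name is hypothetical.

```python
# Raw counters that are commonly watched for failing disks.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def classify(previous, current):
    """Compare two {attribute: raw_value} snapshots and pick a report tier."""
    grew = [a for a in WATCHED if current.get(a, 0) > previous.get(a, 0)]
    if grew:
        # Assumed rule: a counter that is still climbing needs attention today.
        return "immediate action required", grew
    nonzero = [a for a in WATCHED if current.get(a, 0) > 0]
    if nonzero:
        # Assumed rule: old, stable damage only needs a mention in the weekly mail.
        return "weekly health report", nonzero
    return "ok", []


tier, attrs = classify({"Reallocated_Sector_Ct": 400}, {"Reallocated_Sector_Ct": 402})
print(tier, attrs)
```

Mailing the result is left out; the point is only that the two report types fall out of a simple diff between snapshots.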
That is a really good idea! I just had a WD 2TB that brought up some read errors in the syslog during the parity check. Some weirdness in the SMART report as well. And when I checked the warranty, it had of course expired in July of this year. Same here, I don't know if there was anything suspicious in the report back in July, but now it's too late anyway. I replaced it with a hot standby that I had precleared and will probably use it in my future backup server, after running it through a couple of preclear cycles.
My current status:
With my drives getting older, I'll follow the same procedure and add the warranty info. So far I have only added drives to the array but now with the 4TB drives here and bigger coming, I'll start replacing the 2TB ones. And using a combination of warranty expiration plus smart errors should give me the most likely candidates.
The only thing that concerns me is that the brand-new unassigned 4TB drive has 28 high_fly_writes after a single preclear run. Never seen that before and it seems to be a recent addition to the SMART values. I just started another round of preclear to see if something changes.
I wouldn't worry about high-fly writes, it's just a warning, it's not like a head crash. Most of my SGs had those and a lot more than you have there.
You can reduce the load cycle count (head parks/spindowns) with the wdidle utility. I think that's available for Lunix. Be sure it's compatible with the drive and firmware. It's a known problem with WD EARS drives running in Lunix NAS boxen. I ran it on all of my EARS drives of a certain vintage.
400 reallocated sectors is a lot. I had a couple of old (5 years old) 1.5TB SGs with about 100 reallocated, but that number never went up. If there's just one more reallocated -- i.e. it rises, I'm going to pull that drive!
Interesting, never heard about wdidle. I'm going to check that one out. Not sure how long I'll be using the 2TB in prod, but they'll most probably end up in my backup system, so it should be worth it.
Agreed, if you see the number go up, I'd pull it too. Btw, do you have 'preclear rig' or do you hook up new drives to one of your prod servers? Was thinking of setting some small box up just for that purpose as not to interfere with prod too much.
I preclear on the N36L. It doesn't run any add-ons and has 3GB RAM, so I don't have to worry about running out of lowmem. It also has a SATA hotswap tray, so it's very easy to bung a drive in!
Finally got around to running the wdidle3 command on my drives. Funny, it took me until yesterday to realize that the tool is called WD-idle, not w-didle. I was sitting here thinking, these hardware guys sure have funny names ... silly me.
Anyway, changed the timeout on 4 of 5 drives from 8 to 300 seconds. the fifth one is connected to my HBA and wasn't recognized by the tool. have to reconnect it temporarily one of these days ... but 4 of 5 is not bad.
then tried to upgrade 5.0 -> 5.03 (use the planned downtime when you can!) but ran into trouble with wget causing libc segmentation faults. spent 20 minutes online, updated some stuff but couldn't fix it. so, downgraded to 5.0. that's why you make backups of the install before upgrading. all good now.
You can also run wget on a Windows box. I've never found any problems downloading just using a Pee Cee web browser.
I removed the dodgy 2TB drive from the array and precleared it (minus the pre-read). The reallocated count didn't go up, but I still didn't trust it, so I assigned it as a cache drive and moved data to it that I was either going to delete, or don't care about -- e.g. old OS drive images. After writing about 1TB, the reallocated count went up by 2 more sectors. So, I think it's going to die, but I don't really care if the data goes with it. In the meantime, it means I now have 3TB free on the MICRO server. Woohoo!
Oh yeah, I know. I was talking about the wget that some of the plg files run during startup (sabnzbd, sickbeard, apcupsd etc pp.) to get the most recent binaries and/or metainfo. one of them fails downloading because wget fails. but going back to 5.0 took care of that and there really isn't anything in 5.01-5.03 that I need. and a new kernel always breaks something.
Yeah, that's very suspicious. good thing you got rid of it from your prod data. 3TB free? excellent. will be a while before I have that data available. getting another 4TB shortly and then will slowly but surely remove 2TB from the main server and start building the backup one with all the spare 2TB drives I got. same here, if something dies in there, too bad, it will simply be replaced. I hope the smart-improvements (mail support, weekly reports, diff reports etc) that are currently being discussed find their way into unraid. even as an addon, I'd like to see some improvements. until then -> daily visits to unmenu / smart view and smart history
Holy Macaroni! I guess if we ever need a backup of the Internet, that's our guy ...
I need to save that for whenever folks question why I need 24TB ...
Wolfer, we need to stop adding hard drives. Another 3TB and you are a confirmed pedo:
Read more: http://www.dailymail.co.uk/news/article ... z2nsDQyG2H
oh geez, next thing we'll be aiding terrorists somehow with all that storage ...
I wonder what that makes your 160+TB buddy you mentioned earlier.
so, have you been watching your smart history as you planned? anything weird going on? I only have 'high power on' numbers and one drive that has 6 current_pending_sectors. probably going to replace that first with one of the new 4TB drives I got. any news on 5tb/6tb drives?
I didn't set up SH in the end. Can't remember why, but I think sth to do with PHP.
The SG 4TB DM has a high load cycle count, which you can fix with an hdparm command, so I may do that at some point.
Dunno about 5TB, but it has been interesting to see the development of a 64 bit version of unRAID. I hope that comes to fruition.