
Author Topic: Why do RAID 1 failures happen?  (Read 6285 times)

ChosenGSR

  • Guest
Why do RAID 1 failures happen?
« on: April 08, 2011, 05:33:02 AM »

You keep on reading about RAID 1 failing on these devices from time to time, and people either lose their data or struggle with Linux to get some of it back.  Why does this happen?  Why is it unreasonable to expect it to work as advertised?  Are these hardware-related issues or bugs in the software?

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Why do RAID 1 failures happen?
« Reply #1 on: April 08, 2011, 06:49:56 AM »

The RAID1 failures you keep on reading about occur when people misunderstand the function of RAID1 and use it incorrectly - consider it user error - or - incorrect expectations.

RAID1 is not intended to be a form of backup, or to protect you from data loss in the event of a disk failure (that is a side effect) - its purpose is to reduce or eliminate the downtime that would result from a disk failure.  With a RAID1 array, you still need to back up the data.

I've been using RAID personally and also implementing it professionally on servers for my clients since the mid-90s - in fact, I will not install a server that does not have some level of disk redundancy AND some form of data backup - I would rather walk away from that job than risk the "fall out" when the inevitable data loss occurs.

I've had my DNS-323 since 2006, when they were first released, and it has been running a RAID1 array for much of that time.  I did several disk failure simulations when I first acquired it and every one worked flawlessly (the DNS-323 was my first Linux-based NAS appliance, and rigorous testing of any new technology is normal before deployment).  In the four years I've had it, I have had one actual disk failure, and again it worked flawlessly.

Based on my experiences - as far as the RAID goes, it does work as advertised - it is software RAID by the way.
RAID1 is for disk redundancy - NOT data backup - don't confuse the two.

dosborne

  • Level 5 Member
  • *****
  • Posts: 598
Re: Why do RAID 1 failures happen?
« Reply #2 on: April 08, 2011, 12:19:19 PM »

I'd put the majority of failures down to Green and Blue drives. These drives seem to have an incredibly high failure rate and are not rated for NAS use.
3 x DNS-323 with 2 x 2TB WD Drives each for a total of 12 TB Storage and Backup. Running DLink Firmware v1.08 and Fonz Fun Plug (FFP) v0.5 for improved software support.

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Why do RAID 1 failures happen?
« Reply #3 on: April 08, 2011, 01:35:07 PM »

The majority of SATA drives out there are not "rated for NAS use" - SATA is a desktop technology.

chriso

  • Guest
Re: Why do RAID 1 failures happen?
« Reply #4 on: April 10, 2011, 09:59:23 PM »

I would say at least half of the "failures" I have read about on the DNS-323 are in fact the user not understanding what they are doing.  But I would also mention that people don't come to sites like this and post "My RAID 1 is great, it saved me, exactly how it should."  They come because they have problems they want help with.

Anyway, a person in the know, when given a new setup like this, sets it up with dummy data and then tries things like breaking the mirror and rebuilding it, so that they know not only that it works but also that they understand the recovery procedure.

The people who hear "RAID 1 will save you", know nothing about it, and slap two drives in the system and hope are setting themselves up for failure.  Not only will they panic and do the wrong thing if a drive goes bad, they will also play around with things they don't understand and cause their own problems.

The same goes for backups.  First off, getting people to even do a backup usually requires them to lose their data at least once.

Second, once they are convinced, getting them to do it on a regular basis (or automatically) is really hard.

Third, they never even try a recovery of data until something fails.  So they don't even know if their backup is good or not, let alone whether they can properly follow the recovery steps.

One thing I really like about the DNS-323 is the method I have for backing up.  Instead of some software that backs up to some unknown compressed format, I'm using rsync to provide an efficient set of backups that I can just browse with no special tools.

I run the two machines in my house where the personal data is all served up from the first drive on the DNS-323, and nightly that is rsync'ed to the second drive.  I have a daily history back to 2008, and I keep a current copy online.
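
For anyone wanting to try something similar, here is a minimal sketch of how a nightly snapshot command could be assembled. The post does not say which rsync options are actually used, so everything here is an assumption: the helper name build_rsync_command is hypothetical, the paths are placeholders, and the use of rsync's --link-dest option (which hard-links files unchanged since the previous snapshot) is just one common way to get a browsable daily history without storing every file again each night.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def build_rsync_command(source, backup_root, today):
    """Return the rsync argument list for one dated snapshot directory.

    Files unchanged since yesterday's snapshot are hard-linked rather than
    copied, so each day is a plain directory you can browse, but only the
    changed files use new disk space.
    """
    root = PurePosixPath(backup_root)
    snapshot = root / today.isoformat()
    previous = root / (today - timedelta(days=1)).isoformat()
    return [
        "rsync",
        "-a",                        # archive mode: keep permissions, times, symlinks
        f"--link-dest={previous}",   # hard-link against yesterday's snapshot
        f"{source.rstrip('/')}/",    # trailing slash: copy the contents of source
        f"{snapshot}/",
    ]


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the two volumes are mounted.
    cmd = build_rsync_command("/mnt/HD_a2/data", "/mnt/HD_b2/backups", date.today())
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The function only builds the command, so it can be checked before anything touches the disks. If yesterday's snapshot directory does not exist, rsync prints a warning and does a full copy instead.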

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Why do RAID 1 failures happen?
« Reply #5 on: April 11, 2011, 04:05:24 AM »

Very well said chriso, very well said.

For some users, it's "RAID will save you" - yes - I've seen users complain when they lost their data using a RAID0 array, and I have seen them download and run all sorts of different "recovery utilities" and complain when they don't find the drives (because they're in a NAS and not in the PC the utility is being run on).

Unfortunately this is not just a home user malady - I've seen it in the commercial & even the banking sectors (I've since closed my accounts with those banks - yes, there was more than one of them).

chriso

  • Guest
Re: Why do RAID 1 failures happen?
« Reply #6 on: April 11, 2011, 08:10:12 PM »

 ;D I just had to laugh at the thought that people think RAID 0 will protect their data when it actually roughly doubles the chance of data loss.  In my opinion it shouldn't even be allowed on the DNS-323.  Any single drive will outperform what is needed to keep up with the maximum throughput of the DNS-323.
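
The "doubles the chance" figure can be checked with a little arithmetic. This is only a sketch under simplifying assumptions: drive failures are treated as independent, the 5% per-drive failure probability is an illustrative number and not a measured one, and the RAID 1 figure ignores the rebuild window after the first failure. The function names are made up for the example.

```python
def p_data_loss_raid0(p_drive, n_drives=2):
    """Striping: the array is lost if ANY member drive fails."""
    return 1 - (1 - p_drive) ** n_drives


def p_data_loss_raid1(p_drive, n_drives=2):
    """Mirroring: the array is lost only if ALL member drives fail."""
    return p_drive ** n_drives


if __name__ == "__main__":
    p = 0.05  # assumed chance that one drive fails in some period (illustrative only)
    print(f"single drive:      {p:.4f}")
    print(f"RAID 0 (2 drives): {p_data_loss_raid0(p):.4f}")
    print(f"RAID 1 (2 drives): {p_data_loss_raid1(p):.4f}")
```

With two drives the RAID 0 result is 2p - p², which is just under double the single-drive figure when p is small.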

MJBURNS

  • Level 1 Member
  • *
  • Posts: 24
Re: Why do RAID 1 failures happen?
« Reply #7 on: April 11, 2011, 08:47:45 PM »

I find the posts above to be puzzling as they mix apples and oranges, in addition to drifting aimlessly. I've been running servers and OSes of all sorts, all the way back to PDP 11-34s in the late 70s. So I'm not quite a neophyte. (No dinosaur jokes, please.)

RAID 0 is for speed at laying data down. It has zero redundancy, and if either drive fails, you are screwed. RAID 1 (or higher) does give you data redundancy and helps in a crisis, but as posters above have pointed out, redundancy is not backup. If the software (or OS) tells the RAID 1 (or greater) array to delete a file, it is deleted from every disk in the array. So if archival data retention is valued, and it usually is, you need to provide backup in addition to whatever data redundancy is intrinsic to your server.

I've never lost data with a NAS device. I've lost plenty of data with SAN devices, mainly because the ones I've worked with had poor recovery capability if a mirror was broken. All SAN devices I've used in RAID 1 would break their mirrors about every 6 months, and data recovery was often not possible from the SAN itself. The backup was needed.

Since I've never experienced a broken mirror with any of the NAS devices I've used (albeit limited to Netgear and D-Link NAS's), I have no idea if my backups will be needed. I'd like to believe replacing the failed drive would be sufficient, and I'd like to hear from people if this is true.
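
The difference between redundancy and backup described above can be shown with a toy model. This is purely illustrative: the Mirror class and the file name are invented for the example, and it has nothing to do with how the DNS-323 or any real RAID implementation stores data. It only shows that a mirror survives a disk failure but faithfully repeats a delete, while a separate point-in-time copy does not.

```python
class Mirror:
    """Toy RAID 1: every write and delete is applied to all member disks."""

    def __init__(self):
        self.disks = [{}, {}]  # two member disks, each a name -> contents map

    def write(self, name, data):
        for disk in self.disks:
            disk[name] = data

    def delete(self, name):
        for disk in self.disks:
            disk.pop(name, None)

    def fail_disk(self, index):
        del self.disks[index]  # the surviving disk carries on alone

    def read(self, name):
        for disk in self.disks:
            if name in disk:
                return disk[name]
        return None


mirror = Mirror()
mirror.write("report.doc", "v1")
backup = dict(mirror.disks[0])  # point-in-time copy kept somewhere else

mirror.fail_disk(0)
survived_disk_failure = mirror.read("report.doc")  # still served by the other disk

mirror.delete("report.doc")
after_delete = mirror.read("report.doc")  # gone from the mirror
from_backup = backup.get("report.doc")    # still present in the backup

print(survived_disk_failure, after_delete, from_backup)
```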

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Why do RAID 1 failures happen?
« Reply #8 on: April 12, 2011, 05:41:46 AM »

I'm not certain why you feel the previous posts mix apples & oranges, or drift aimlessly - they seem pretty straightforward to me.  You do need to take into consideration that they reflect the opinions of different people, who may sometimes not agree.  I see two users who share the opinion that user error or user ignorance is a prime cause of problems, and another who feels the choice of disks is where the problem lies, an opinion with which I do not agree.

On to the issue you raise, however - backups are vital if archival data retention is an issue.  If it's not, and you're willing to risk the loss of your data (for whatever reason), then by all means don't bother about a backup - and if that is your approach, then might I suggest that you forget about RAID also - if you're willing to risk the data, then why waste 50% of the space?

You see, I know from personal experience, that data loss due to drive failure is not as much of a risk as it used to be.

Within the last decade drive reliability has increased tremendously.  One of the earlier RAID implementations I did was a Y2K project, a Pentium II server with 3x4.5GB SCSI drives on an LSI RAID controller; over the first few years I experienced drive failures at the rate of perhaps one a year.  That system was replaced in 2005 with another entry-level system, this time running 2x80GB SATA drives in RAID1, which has yet to have a disk failure.

Both of these units were equipped with 4mm DAT drives and those drives saw plenty of use, mostly daily backups, along with the occasional restore of a file here or a folder there - we never lost data to a drive failure, but you'll occasionally have a file corrupted because of a power glitch or the like.

Should I advise my client to stop backing up just because they haven't experienced a disk failure in almost six years?  We just added another server, 2x250GB SAS drives, RAID1 and another 4mm DAT drive, and that gets backed up on a daily basis also.

Closer to home - I run 3x250GB SATA drives in RAID5 on an Adaptec controller, backup is the DNS-323 - I've had no drive failures on the main server, one on the DNS-323, restores of single files (one particular file to be honest) are frequent - due to the quirky nature of a very old mail system that I really should replace.

The question you need to ask is not whether replacing the failed drive is sufficient, but rather "can I afford to lose this data?" - and the person you need to ask is yourself.

By the way - NAS & SAN are similar-sounding technologies with significant differences in the execution - the only similarity is that they can be considered storage at the other end of a network cable.  A poorly implemented SAN installation will cause considerably more grief than a similarly shoddy NAS implementation.