Author Topic: Firmware 1.05 introduces new flaws in RAID resynch  (Read 15221 times)

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Firmware 1.05 introduces new flaws in RAID resynch
« on: May 13, 2008, 04:22:07 PM »

It would be interesting to know what changes were made in the disk error detect and format routines.

I upgraded with a "mismatched" RAID1 pair - a Seagate & a Maxtor, both 7200 rpm 250GB drives - and after looking through the changes in the web admin, shut it down and replaced the Maxtor with a Seagate, something that I had been planning to do so as to get matched pairs, not only in the DNS-323, but in an IBM xSeries server as well.

The first thing I noticed is the box did not give me any error indications - no pink LED and no email - which is what it had done in a similar situation with fw 1.04 - I logged into the web admin page and was greeted with the expected offer to format the drive, but this time, there is a check box in the corner that apparently will let you configure it as a RAID pair - I guess this is the new ability to add a second drive and switch to RAID1

I allowed the unit to format and ....

Well - I don't know where to start - first it appears to have formatted the wrong drive - and second it appears to be really confused (or maybe I am).

The status page shows that I have a degraded RAID1 array, but it does not mention how long it will take to resync, and there appears to be no resync activity. An attempt to access the drives now shows TWO separate drives, one of which is accessible and the other is not - AND the one which is accessible contains data from "a previous life" in the DNS-323 (which is why I said earlier that it appears to have formatted the wrong drive).

Needless to say I am quite disappointed - the RAID worked better with both 1.03 & 1.04

It might be a better idea to put a format menu which will allow a user to choose which drive is to be formatted - if there is more than one drive - and which drive should be rebuilt if replacing a drive in a RAID1 pair.
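
A minimal sketch of what such a format menu could enforce, assuming the firmware can list the installed drives by bay and serial number. The `Drive` fields and the `pick_format_target` function are hypothetical names used for illustration only, not anything from the DNS-323 firmware:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    slot: str             # bay label shown to the user (hypothetical)
    model: str
    serial: str
    has_partitions: bool  # True if the disk already carries partitions


def pick_format_target(drives, chosen_serial, confirm_serial):
    """Return the single drive the user selected and then confirmed.

    Nothing is formatted here; the function only refuses to guess.
    """
    if chosen_serial != confirm_serial:
        raise ValueError("confirmation does not match the selected drive")
    matches = [d for d in drives if d.serial == chosen_serial]
    if len(matches) != 1:
        raise ValueError("selection must identify exactly one installed drive")
    return matches[0]
```

The point of the design is that the unit never chooses the format target on its own: the user names the drive, confirms it, and anything ambiguous is rejected.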

I'll probably spend a few days playing with it before I recreate my RAID array and put it back into service, but, this could get really tiring if I had to do it with every firmware upgrade.
RAID1 is for disk redundancy - NOT data backup - don't confuse the two.

D-Link Multimedia

  • Poweruser
  • Level 7 Member
  • **
  • Posts: 1066
    • D-link Systems, Inc.
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #1 on: May 13, 2008, 05:06:25 PM »

Fordem,

If I got this right your scenario is the following.

1xHD Raid 1(Drive A)
1xHD (Drive B)
DNS-323 1.05

You took Drive A, inserted it into the DNS-323 with 1.05 firmware, and then added Drive B. Instead of a simple format and resync, it also offered the option to reconfigure the secondary drive to work with an already RAIDed drive?

Is the initial drive, Drive A, recognized as a degraded RAID 1 volume when looking at the DNS-323 interface, or does it show up as a standard volume? The firmware should only reconfigure into RAID 1 if the primary drive is detected as a Standard Volume. Volumes already in a RAID should only be formatted and resynced.

Please keep us up to date on your findings. We will run some tests here to see how easy it is to reproduce.

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #2 on: May 13, 2008, 06:28:09 PM »

Not quite - at the time I upgraded to 1.05 I had a RAID1 pair in it that had been there for a month or so with firmware 1.04.  The upgrade went well and the unit appeared to be working as it should.

After the upgrade I pulled one drive and replaced it with a different drive - what I expected was that the unit would recognize that it no longer had a "valid" mirror and that it would report an error to this effect and then after going through the login & prompts etc. that it would format the newly inserted drive and then rebuild the mirror.

1)  It never reported any disk fail errors - under 1.04 I would have seen a "pink" LED and gotten an email alert
2)  It did format a drive, but, apparently not the newly inserted drive
3)  It did show the array as degraded in the status page - but there was no estimate of time remaining for the resynch.
4)  It never resynched - there was no drive activity on the LEDs.

It is probably important to note that the newly inserted drive was neither new nor "clean" (as in had no partitions or data) - it was in fact one half of a RAID1 array previously removed from the DNS-323, and that is how I was able to determine that the wrong drive had been formatted, the data from the old array was still on it.

I have seen this type of thing in the past - back in the days of Novell NetWare ELS II when I first started playing with mirrored disks (20 years ago). I know the technology has improved since then, because if I were to do the same with Microsoft Windows Server 2003 it would know which disk to rebuild to - the controller keeps track of the disks, probably by the volume ID.
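
As an illustration of how a controller can "keep track of the disks", here is a minimal sketch of one common approach: each member carries an array identifier plus an update counter (Linux md superblocks store an array UUID and an event count for this purpose). Whether the DNS-323 firmware consults anything like this is an assumption, and the `Member` class and `choose_rebuild_target` function are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    name: str
    array_id: Optional[str]  # None when the disk carries no RAID metadata
    events: int = 0          # update counter; higher means more recent


def choose_rebuild_target(a, b, array_id):
    """Return (source, target) for a two-disk mirror, or raise if unclear."""
    a_ok = a.array_id == array_id
    b_ok = b.array_id == array_id
    if a_ok and not b_ok:
        return a, b  # b is foreign or blank: rebuild onto b
    if b_ok and not a_ok:
        return b, a
    if not a_ok and not b_ok:
        raise ValueError("neither disk belongs to this array; ask the user")
    if a.events == b.events:
        raise ValueError("both members are equally current; nothing to rebuild")
    # Both belong to the array: the member with fewer events is stale.
    return (a, b) if a.events > b.events else (b, a)
```

Under this scheme a disk pulled from some older array would show a different array ID and would be treated as the rebuild target, no matter how much valid-looking data it holds.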

My comment on the checkbox was poorly worded - it was there but it was "greyed out" or disabled, presumably because I already had a RAID array.

Right now I have a pair of 80GB Maxtors in the unit, in a RAID1 config and all appears well - I plan to do some disk failure simulations over the next few days and hopefully will have the unit back in service toward the week-end

Please be aware that this is not a situation where I have lost data or need assistance to get the system back on line - the data is backed up and once I'm through fiddling, I'll format & restore it.

bspvette86

  • Guest
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #3 on: May 13, 2008, 09:08:57 PM »

Fordem,
Sounds to me like you need to do a better job of wiping the disk before re-inserting it into the DNS-323.  (ie. overwrite the whole disk with 0)

Cheers!
Karl

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #4 on: May 14, 2008, 05:54:13 AM »

I agree with you, Karl - to some extent.

Yes wiping the drive (just deleting the partitions is usually adequate, no need to zero fill the entire disk) would probably have prevented that, but, it is an area that in my opinion needs attention - other manufacturers have resolved it.
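
For reference, a rough sketch of the "just clear the partition table" approach on a classic MBR disk, where the partition table and boot signature live in the first 512-byte sector. The function names are made up for this example, and the demo runs against a throwaway image file rather than a real device. It only clears that first sector; anything stored elsewhere on the disk is left untouched:

```python
import os
import tempfile

SECTOR_SIZE = 512  # classic MBR sector size


def has_mbr_signature(path):
    """True if the first sector ends with the 0x55 0xAA boot signature."""
    with open(path, "rb") as f:
        sector = f.read(SECTOR_SIZE)
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def clear_first_sector(path):
    """Overwrite the first sector (MBR and its partition table) with zeros."""
    with open(path, "r+b") as f:  # r+b: modify in place, do not truncate
        f.write(b"\x00" * SECTOR_SIZE)
        f.flush()
        os.fsync(f.fileno())


def demo():
    """Build a fake two-sector image, wipe sector 0, report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")
        with open(image, "wb") as f:
            f.write(b"\x00" * 510 + b"\x55\xaa" + b"\xff" * SECTOR_SIZE)
        before = has_mbr_signature(image)
        clear_first_sector(image)
        after = has_mbr_signature(image)
        size = os.path.getsize(image)
    return before, after, size
```

Pointing `clear_first_sector` at a real device node would destroy that disk's partition table, so it is worth trying on an image file first.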

Did you notice my suggestion on having a format menu where the user selects which drive will be formatted and resynched?  This is more in keeping with how RAID is handled nowadays - it may not be the easiest thing for a consumer (ie non-technical user) but it is a heck of a lot safer than what exists now.

bspvette86

  • Guest
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #5 on: May 14, 2008, 07:38:42 AM »

Fordem,
I agree with you that the disk tools are weak, but in this example, you are probably using the device in a way that was never intended.

Cheers!
Karl

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #6 on: May 14, 2008, 09:41:15 AM »

Again - I agree with you - but only to some extent.

Let me give you a scenario.

User has a DNS-323 with a pair of 250GB disks in a RAID1 array - he has another 250GB disk that he is using in another system - for the sake of this discussion it just happens to be an ext2 format also - maybe it's in a system running Ubuntu.

He now suffers a disk failure in his DNS-323 - and - he is unable to find a new 250GB disk, all he can get is 500GB - so rather than put a 500GB disk in the DNS-323 he decides to put it in his desktop and use the old desktop hard drive in the DNS-323.

You should be able to see where this is going - the user will be installing a drive with an ext2 file system on it as a replacement for the failed RAID1 member - how will the DNS-323 react?  Will it recognize the "Ubuntu" drive as a foreign drive and format that one and resynch it correctly?

Can you really consider this scenario as "using the device in a way that was never intended"?  All I have done is replace a failed drive with one that just happened to contain an ext2 file system - I think that could happen in real life - don't you?

DLink needs to provide a reliable way for the user to replace a failed disk in a RAID array - and - if the solution is to always use a new or clean disk, then that needs to be documented - which I don't think has happened.

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #7 on: May 14, 2008, 12:05:28 PM »

Got another scenario for you Karl.

User has a pair of disks in a RAID1 array in a DNS-323 - he's using it and everything is good - he keeps it on a shelf in the basement alongside the router and cable modem and so on, and he accesses it through a mixed wired/wireless network from upstairs

One day he goes down in the basement and notices an amber LED - so he asks his buddy what to do next - and he's told to "reseat" the drives - maybe there's a bad connection - and so he does.  Trouble is, that amber LED has been on for a month without him noticing it, and he has written fresh data to the "degraded" drive. If the unit formats and resyncs the correct drive, he's happy. If the unit formats and resyncs the wrong drive (kind of what it did in my case), he is going to be majorly pissed if he doesn't have a backup, and somewhat less upset if he does. Either way, he will question the reliability of the device and whether or not he should continue to use it to store data.

Now - this is not as far-fetched as it might seem - I don't know if you're aware of it, but on another forum, there have been anecdotal reports of drives disappearing and then reappearing when the user reseats them.

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #8 on: May 14, 2008, 12:24:26 PM »

OK D-Link Multimedia - this one is for you

As I mentioned earlier I had installed a pair of 80GB Maxtors in a RAID 1 configuration - I left the unit like that for perhaps 18 hours or so before removing them - I did write a few GB of data and also test the print server functionality.

I removed that pair and put it aside and then installed a supposedly defective 80GB drive (I have no idea what is wrong with the drive, but one of the other techs pulled it from a PC as defective - I'll try to determine what's wrong with it later) the unit formatted the drive (standard volume) and it seems to work just fine - I am less than happy, but I cannot fault the unit, the problem may actually lie with the tech who marked the drive as defective - see note below

I then removed the single 80GB drive and reinstalled the 250 GB Seagate pair that the unit had previously refused to synch and this time when it booted the status page showed a time to sync of 78 minutes so I left it - it did synch and it was the data from the older drive.

I'm now attempting to format these drives as a RAID1 pair, and so far two attempts have returned a format failure message, "Hard Drive(s) Formatting Failure", with pink LEDs on both drives.

This is not good - whatever is on the drives, I should be able to tell the unit to format them and it should do it, as long as the drives are good, which I believe these to be.

An attempt to format the 250GB drives as standard volumes appears to be hung at 94% - I guess the next step is to try formatting them one at a time and if that fails transfer them to a PC and wipe them.

OK - a second attempt at formatting the drives as standard volumes was successful after which I was able to format them as a RAID1 pair - I think that's enough fiddling for today.

Note - I finally found the time to check the supposedly defective 80GB disk - there's nothing wrong with it - so as I said I can't fault the unit - the problem WAS the tech.

These drives are Seagate Barracuda 7200.9 with 16MB cache
« Last Edit: May 14, 2008, 05:26:25 PM by fordem »

bspvette86

  • Guest
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #9 on: May 14, 2008, 07:36:37 PM »

Fordem,
One last one for you, and I agree with you up to a point as well.  But if you understand the partition layout that the DNS-323 uses, you will realize that an Ubuntu ext2-formatted disk would not be recognized by the DNS-323 as a DNS-323 formatted disk.  Each DNS-323 formatted drive has a specific partition layout, and one of the partitions on each disk contains a file with the DNS-323 RAID/disk layout info, such as:

fdisk:
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1               1          66      530113+  82  Linux swap
/dev/sda2             131       60702   486544590   83  Linux
/dev/sda4              67         130      514080   83  Linux


# cat /dev/sda4/.systemfile/raidtab
raiddev /dev/md0
        raid-level      raid0
        nr-raid-disks   2
        chunk-size      64
        persistent-superblock   1
        device          /dev/sda2
        raid-disk       0
        device          /dev/sdb2
        raid-disk       1

Had you wiped at a minimum the partition table before reinserting the disk to be resynced, you would not have ended up with the scenario that you did.  Therefore it is a general design "Feature" and not a problem specific to patch level 1.05.
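
For anyone who wants to inspect that file programmatically, here is a small sketch that parses the raidtab text shown above into a dictionary. It assumes the old raidtools-style layout seen in the listing (a `raiddev` line, option lines, then `device` / `raid-disk` pairs); `parse_raidtab` is a hypothetical helper, not a tool shipped on the DNS-323:

```python
SAMPLE = """\
raiddev /dev/md0
        raid-level      raid0
        nr-raid-disks   2
        chunk-size      64
        persistent-superblock   1
        device          /dev/sda2
        raid-disk       0
        device          /dev/sdb2
        raid-disk       1
"""


def parse_raidtab(text):
    """Parse raidtab-style text into a list of array descriptions."""
    arrays = []
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "raiddev":
            current = {"raiddev": value, "options": {}, "members": []}
            arrays.append(current)
        elif current is None:
            raise ValueError("option found before any raiddev line: " + key)
        elif key == "device":
            current["members"].append({"device": value, "raid-disk": None})
        elif key == "raid-disk":
            if not current["members"]:
                raise ValueError("raid-disk found before any device line")
            current["members"][-1]["raid-disk"] = int(value)
        else:
            current["options"][key] = value
    return arrays
```

Calling `parse_raidtab(SAMPLE)[0]["options"]["raid-level"]` gives `"raid0"` for the listing above, which is an easy way to check what level an array was actually built with.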

What I do agree with you on is that the disk utilities in general are lacking in features.

Regards,
Karl

PS:  OMG IS THAT RAID-0 IN MY RAIDTAB?  (and it's not backed up or off site either!)
« Last Edit: May 14, 2008, 07:47:07 PM by bspvette86 »

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #10 on: May 14, 2008, 08:55:34 PM »

Karl

I don't know if you noticed this, but I never suggested that an Ubuntu formatted disk would be recognized by the DNS-323 as one of its own - I questioned whether it would recognize the disk as a "foreign" drive.

I have seen the unit, when a "failed" RAID1 drive was replaced with a drive containing NTFS partitions, first format the remaining half of the RAID pair and then format the replacement drive - this happened with fw 1.04; I will be testing it with fw 1.05 tomorrow.

If as you state, the disks are adequately identified through their specific partition layouts, then the unit should be able to determine which of the disks is foreign and thus the one that should be formatted.
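
To make that argument concrete, here is a minimal sketch of a layout-based check. It assumes the layout in Karl's fdisk listing (partition 1 as type 82 swap, partitions 2 and 4 as type 83) is what every DNS-323 formatted disk looks like, which is inferred from that single example. `EXPECTED_LAYOUT` and both function names are hypothetical:

```python
# Partition number -> MBR type ID, taken from the fdisk listing above.
EXPECTED_LAYOUT = {1: 0x82, 2: 0x83, 4: 0x83}


def looks_like_dns323_disk(partitions):
    """partitions maps partition number -> type ID for one disk."""
    return dict(partitions) == EXPECTED_LAYOUT


def pick_foreign(disks):
    """Return the name of the only foreign disk, or None if unclear.

    disks maps a disk name to its partition mapping. None means the
    layout check alone cannot decide and the user should be asked.
    """
    foreign = [name for name, parts in disks.items()
               if not looks_like_dns323_disk(parts)]
    return foreign[0] if len(foreign) == 1 else None
```

A desktop-style disk (say, one ext2 partition plus swap) would be picked out as foreign. Two disks that both carry the DNS-323 layout, as when the replacement is a former array member, would return `None`, so layout alone cannot say which one holds the valid data.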

Now - I notice you have made no comment on the second scenario - should I take that to mean that you consider it a possible "real life" one?

This scenario is theoretically "closer" to what happened to me, and does present a situation where the unit may well be confused as to which disk should be considered as having valid data, and I hasten to point out this does not happen for example with Adaptec's HostRAID - which is a software (driver) based RAID solution.  I have no idea how it does it, but it knows which drive needs to be rebuilt.

If you like I can include this in my suite of proposed tests - it would only require me to pull a drive, write data to the remaining one and then reinstall the removed drive.

Last thing - you may call it a "Design Feature" if you so choose - but if that were the case, I would have expected it to be documented as a requirement when replacing a failed disk (after all - it is a part of the design - isn't it?) and as I mentioned earlier, I don't think that has been done.

bspvette86

  • Guest
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #11 on: May 15, 2008, 11:15:41 AM »

Fordem,
My point is that the subject line of this thread is incorrect.  This scenario affects more releases than just the 1.05 release and was not introduced by the latest patch.  I am in total agreement that there should be additional functionality added for drive / partition management and monitoring. 

Regards,
Karl

fordem

  • Level 10 Member
  • *****
  • Posts: 2168
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #12 on: May 15, 2008, 01:03:20 PM »

Oh - I see - well, the "newly introduced flaw" is not so much that it formatted the wrong disk, which I had seen before, but rather that it made no attempt to resynch, which I haven't seen with either 1.03 or 1.04.  The data from the older disk was available and that was it.

I'll admit to not doing extensive failure simulations with 1.04 - I had neither the time nor the "free" disks.

sgtdale

  • Level 1 Member
  • *
  • Posts: 4
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #13 on: May 15, 2008, 08:42:04 PM »

I am new to the DNS-323.  I read this in the FAQ on replacing a defective drive: "Note: The new drive must blank with no partition on the drive"
http://support.dlink.com/faq/view.asp?prod_id=2797&question=DNS-323

sgtdale

bspvette86

  • Guest
Re: Firmware 1.05 introduces new flaws in RAID resynch
« Reply #14 on: May 15, 2008, 09:17:04 PM »

B-I-N-G-O-!-!-! 

Cheers!
Karl