D-Link Forums

The Graveyard - Products No Longer Supported => D-Link Storage => DNS-321 => Topic started by: peas on March 16, 2009, 08:18:18 PM

Title: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on March 16, 2009, 08:18:18 PM
An individual found a data corruption bug by doing copy-compare to the DNS-321 :
http://www.amazon.com/review/R3I00GAEVWL1QZ/ref=cm_cr_pr_viewpnt#R3I00GAEVWL1QZ

D-Link what is the status of this bug?  I'd hate to think that my data is being silently corrupted.

Here is the full text of the user review:

Quote
By    S. Kosto
Well, it was working fine for the features I was using. Immediately updated to their latest firmware release. Put 2 1TB drives in it, all the backup options (rebuild drive, etc.) seemed fine as I played around with swapping drives out. Then I tried to copy all of my current data over to this NAS box. After about a full day of copying (I have several hundred gigs of files) I went to check the status of the backup.

The backup had completed... HOWEVER, since I had turned on data validation (rereads the destination and source files and compares after the backup) it noted that out of the 1000s of files I had backed up that 12 of them were "not equivalent to the source files".

I took down the names of the files and then did a hex dump compare of the old and new files. To my surprise the files that were copied onto the NAS box had *exactly* 76 bytes of zero in very specific relative offsets in each file. It was always at hex offsets with the last 3 nibbles of the file offset being in the range of xfb4-xfff that were all zero, in all of the "corrupted" files.

Puzzled, I did some Google searching and found that there was a Linux kernel bug found at the end of 2006 that just happens to exactly match this behavior! The kernel was losing the "dirty bits" (modified memory page indicators) when it was writing to ext2 or ext3 file systems (this box uses ext2). This only happened on certain "chunks" (76 bytes for the Linux case) if they were the 76 bytes that fall at the end of a 4k memory page boundary (the last 76 bytes of a 4k page are... you guessed it!! bytes xfb4-xfff).

The data I was transferring was from a Windows XP machine and this NAS box is internally running.. yep, LINUX! I believe they likely have a version of kernel running on this thing that was silently corrupting my data, as all the issues seem to exactly match my conditions.

That is the WORST kind of data corruption ("silent") because there were NO error indications at ALL except for when it had done the final recompare, which good thing I had turned that on or I would have NEVER known my data was being corrupted as it was copied to this NAS box!

I notified the D-Link tech support people about this issue, and they responded back saying that they are looking into what is causing the problem (think I gave them a good enough head's up on this one!)

I promptly returned the box to get my money back and am now running w/ a RAID 1 configuration in my main PC instead of having an external NAS box.

Support notes - I stayed on the phone for the D-Link tech support number for a good 20+ minutes, all I got was the answering service kept repeating "due to a large volume of calls, ... " so I just hung up and emailed them instead. Took them about a week to get back to me (but they did).

Other gripes about the box - the little levers to remove the drives were REALLY hard to use, my thumb got sore after swapping the drives a couple times for doing the failed drive testing.

This review is specifically about the DNS-321 as that was the only one I tested, however the DNS-323 is VERY similar to this box (just basically added a print server), so I can't say if that one is any better or does the same corruption as this one does (it's quite possible).
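
The offset arithmetic in the review can be checked directly: a 4k page covers offsets x000-xfff, so its last 76 bytes start at x1000 - 76 = xfb4. Below is a minimal Python sketch (not the reviewer's tool; the function name `find_zeroed_page_tails` is made up for illustration) that compares a source buffer with its copy and reports pages whose final 76 bytes are zero in the copy only:

```python
PAGE_SIZE = 4096                     # 4k memory page, as described in the review
TAIL_LEN = 76                        # length of the zeroed run the reviewer reported
TAIL_START = PAGE_SIZE - TAIL_LEN    # 4020, i.e. hex fb4


def find_zeroed_page_tails(source, copy):
    """Return offsets where the copy has an all-zero 76-byte page tail
    that does not match the source (hypothetical helper)."""
    hits = []
    limit = min(len(source), len(copy))
    for page in range(0, limit, PAGE_SIZE):
        start = page + TAIL_START
        end = page + PAGE_SIZE
        if end > limit:
            break  # incomplete final page: no full tail to inspect
        tail = copy[start:end]
        if tail == bytes(TAIL_LEN) and source[start:end] != tail:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Synthetic data only: zero the tail of the second page in a copy.
    original = bytes([1]) * (3 * PAGE_SIZE)
    damaged = bytearray(original)
    damaged[PAGE_SIZE + TAIL_START:2 * PAGE_SIZE] = bytes(TAIL_LEN)
    print([hex(o) for o in find_zeroed_page_tails(original, bytes(damaged))])
```

This only recognises the pattern the review describes; it says nothing about what caused it.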
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: ECF on March 17, 2009, 03:05:15 PM
I am very sorry, but I have not heard of any verification of this issue whatsoever; however, it will be investigated to look for possible issues. Thank you for the post.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: djy8131 on March 20, 2009, 08:07:30 AM
Anyone else had this problem?  It sounds like it is a likely bug if Dlink uses the buggy version of the kernel.  I think I will have to replace mine with a different model if this is true since we have not seen a firmware update for quite some time.  Can Dlink confirm the Linux kernel version that is being used?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: fordem on March 20, 2009, 10:23:51 AM
Forget about which version of the kernel is in use for a few minutes - Google linux kernel bug xbf4-xfff and see what shows up.

I got exactly nine hits - all pointing back to the same source - have you tried to find any further details on this alleged bug?

Further searches on linux kernel bug and different variations of 76 bytes, zero fill, 4k page turned up nothing of any significance.

Have you tried testing your data?

Personally I don't put a lot of faith in many of those reviews, especially the ones on Amazon which often come from inexperienced end users - this reviewer does not mention (does he even know) which version of the kernel has the bug or which version of the kernel is in use on the DNS-321, and suggestions such as ...

Quote
This review is specifically about the DNS-321 as that was the only one I tested, however the DNS-323 is VERY similar to this box (just basically added a print server), so I can't say if that one is any better or does the same corruption as this one does (it's quite possible).

don't exactly inspire my confidence.

For what it's worth - the DNS-323 is a precursor to the DNS-321, so it's really that the print server was removed, rather than added, and although the units are similar, they are also quite different - among other things, they use different processors.  I use the DNS-323 and I verify my backups and have never had a verification error that could not be tracked to the data on the client being changed between the backup and the verify (this happens when I forget to close my email client before backing up).

I've also, on rare occasion, had the need to restore from those backups, which can be considered the ultimate verification, and again, never had an issue.

Oh - the DNS-323 with firmware 1.06, runs kernel version 2.6.12.6, and I believe, so does the DNS-321.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: garyhgaryh on March 22, 2009, 10:10:55 PM
You have to admit this post by the OP does not inspire confidence in the box.  I bought a dns-321 and dns-323 and I would hate to think my files are being corrupted.

On a different subject, I had a very weird thing happen to my dns-323.  I can no longer log into the [download] section of the UI.  I get the following:

Backup your files before proceeding!

To stabilize operation, please login and select TOOLS-->RAID to reformat your device with an EXT2 file system.

What the hell? I haven't done anything to this box...

Gary

Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on March 23, 2009, 01:07:42 AM
You're using the wrong search terms.  Try "Linux kernel bug dirty bits 76 bytes", the 1st hit specifically mentions this bug in at least versions 2.6.5 thru 2.6.19 :
http://kerneltrap.org/node/7517

I don't verify the data that I store to the DNS-321, so I can't say one way or the other.  I store mostly media files, so I'm not likely to notice a small glitch here or there.  One thing that I've learned over the years developing and testing computer HW/SW is that if one person encounters a bug, it's likely a real problem lurking in the corners.  Just because you haven't encountered it doesn't disqualify its existence.  Let's say the data is safe 99.9% of the time.  That 0.1% corruption could occur in something critical and I'd rather be proactive and have D-Link investigate/fix this bug than dismiss it.

Since you present yourself as an expert here, please tell us which kernel version the DNS-321 runs.  Regardless of what you assume of the Amazon reviewer, he did a valid copy-verify test and discovered an issue.  Let's refrain from character bashing and stick to the evidence in front of us.

Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: fordem on March 23, 2009, 06:00:08 AM
I don't know if you noticed - I did state that I was using a 323 and provided the kernel version for that - 2.6.12.6 - although I have good reason to believe that the 321 uses the same kernel, I would not make that as a statement, since I have not (and can not) personally checked it.

By the way - I don't claim to be an expert - just a sceptic, I do not believe everything I read on line, and I do not support the "whipping up of hysteria" that so often happens when bugs and other "holes" are discovered - if you can and have duplicated the problem, by all means spread the word, if you can't, then further investigation is warranted.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on March 23, 2009, 08:14:09 AM
I paid money for this product and have entrusted my data to it.  It's not my job to wring out the bugs.  To the contrary, I hope D-Link seriously investigates reports of data loss on their data storage products.  By the time I encounter data corruption, it will be too late for my data.  And I'm not about to buy another '321 just for testing purposes.

I don't understand why you're so virulently opposed to investigating this problem.  Maybe you work in the sustaining dept and don't want to see new bugs, or own D-Link stock and don't want bad press?  In the latter case it's actually better for D-Link if they work on this because it shows consumers that they're responsible and support the product well.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: mig on March 23, 2009, 08:45:00 AM
According to the D-Link GPL site ftp://ftp.dlink.com/GPL/DNS-321/
the DNS-321 runs a 2.6.12.6 kernel; however, the D-Link FTP site does
not indicate which version of the DNS-321 firmware this posted GPL
represents.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: kimgkimg on March 23, 2009, 10:49:13 AM

How often are updates released for the product?  I just took delivery of a DNS-321 last week, but am on the fence about keeping it or returning it.

Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: D-Link Multimedia on March 23, 2009, 11:09:04 AM
We take issues like this VERY seriously. It is being fully investigated, and if the DNS-321 or any other NAS we develop is at risk, it will be resolved.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: mig on March 23, 2009, 11:22:54 AM
How often are updates released for the product?  I just took delivery of a DNS-321 last week, but am on the fence about keeping it or returning it.
According to D-Link's web site for DNS-321:
   Firmware:
      v1.00 released 11/10/2008 (shipping version)
      v1.01 released 11/11/2008
   Easy Search
      v4.1.0.0 released 8/21/2008 (shipping version)
      v4.5.0.0 released 11/11/2008
 
However, this forum group was started (welcome message posted) 07/20/2008
and the first issue was posted 8/12/2008 (a few months prior to the FW v1.00 shipping version ???)
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: fordem on March 23, 2009, 04:21:40 PM
You're using the wrong search terms.  Try "Linux kernel bug dirty bits 76 bytes", the 1st hit specifically mentions this bug in at least versions 2.6.5 thru 2.6.19 :

http://kerneltrap.org/node/7517


Now that I've had a chance to sit down and study the link you provided - I fail to see its relevance to the DNS-321.

The link you posted relates to a kernel bug causing an IO race condition, and subsequent corruption when rtorrent hashes are checked. I fail to see how it relates to corruption when transferring files, presumably using CIFS/SMB (it's not stated by your reviewer), on a device that doesn't even have rtorrent installed.

Whilst I'm about it - did you notice any mention of 76 bytes anywhere in that page?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: Fatman on March 25, 2009, 08:37:57 AM
Now that I've had a chance to sit down and study the link you provided - I fail to see its relevance to the DNS-321.

The link you posted relates to a kernel bug causing an IO race condition, and subsequent corruption when rtorrent hashes are checked. I fail to see how it relates to corruption when transferring files, presumably using CIFS/SMB (it's not stated by your reviewer), on a device that doesn't even have rtorrent installed.

Whilst I'm about it - did you notice any mention of 76 bytes anywhere in that page?

Sorry to burst your bubble, but there is relevance; they were using rtorrent to test this issue because it was discovered with rtorrent.

If I am reading this correctly (I am no kernel maintainer)...

The bug is in the way the IO layer interacts with the EXT2/3 drivers.  That definitely is something that could apply to the DNS series if the correct (unpatched) Kernels are in place.

That said, given how hard it was for Linus and friends to reproduce the corruption, I have to question whether you can trigger this bug through networked IO.  This bug required that no FS activity occur between subsequent page dirties and cleans.  With network data coming in and being separately buffered by samba, I would expect (not that it is impossible; again, I am not qualified for this) that we would escape the race.

Also the 76 byte spiel was listed in the e-mail, I found it with a ctrl+f and searched for the number 76.  I think it is essentially a red herring to this bug however, it appears that the number came up in a particular test and is being spread around because of it.

D-Link is testing and will patch if a vulnerable kernel is in place.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: fordem on March 25, 2009, 10:00:22 AM
I'm well aware of what a kernel bug is and also the fact that a kernel bug, because of what it is, can affect everything else that runs over the kernel - in short linux itself.

Yes, this was tested with rtorrent because it was discovered with rtorrent, but I have seen no evidence to suggest that it occurs in the absence of rtorrent - and if it did - there would probably have been a lot more about it in the search engines.

Being a kernel bug, this has the potential to affect every linux distro and embedded device running the affected kernel versions - surely someone else would have experienced it by now.

As I believe I've already said, what I'm against is the "whipping up of hysteria" that occurs in these cases - so to speak - do the due diligence rather than simply regurgitating someone else's unsubstantiated hype.

Am I right?  Am I wrong?  We may never know - D-Link may simply upgrade to a new kernel version, which by the way - and you may have noticed it in Linus Torvalds' comments - may still have a "tiny" race condition.

Perhaps we should all avoid linux because it has race conditions that can potentially corrupt our data ;)
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: Fatman on March 25, 2009, 10:39:57 AM
Please do not think for a second I am disagreeing with you fordem, I wholeheartedly agree that we should not be whipping up any hysterias.

I just wanted to make sure that someone voiced an opinion that was serious, but not hysterical, acknowledging that D-Link will look into it.  Any discussion here is kinda pointless: the people who have the answers and will be looking into things already have this in their queue, so our job now is to wait.

I was trying to break the cycle of people seeing only half of the posts in this thread.  I smelled hysteria coming in spite of your efforts and was trying to put an end to it by putting forward a 3rd view that is 100% agreeable to both sides.

The reason that the problem showed up in rtorrent is assuredly that the combination of large parallelized writes and hashing everything repeatedly led to any corruption conditions being optimal.  Copies to this device that are later hashed might be a similar condition, though I don't personally believe they would be.

As for the patched race, it is a pretty difficult target, as Linus said, it will probably never be seen in silicon.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: ECF on March 25, 2009, 05:00:36 PM
kimgkimg
If you are on the fence about returning the item, i would do it immediately.

If you look at this post:
http://forums.dlink.com/index.php?topic=2880.0, you can see that they have been aware of problems almost ever since they released the 1.01 firmware.

That topic was started on nov 11, 2008. It is now late march 2009, and nothing has been done. That gives you an indication of how rapidly they will respond to a problem if it exists. I can't speak to your question exactly, since I put my unit on the shelf a while back, and haven't used it since. I have no confidence in it.

Go look elsewhere for a product that works reliably.

What you are referring to is a bug in the 1.01 firmware that is not present in the 1.00 or our 1.02 b5 beta firmware. It has nothing to do with the kernel bug mentioned in this forum.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: JordiBoy on April 02, 2009, 12:58:11 PM
What you are referring to is a bug in the 1.01 firmware that is not present in the 1.00 or our 1.02 b5 beta firmware. It has nothing to do with the kernel bug mentioned in this forum.

Yes, but sandiegocal was referring to the fact that D-Link does not have a stellar track record of fixing known bugs -- bugs that are pretty serious, such as losing permissions.  If we can't trust a company to fix problems like the permissions issue in a timely manner, how can we trust that same company to fix corruption issues?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on April 08, 2009, 07:48:07 AM
Yes, this was tested with rtorrent because it was discovered with rtorrent, but I have seen no evidence to suggest that it occurs in the absence of rtorrent - and if it did - there would probably have been a lot more about it in the search engines.
The 1.02 b5 firmware has a torrent client built-in.  Before that, some people were already downloading torrents through fun_plug.  The list of excuses is dwindling.

As Fatman mentioned, rtorrent was merely one way they found of reproducing the issue.  That does not mean it's the only way.

Being a kernel bug, this has the potential to affect every linux distro and embedded device running the affected kernel versions - surely someone else would have experienced it by now.
Erm.. someone has, not just on general Linux distros running the affected kernels, but on the DNS-321.  Sure it doesn't appear to be a common case, but I'd rather err on the side of caution when it comes to my data.

As I believe I've already said, what I'm against is the "whipping up of hysteria" that occurs in these cases - so to speak - do the due diligence rather than simply regurgitating someone else's unsubstantiated hype.
Sounds like the only hysterical one is yourself.  Facts have been presented, but you'd rather ignore them.  Fortunately it appears that the D-Link engineers are investigating this issue.  At the very least they should be able to understand the frequency of this problem.  And I for one am glad that the people at D-Link are more reasonably minded about this.

Perhaps we should all avoid linux because it has race conditions that can potentially corrupt our data ;)
Now don't be silly :P, all SW/FW have bugs, it's a matter of minimizing the number and severity.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: D-Link Multimedia on April 08, 2009, 10:11:20 AM
Just to put an end to this. There is NO data corruption in 1.01 firmware for the DNS-321. It has already been tested and verified in lab so if you are running 1.00 and don't want to risk it then move to 1.01.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: fordem on April 08, 2009, 05:19:50 PM
The 1.02 b5 firmware has a torrent client built-in.  Before that, some people were already downloading torrents through fun_plug.  The list of excuses is dwindling.

None of which are rtorrent - what's your point?

Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on April 08, 2009, 10:24:38 PM
None of which are rtorrent - what's your point?
... what's yours?  Don't be ridiculous nitpicking over torrent clients.


To the D-Link technical engineer: Thank you for the update.  Was the kernel updated between 1.0 and 1.01?  Did you confirm the issue in the 1.0 firmware?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: fordem on April 09, 2009, 05:34:26 AM
Having spent the time to read and understand the discussion between Linus Torvalds et al. - the corruption was caused by a very difficult to replicate (read - extremely rare) IO race condition occurring when a specific torrent client was used - there is no evidence of it occurring with any other application.

That torrent client is not used with this device and the possibility or probability of the IO race condition occurring is slim to nil.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: Fatman on April 10, 2009, 09:56:43 AM
I thought that is what we had concluded, back before we got our official word that we were not affected.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on April 16, 2009, 11:17:08 PM
Having spent the time to read and understand the discussion between Linus Torvalds et al. - the corruption was caused by a very difficult to replicate (read - extremely rare) IO race condition occurring when a specific torrent client was used - there is no evidence of it occurring with any other application.

That torrent client is not used with this device and the possibility or probability of the IO race condition occurring is slim to nil.
Gosh fordem, stop threadcrapping.  You don't "understand" the issue at all; you keep repeating one way that someone found to replicate it.  That is not the only way to recreate a timing issue.

But whatever..  What we (as customers) need to know is how the Dlink engineers confirmed that this problem has been resolved, and what their confidence level is with this issue.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: drick on April 23, 2009, 10:13:27 PM
Just to put an end to this. There is NO data corruption in 1.01 firmware for the DNS-321. It has already been tested and verified in lab so if you are running 1.00 and don't want to risk it then move to 1.01.


just to be clear did you guys also test this on the 323, or is that not needed because they run the same kernel? i'm on fw version 1.07 if that helps.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: D-Link Multimedia on April 24, 2009, 11:34:41 AM
just to be clear did you guys also test this on the 323, or is that not needed because they run the same kernel? i'm on fw version 1.07 if that helps.

It is being tested across the board. If there are any issues they will be resolved in a timely manner.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: drick on April 24, 2009, 11:36:57 AM
ok, thanks.

i assume you will post the results to both forums (321+323) as a sticky upon completion, and if there is an issue that will link to a firmware update?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 04, 2009, 11:59:34 PM
An individual found a data corruption bug by doing copy-compare to the DNS-321 :
http://www.amazon.com/review/R3I00GAEVWL1QZ/ref=cm_cr_pr_viewpnt#R3I00GAEVWL1QZ

D-Link what is the status of this bug?  I'd hate to think that my data is being silently corrupted.


I recently bought a DNS-321 for a secondary backup.  I've owned a DNS-323 for about a year, which I also use for backup.  I read about this bug and set off trying to duplicate it.  I succeeded.  The 321 is running firmware 1.02, and the disk is a Samsung HD103SI.  The bug is exactly the same as described in the Amazon review -- last 76 bytes of a 4k block is all 0's.  This bug does not occur on my DNS-323 (firmware 1.04, WD WD5000AAKS-00YGA).  My PC is a 2-core AMD processor running XP (SP3).

I wrote two little java programs which replicate the bug.

One creates a bunch of files (400, to be exact), alternately approximately 2MB and approximately 14MB. I used these sizes simply because I found the bug when backing up image files, and these were the sizes of jpeg and raw files, respectively.

The second copies the files in 4KB chunks.

I don't know if 1MB is the magic number that triggers the bug.  And I don't know what size chunks are needed either.  4KB hit the bug.  So did 1MB.  The java version I use is 1.6.0_12.

I put the source code up here: http://snicol.dyndns.org/dlink/
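
For readers who cannot reach that link, the copy-then-compare idea described above can be sketched as follows. This is a Python illustration, not the Java source from the post; the function names are hypothetical, and only the 4KB chunk size comes from the post. To exercise the NAS, the destination path would have to be on the DNS-321 share; the demo at the bottom uses a local temporary directory and only shows the mechanics.

```python
import os
import tempfile

CHUNK = 4096  # 4KB chunks, the size mentioned in the post


def copy_in_chunks(src, dst, chunk=CHUNK):
    """Copy src to dst one chunk at a time; both files are closed on return."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break
            fout.write(block)


def first_mismatch(src, dst, chunk=CHUNK):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(src, "rb") as a, open(dst, "rb") as b:
        while True:
            x, y = a.read(chunk), b.read(chunk)
            if x != y:
                for i, (p, q) in enumerate(zip(x, y)):
                    if p != q:
                        return offset + i
                return offset + min(len(x), len(y))  # one file is shorter
            if not x:
                return None  # both files ended with no difference
            offset += len(x)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")  # would live on the NAS share in a real test
        with open(src, "wb") as f:
            f.write(os.urandom(64 * 1024))
        copy_in_chunks(src, dst)
        print("first mismatch:", first_mismatch(src, dst))
```

Reporting the offset of the first mismatch, rather than a plain pass/fail, makes it easy to see whether a failure lands in the xfb4-xfff range.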
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 05, 2009, 12:04:31 AM
BTW, in backing up about 75GB of image data, I hit this bug 150 times on the DNS-321.  Like I wrote in my previous post, it did not happen at all on my DNS-323.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nrf on July 05, 2009, 05:02:09 AM
wow. this seems like something worth jumping up and down over!
I'm glad mine sits on the shelf until it becomes worthy of its advertised function.


nrf  :(

Title: Curiouser and Curiouser
Post by: nickOfTime on July 05, 2009, 07:42:53 AM
With CopyAndVerify, writing files and reading files are interleaved (but separate -- each write is complete and the file closed before the read happens):

copy: write destination file 1
verify: read destination file 1
copy: write destination file 2
verify: read destination file 2
...

If I split it up, so all copies happen first, then all verifies after, there is no corruption.  I have added Copy.java and Verify.java to my web site.

BTW, just to make things perfectly clear... in all tests, source directory is on the PC, destination on the DNS-321.
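
To make the two orderings concrete, here is a small Python sketch of the difference. It is an illustration only, not the contents of CopyAndVerify.java, Copy.java or Verify.java; the function names and the `copy_fn`/`verify_fn` callbacks are hypothetical stand-ins for whatever does the real file I/O.

```python
def interleaved_schedule(names):
    """copy file 1, verify file 1, copy file 2, verify file 2, ..."""
    ops = []
    for name in names:
        ops.append(("copy", name))
        ops.append(("verify", name))
    return ops


def split_schedule(names):
    """All copies first, then all verifies."""
    return [("copy", n) for n in names] + [("verify", n) for n in names]


def run(schedule, copy_fn, verify_fn):
    """Execute a schedule; return names whose verify step reported a mismatch.

    copy_fn(name) performs the copy; verify_fn(name) returns True when the
    destination matches the source.
    """
    failed = []
    for op, name in schedule:
        if op == "copy":
            copy_fn(name)
        elif not verify_fn(name):
            failed.append(name)
    return failed
```

Running the same file list through both schedules against the same destination is one way to check whether the ordering alone changes the outcome.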
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 05, 2009, 07:45:15 AM
wow. this seems like something worth jumping up and down over!
I'm glad mine sits on the shelf until it becomes worthy of its advertised function.


nrf  :(



I've found an acceptable workaround, at least for me.  See posting above.  All I need to do is throw together a robust verifier that recurses through directories.  I'll keep my DNS-321 for now...
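
A recursive verifier along those lines might look like the sketch below. It is a Python illustration under assumptions: it compares SHA-256 digests rather than bytes, it only checks files that exist under the source tree, and the names `file_digest` and `verify_tree` are made up here.

```python
import hashlib
import os


def file_digest(path, chunk=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def verify_tree(src_root, dst_root):
    """Walk src_root and return (relative_path, problem) pairs for files
    that are missing from dst_root or whose contents differ."""
    problems = []
    for dirpath, _dirs, files in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(src_root, rel)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif file_digest(src) != file_digest(dst):
                problems.append((rel, "differs"))
    return problems
```

Called as `verify_tree(source_dir, nas_dir)` after all copies have finished, an empty list means every source file has a matching copy.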
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on July 05, 2009, 04:19:14 PM
nickoftime - thanks for reporting your findings and a way to recreate the issue.  I wish it weren't the case, but now that there's a verified simple way to trigger the bug, will Dlink prioritize a fix for it?

I hope the naysayers (fordem) will learn to eat their humble pie and not discount bugs just because they haven't encountered them.  Bugs have a habit of sneaking up on you and defying logic, at least until they are fully understood.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 05, 2009, 05:40:14 PM
nickoftime - thanks for reporting your findings and a way to recreate the issue.  I wish it weren't the case, but now that there's a verified simple way to trigger the bug, will Dlink prioritize a fix for it?

I hope the naysayers (fordem) will learn to eat their humble pie and not discount bugs just because they haven't encountered them.  Bugs have a habit of sneaking up on you and defying logic, at least until they are fully understood.

I wish it weren't the case either, but it's better to try to verify it than to just leave it hanging out there.  How did I find it?  I just wrote a little program that did as the reviewer on Amazon described -- copy then verify.

I don't blame fordem or D-Link employees for their reactions.  Having worked as a software developer for 25 years I've been there plenty of times.  Hopefully over time I've become less reactionary, although you'd have to ask my customers about that.  But generally with software or hardware, where there's smoke, there's fire.  As for the statement from a D-Link employee:

Quote
There is NO data corruption in 1.01 firmware for the DNS-321. It has already been tested and verified in lab

That's a strong statement, but impossible to back up.  All you can say is that corruption hasn't been encountered in the tests that were run.  You can't test and verify what you haven't reproduced.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: Fatman on July 06, 2009, 08:25:39 AM
Thanks for your work writing a real world (and hopefully reproducible) replication procedure (and doubly thanks for providing code so it can be peer reviewed, that is more helpful than some realize [though obviously you do]).

As for the comments you quoted, you are correct that his wording was a bit strong, though I think (how could I know, I am not he) that his intention matched what you said, or it was close enough for government work.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 06, 2009, 09:18:17 AM
Thanks for your work writing a real world (and hopefully reproducible) replication procedure

You're welcome.  I also had a trouble ticket opened (DLK400396514) just so that this gets on-record.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: Fatman on July 06, 2009, 10:05:47 AM
You're welcome.  I also had a trouble ticket opened (DLK400396514) just so that this gets on-record.

Thanks again for your thoroughness with this issue, though I am going to let you in on a little secret, if you call in it floats to the top quicker.  This isn't due to any particular bias, but it has to do with the number and quality of e-mails we get vs. the fact that if you call in and talk over the tech's head you will end up escalated, that doesn't work as well via e-mail.  It looks like your e-mail received a fairly generic response at first, I apologize.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 06, 2009, 10:52:28 AM
Thanks again for your thoroughness with this issue, though I am going to let you in on a little secret, if you call in it floats to the top quicker.  This isn't due to any particular bias, but it has to do with the number and quality of e-mails we get vs. the fact that if you call in and talk over the tech's head you will end up escalated, that doesn't work as well via e-mail.  It looks like your e-mail received a fairly generic response at first, I apologize.

No problem with the generic replies, I'm used to it.  I only emailed to get a trouble ticket associated with the issue.  email seems to be best at getting things on-record, not transcribed by whoever took your call.  I find that forums are best at getting attention.

Thanks for getting this sent up to the right place.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: freakyg on July 09, 2009, 03:37:27 AM
soooo what's the "status of '321 data corruption caused by Linux kernel bug?"?.  The latest firmware for this thing is v1.01 dated 11/11/08 (aside from the 1.02 "Fix for Deskstar").  I think it's a little unreasonable that home or small business owners would have a decent product - as long as you can write (or use) copy/verification software.  Isn't that (or part of) what this does?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 09, 2009, 07:17:23 AM
soooo what's the "status of '321 data corruption caused by Linux kernel bug?"?.  The latest firmware for this thing is v1.01 dated 11/11/08 (aside from the 1.02 "Fix for Deskstar").  I think it's a little unreasonable that home or small business owners would have a decent product - as long as you can write (or use) copy/verification software.  Isn't that (or part of) what this does?

I haven't heard anything back, but I'm pretty sure they're working on it.  My web server logs show about 100 hits that (assuming the IP records are correct) work out to between 8:30am and 10:30am in the local time of the clients (i.e. the people making requests of the web server) on the first work day after I reported this.

I'm not trying to make excuses (and why should I?  I'm a customer, not an employee), but I can say from working in the industry that when a bug report comes in that sounds serious but is pretty vague, you assign somebody a bit of time to look at it, and if nothing comes of it, it is marked as not reproducible and closed.

I hit this because I verify data.  I doubt many people do.  Just look at async (aka nosync) transaction modes on databases to see how casual even businesses are about data corruption.  It's like tightrope walking without a safety net.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: Fatman on July 09, 2009, 08:57:14 AM
Isn't that (or part of) what this does?

Not to get in nickOfTime's way (he seems to have this on lockdown), but it is also worth noting that bug or no bug this product does not do data verification.  It would be a good idea regardless.  I am not saying we don't have a bug or that we aren't going to fix it, just that if I cared about my data I would be already verifying it.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: youlian on July 14, 2009, 10:01:11 AM
Does anybody have an idea how likely is data corruption when just copying files from XP to dns-323 using windows explorer ?

I do not use special backup software I just copy files using windows explorer.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 15, 2009, 08:38:59 AM
Does anybody have an idea how likely is data corruption when just copying files from XP to dns-323 using windows explorer ?

I do not use special backup software I just copy files using windows explorer.

Nobody can guarantee anything, but...

The bug appears to be related to constant switching from reading to writing in a very short time.  When you are only copying files (such as with windows explorer) between your PC and the DNS, then you are either reading (if copying from the DNS) or writing (copying to the DNS), not both.

I have only been able to reproduce the bug on a DNS-321.  I've owned a DNS-323 for almost a year and a half and have never had data corruption.  I am certain of this because I verify files.  After I reproduced the bug on the DNS-321, I tested my DNS-323 for a few days (solid, no breaks) and did not hit the bug.

Despite the obvious similarities between the DNS-321 and DNS-323, the hardware inside is quite different.  Since this appears to be a timing bug, different hardware can have different results.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on July 21, 2009, 09:25:37 AM
Not to get in nickOfTime's way (he seems to have this on lockdown), but it is also worth noting that bug or no bug this product does not do data verification.  It would be a good idea regardless.  I am not saying we don't have a bug or that we aren't going to fix it, just that if I cared about my data I would be already verifying it.
Yes in general it's best to verify data.  In this specific case the act of verifying data to the DNS-321 causes data corruption, whereas not verifying is unlikely to cause corruption.  The typical procedure is to write and immediately verify that file, as opposed to writing all files then verifying from the beginning (more efficient code-wise not having to store the entire list or reload it).  I understand your sentiment but in this case what you're condoning will directly lead to data corruption.

Does DLink have a release scheduled to fix this?
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on July 21, 2009, 11:52:54 AM
Yes in general it's best to verify data.  In this specific case the act of verifying data to the DNS-321 causes data corruption, whereas not verifying is unlikely to cause corruption.  The typical procedure is to write and immediately verify that file, as opposed to writing all files then verifying from the beginning (more efficient code-wise not having to store the entire list or reload it).  I understand your sentiment but in this case what you're condoning will directly lead to data corruption.

First of all, I agree with you about turning off verify.  If you are using backup software and have no control over how its verification works, you don't know whether the way it verifies will trigger the bug or not.  In all likelihood it will trigger the bug.

Having said that, while backup software that verifies afterwards is theoretically a little less efficient in terms of RAM use, in real terms it doesn't make a difference.  It is no less time efficient.

Most backup software first scans the source disk before doing any create/modify/delete operations, and creates a list of the files in memory.  Verify is by hash (CRC32, MD5, or if you are really nuts SHA-512) and length.  All you need to do to verify afterwards is to store the hash and length of the source file, then read and hash the destination files after all writing is complete.  In my case it adds a whopping 24 bytes (16 byte MD5 + 8 byte length) to each record.  Even if you are changing a million files, that's an extra 24MB of RAM consumed.  Not a huge deal when even cheap PCs ship with 4GB of RAM.
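
As a rough illustration of the verify-afterwards idea described above, here is a minimal sketch. It is an assumption-laden example, not the backup software mentioned in this post: the names file_md5, write_phase and verify_phase are hypothetical, and MD5 plus length is used simply because that is the record layout described above.

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1 << 20):
    """Return the 16-byte MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def write_phase(pairs):
    """Copy each (src, dst) pair; keep the source hash and length per file.

    Nothing is read back from the destination in this phase.
    """
    records = []
    for src, dst in pairs:
        shutil.copyfile(src, dst)
        records.append((dst, file_md5(src), os.path.getsize(src)))
    return records


def verify_phase(records):
    """After all writing is done, re-read each destination and compare."""
    bad = []
    for dst, md5, length in records:
        if os.path.getsize(dst) != length or file_md5(dst) != md5:
            bad.append(dst)
    return bad
```

A caller would run write_phase over the whole job first and only then pass the returned records to verify_phase, so that reads from the destination never alternate with writes to it. The per-file overhead is the 16-byte digest plus the length, in line with the 24 bytes per record mentioned above (Python's own object overhead comes on top of that).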

It took me about an hour to change and test my backup software after I replicated the bug.  It runs just as fast (or slow, considering it's writing to DNS-323 and DNS-321) as ever.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on August 01, 2009, 11:40:10 AM
I've been testing the 1.03 firmware beta 9 and it seems to work.  Thanks!
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on August 01, 2009, 11:43:23 PM
That is great news.  Kudos to D-Link for working on this issue.  That said, I wonder why the fix wasn't included in the original b9 release notes.  Did DLink update the kernel?  Or is there another Linux patch that addresses this?  I'd rest easier knowing that this was a targeted fix, and not merely a by-product of other changes.  The latter is ****e to bugs resurfacing in later revs.

Edit: lol the silly DLink censor thinks "p r o n e" is a bad word because the first 4 letters match a slang version of a "bad word".  Man I hope DLink's other programming is better than this web site.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: nickOfTime on August 01, 2009, 11:58:16 PM
That said, I wonder why the fix wasn't included in the original b9 release notes.

It is...

http://forums.dlink.com/index.php?topic=4747.0
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: peas on August 06, 2009, 02:38:26 AM
Ah, you're right.  I saw your post in the beta forum correcting the release notes (it wasn't listed initially in that post).  But that thread was updated after the main release notes thread, not the other way around, hence my confusion.

Turns out that 1.03 b9 is running kernel 2.6.22.7.  From a telnet console via fun_plug, use the command:
uname -r

It would have been helpful if DLM mentioned that the kernel had been upgraded beyond 2.6.19 (the last version with that data corruption bug).  In any case, I'll rest easier knowing that this particular issue is no longer a threat to my data.
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: bitbrain on December 22, 2009, 08:40:50 PM
After a small amount of research (Amazon.com user reviews), I became quite concerned about this issue.  I am a computer savvy individual, who intends to copy AND VERIFY my files to the NAS.  Please confirm this issue is fixed in the latest and greatest firmware.

thanks
-bb
Title: Re: status of '321 data corruption caused by Linux kernel bug?
Post by: mzpx on December 25, 2009, 08:39:45 AM
What confirmation do you expect? (And from whom?)
The box is running kernel 2.6.22.7, that I can confirm.
If the problem was resolved in that kernel (as it was stated) then you are covered.
Problems that exist in the 2.6.22.7 kernel, but not in later ones are probably included.
DLink is not a Linux kernel developer, I would not expect them to issue patches.
If you need that, you are better off with a mainstream distro on generic hardware.
Sorry.
p.s. If the problem exists, it does not seem to have affected a lot of users: this seems to be the only thread in this forum about this topic, and it has not been touched by anyone since August. I do not read people complaining about data loss either, but then - most of us do not verify the data.