Author Topic: NTP Time Server Reboot Loop - Detailed Bug Report  (Read 15859 times)

hey_tommy

  • Level 1 Member
  • *
  • Posts: 10
NTP Time Server Reboot Loop - Detailed Bug Report
« on: June 14, 2009, 07:01:39 PM »

Preamble for D-Link moderators / engineers:

This is a detailed and cause-isolated bug report (and a temporary workaround) that is replicable on both the DIR-655 *AND* the DIR-628. I will also post a link to this thread in the DIR-628 forum. I have NOT called support about this, so please forward this to the appropriate engineering team.

-----

After upgrading my DIR-655 (A3) to v1.31 (from v1.20), I reset to factory defaults and decided to re-enter all my settings from scratch (which I keep very well documented) instead of restoring from a config file. At a certain point, the router rebooted on its own, worked for about 30 seconds, and rebooted again - rinse and repeat. I let it loop through rebooting over and over for about half an hour until I had enough and reset to factory defaults via the reset button.

I decided to re-flash just in case, then started entering in the settings again, and lo and behold, at a certain point the same crazy reboot loop started again - which called for the trusty reset button once more. The next time, I decided to pay closer attention to the order in which I was entering the settings, and made a point of saving & rebooting after each page of settings - and it was after saving the changes for the "Time" page when the whole loop began again.

So once again, I reset to factory defaults, determined to get to the bottom of this - and after some more of this nonsense, was able to isolate it to the NTP time server entry. I tried going and disabling it in the interface between reboots, but there just wasn't enough time to navigate to the page, make changes, and save before the next reboot occurred. I decided to Google the issue, and only found a SINGLE reference to someone having the same problem - which seemed a bit odd, since I was pretty confident that a sizeable chunk of people used time servers to keep their devices time-synchronized.

Something happened next that gave me a good clue to what was happening, and encouraged further diving in instead of just leaving the NTP field empty and calling it quits; for some reason, I had unplugged my WAN cable before I was about to use the reset button - and noticed that the router stopped rebooting!

Intrigued, I configured the syslog setting, and watched the output while I plugged in the WAN cable, and here's what I saw:

Code:
Requesting time from 208.80.96.70
Time server ca.pool.ntp.org is at IP address 208.80.96.70
Requesting time from 70.80.210.236
Time server ca.pool.ntp.org is at IP address 70.80.210.236
Requesting time from 206.248.190.142
Time server ca.pool.ntp.org is at IP address 206.248.190.142

After this, the router immediately rebooted, ad nauseam.
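For anyone collecting the same evidence, the resolved addresses in a syslog capture like the one above can be tallied with a few lines of Python. This is only a sketch: the regular expression assumes the exact message wording shown in the excerpt, and `servers_contacted` and `SAMPLE` are hypothetical names made up for illustration.

```python
import re
from collections import defaultdict

# Assumes the message wording seen in the syslog excerpt above.
RESOLVED = re.compile(
    r"Time server (\S+) is at IP address (\d{1,3}(?:\.\d{1,3}){3})"
)

def servers_contacted(log_text):
    """Map each configured hostname to the distinct IPs it resolved to, in order."""
    seen = defaultdict(list)
    for match in RESOLVED.finditer(log_text):
        host, ip = match.groups()
        if ip not in seen[host]:
            seen[host].append(ip)
    return dict(seen)

# The three query/answer pairs logged before one reboot.
SAMPLE = """\
Requesting time from 208.80.96.70
Time server ca.pool.ntp.org is at IP address 208.80.96.70
Requesting time from 70.80.210.236
Time server ca.pool.ntp.org is at IP address 70.80.210.236
Requesting time from 206.248.190.142
Time server ca.pool.ntp.org is at IP address 206.248.190.142
"""

result = servers_contacted(SAMPLE)
```

Running the same function over the capture from each boot cycle makes it easy to confirm that every cycle ends after three different addresses.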

What piqued my interest was the fact that, for some reason, the router was contacting 3 different servers before rebooting - and on next boot, it was contacting yet another set of 3 IP addresses! The fact that it was contacting *THREE* servers instead of just one made no sense at all - but the fact that the IP addresses were different on each query was in fact the correct behaviour; you see, I was using ca.pool.ntp.org - which is the subdomain used to access the Canadian servers in the NTP pool.

For those unfamiliar with NTP pool (http://www.pool.ntp.org/), it uses round-robin DNS resolution to randomly return an IP address belonging to any of the NTP servers in the pool (for load balancing and increased availability purposes). So each time you query ca.pool.ntp.org (for servers geographically located in Canada) or us.pool.ntp.org (for the U.S.) or whatever geo-linked subdomain you wish, you nearly always get a different NTP server responding to your request.
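To make the round-robin behaviour concrete, here is a minimal Python sketch. `make_rotating_resolver` is a hypothetical stand-in that simply cycles through the three addresses from the syslog excerpt, whereas the real pool picks servers on its own schedule rather than in strict order. `resolve_ipv4` does a genuine lookup with the standard `socket.getaddrinfo` call and needs network access, so it is defined here but not called.

```python
import itertools
import socket

# Addresses taken from the syslog excerpt; the real pool is far larger.
POOL = ["208.80.96.70", "70.80.210.236", "206.248.190.142"]

def make_rotating_resolver(addresses):
    """Return a lookup function that hands out addresses in rotation.

    Simplified model of round-robin DNS: every call for the same name
    yields the next address in the list.
    """
    rotation = itertools.cycle(addresses)

    def lookup(hostname):
        return next(rotation)

    return lookup

def resolve_ipv4(hostname, port=123):
    """Real lookup: the distinct IPv4 addresses DNS currently offers."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})

lookup = make_rotating_resolver(POOL)
answers = [lookup("ca.pool.ntp.org") for _ in range(4)]
```

Calling `resolve_ipv4("ca.pool.ntp.org")` a few minutes apart from a connected machine should show the answer set changing, which matches the different addresses seen on each boot.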

The fact that it was being queried 3 times seemed to suggest that the query process was failing somewhere. So for the heck of it, I thought I'd give D-Link's NTP server a try (ntp1.dlink.com) - and to my surprise, using it didn't cause any reboots! The syslog showed a single query, and the router continued on its merry way!

I was somewhat satisfied with this workaround and the fact that I could test this issue's resolution with newer firmware releases without having to worry about a possible need to reset to factory defaults (since I could just temporarily unplug the WAN cable to stop the reboot loop, then disable NTP via the interface). By the way, I also tried out the v1.32 NA Beta 1 and got the exact same issue.

My guess would be that it arises from some conflict with DNS resolution and the way the NTP pool implements round-robin DNS and/or a variation in the NTP data format that is being returned by these servers.
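One way to test the data-format half of that guess without a packet analyzer is to query a pool server and ntp1.dlink.com from a PC and compare the reply headers. The Python sketch below builds a standard 48-byte SNTP client request and decodes the basic fields of a reply. It says nothing about how the router firmware itself talks NTP, which is unknown here, and the function names are made up for illustration.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def build_request(version=3):
    """Build a 48-byte SNTP client request (LI=0, VN=version, Mode=3)."""
    first_byte = (0 << 6) | (version << 3) | 3
    return bytes([first_byte]) + bytes(47)

def parse_response(packet):
    """Decode the header fields and transmit timestamp of an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    first_byte, stratum = packet[0], packet[1]
    # Transmit timestamp: 32-bit seconds since 1900 plus 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return {
        "leap": first_byte >> 6,
        "version": (first_byte >> 3) & 0x7,
        "mode": first_byte & 0x7,
        "stratum": stratum,
        "unix_time": seconds - NTP_EPOCH_OFFSET + fraction / 2**32,
    }

def query(server, timeout=5.0):
    """Send one request to server:123 over UDP and return the parsed reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_response(packet)
```

Comparing `query("ca.pool.ntp.org")` with `query("ntp1.dlink.com")` would show whether the version, stratum or leap fields differ between a server that triggers the reboot and one that does not.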

So after all this detective work, I was about to go even deeper and connect a packet analyzer to a mirrored port on the switch so I could see what was actually being sent and returned, but the large unsorted laundry pile on my floor was calling me to take care of it instead. After all, it's probably a good idea to leave at least some of the mystery to you D-Link engineers, right? :D

Another helpful note is that the same problem occurs on the DIR-628 (A3) running v1.20 firmware.

Just this week, I was setting up my sister's new laptop and her DIR-628. I upgraded the firmware to v1.20 right out of the box, and configured & documented all her settings - including the ca.pool.ntp.org NTP server address.

Because I was working on it at my own place, I had the WAN cable unplugged all along. Once I brought it over to her apartment and connected it, it started going through the same reboot loop. I quickly put two and two together, disconnected the WAN, changed the NTP server to the D-Link-provided one, and everything has been working fine since then. This problem still persists after upgrading to the newest DIR-628 firmware beta (1.20NAb07Beta01).

-----

So there you have it. I do hope you guys are able to fix this for the next firmware release, as the issue is quite serious; please realize that just because I got down to the source and found a temporary workaround, probably 80% of your users who use the NTP pool addresses WON'T - they will RMA the product, which will mean unnecessary drain on your resources.

And lastly, it is important to note that the NTP pool isn't some obscure service that is rarely used - according to stats on their website, the pool serves about 50,000 requests PER SECOND. Now I realize that since this is a consumer product, most won't even set an NTP server, but again, my point is that out of those who DO, the results are quite SEVERE - hence the sooner you fix it, the fewer RMAs you'll be receiving.

peace ya'll
« Last Edit: June 14, 2009, 10:14:01 PM by hey_tommy »
Logged

mackworth

  • Level 3 Member
  • ***
  • Posts: 204
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #1 on: June 15, 2009, 05:10:22 AM »

There was a thread about this here:

http://forums.dlink.com/index.php?topic=5289.0

I also hit this issue.  It's nice to see dlink taking our bug reports and fixing them in the betas (not!).
Logged

Demonized

  • Level 4 Member
  • ****
  • Posts: 421
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #2 on: June 15, 2009, 05:33:50 AM »

We should be careful not to take ourselves too seriously, expecting every outcry of 'not working, fix it' to be taken as top priority... I would say this issue is not that important because there are more NTP servers  that do work flawlessly. Only a really small number seems to have issues. Just being efficient....
Logged

mackworth

  • Level 3 Member
  • ***
  • Posts: 204
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #3 on: June 15, 2009, 05:50:13 AM »

Quote from: Demonized
We should be careful not to take ourselves too seriously, expecting every outcry of 'not working, fix it' to be taken as top priority... I would say this issue is not that important because there are more NTP servers  that do work flawlessly. Only a really small number seems to have issues. Just being efficient....

yeah, it was more of a joke.

I wouldn't call this critical, but it is annoying.
Logged

Demonized

  • Level 4 Member
  • ****
  • Posts: 421
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #4 on: June 15, 2009, 05:54:17 AM »

 ;D
Logged

hey_tommy

  • Level 1 Member
  • *
  • Posts: 10
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #5 on: June 15, 2009, 06:57:05 AM »

Quote from: Demonized
We should be careful not to take ourselves too seriously, expecting every outcry of 'not working, fix it' to be taken as top priority... I would say this issue is not that important because there are more NTP servers  that do work flawlessly. Only a really small number seems to have issues. Just being efficient....

what, you're gonna get all EddieZ on me now? and do i need to clarify everything?

the reason why i said the issue is critical isn't because, oh god forbid, i can't use the NTP pool. of course there are tons of NTP servers - do you think i give two s**ts about which NTP server I use?

the criticality comes from the fact that the stupid router gets in a reboot loop where it's unusable - and gives no indication WHY it's doing it.

if you're just setting up your newly-bought router and enter your favourite NTP server during setup, which for many people happens to be the NTP pool address, and your unit starts rebooting constantly, you're going to hard-reset it, try again, not knowing what caused the issue, and then get pissed off and return it to the store or RMA it.

for a business selling products, RMAs/returns are as critical as it gets.

although i must admit, most people who would enter in an NTP server, let alone use the NTP pool, are more likely to try and troubleshoot before giving up and throwing the f***er out of the window. but nonetheless, when a bug causes the product to become completely unusable, forcing a hard reset, it's pretty effin critical - even if the workaround is as simple as using another NTP server.
Logged

hey_tommy

  • Level 1 Member
  • *
  • Posts: 10
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #6 on: June 15, 2009, 07:18:40 AM »

Quote from: mackworth
There was a thread about this here:

http://forums.dlink.com/index.php?topic=5289.0

I also hit this issue.  It's nice to see dlink taking our bug reports and fixing them in the betas (not!).

thanks for posting that link. glad to know others have reported it - hopefully that'll increase the chances of it getting fixed... although really, i couldn't care less which NTP server to use, now that I know where the problem lies.

i was just trying to be helpful so D-Link can avoid churn - but from that thread, the tech's response and defense of EddieZ show that these guys aren't too concerned.

in other companies, marketing would hang them by their nuts if they didn't prioritize issuing bugfixes for issues that have the greatest risk of causing product returns.
Logged

Demonized

  • Level 4 Member
  • ****
  • Posts: 421
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #7 on: June 15, 2009, 07:20:16 AM »

Yes, please explain why you're putting so much stress on this already explored issue.
BTW, I can do stuff to routers that will put them in infinite loops. No matter what brand. So it's all quite relative. That's all I'm saying. And I guess there are more serious issues to be solved  ;)

I would like that 'EddieZ' I think  :o
Logged

hey_tommy

  • Level 1 Member
  • *
  • Posts: 10
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #8 on: June 15, 2009, 07:34:29 AM »

Quote from: Demonized
Yes, please explain why you're putting so much stress on this already explored issue.
BTW, I can do stuff to routers that will put them in infinite loops. No matter what brand. So it's all quite relative. That's all I'm saying. And I guess there are more serious issues to be solved  ;)

I would like that 'EddieZ' I think  :o

look genius, i just explained in plain english why it should be a big deal to d-link - i'm not repeating myself again. what is clear from your attitude is that you're neither a business guy nor an IT professional.

and your point about putting routers in infinite loops is moot. may i suggest you take a course on formal & informal logic - and read up a thing or two on fallacies while you're at it.

now shoo, go play somewhere else.
Logged

Demonized

  • Level 4 Member
  • ****
  • Posts: 421
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #9 on: June 15, 2009, 07:53:45 AM »

Wow, mr. Self Proclaimed Expert is here!
If you want to compare years of experience: be my guest. You're more than likely to lose on both subjects. But is that really important? Try to control your temper. This is not a competition. Unless you feel threatened by people with another opinion. If so... Live with it  :D
Logged

hey_tommy

  • Level 1 Member
  • *
  • Posts: 10
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #10 on: June 15, 2009, 08:16:54 AM »


Quote from: Demonized
If you want to compare years of experience: be my guest. You're more than likely to lose on both subjects

Quote from: Demonized
This is not a competition.

bravo. *clap clap*

and you have yet to actually address any of my points, hence me getting lippy. as the late great notorious b.i.g. once said: you talk smack, and i'ma smack ya.

i'll give you one thing though; you're right, this isn't a pissing contest. so let's do us both a favor and stop wasting each other's time - i'm sure we should both probably get back to our respective jobs.
Logged

Demonized

  • Level 4 Member
  • ****
  • Posts: 421
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #11 on: June 15, 2009, 08:33:27 AM »

Did you write something that can be addressed? Great to see you spend so much time on a trivial issue. There's really nothing to be added.
Logged

hey_tommy

  • Level 1 Member
  • *
  • Posts: 10
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #12 on: June 15, 2009, 08:55:15 AM »

there's never a flyswatter when you need one...
Logged

mackworth

  • Level 3 Member
  • ***
  • Posts: 204
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #13 on: June 15, 2009, 09:00:28 AM »

Quote from: Demonized
Did you write something that can be addressed? Great to see you spend so much time on a trivial issue. There's really nothing to be added.

Well, I wouldn't call this a trivial issue.  It's really on the fence I think.

If you look at it, the router reboots uncontrollably with only a reset fixing it.  However, there is a workaround just by using another time server.  Another issue is that I don't think a lot of users would understand that their time server was what caused it.  But on that same note, there probably aren't a lot of users using custom time servers that wouldn't know how to diagnose this issue.

Having said all that, the router still gets stuck in a reboot loop.  I would consider this a serious but not critical issue given that there is a workaround.

Now, even if dlink had this as a serious priority, I think there are a couple critical issues that might be more important than this.  There are a lot of variables to take into account here, which is why my comment above was somewhat of a joke.
Logged

Demonized

  • Level 4 Member
  • ****
  • Posts: 421
Re: NTP Time Server Reboot Loop - Detailed Bug Report
« Reply #14 on: June 15, 2009, 11:33:38 AM »

The only way to put the monkey solely on Dlink's shoulders is when you know for sure that the NTP servers causing the issue (why them?) are not the source of the issue in any way. Why are those specific servers causing it? I have NTP servers that don't work at all in Windows while others work perfectly fine. Is that Microsoft's problem?

So if you want a PRO approach on this issue, you gather more info before pointing fingers at the "usual suspect". Whether it is Dlink, Linksys or Eminent is really not important to me; if you do not come up with a POC, evidence etc., you're not an IT pro, but only a n00b.  8)
Logged