Dovecot keeps stopping

#1 Thu, 06/30/2011 - 23:12
kyle787

Background: I just got a fresh install of Ubuntu 10.04 on a new VPS and installed Virtualmin. I have Virtualmin running flawlessly on another VPS from a different provider.

Problem: On the System Information page and the service tab I keep noticing that Dovecot has stopped, and when I start it via Virtualmin it quickly stops again. When I try to start it via SSH, I get this message:

"Last died with error (see error log for more information): Time just moved backwards by 299 seconds. This might cause a lot of problems, so I'll just kill myself now. http://wiki.dovecot.org/TimeMovedBackwards If you have trouble with authentication failures, enable auth_debug setting. See http://wiki.dovecot.org/WhyDoesItNotWork This message goes away after the first successful login."

I saw a little information in the forums about people with similar issues, but I can't figure out how to fix it. My understanding is that ntpdate keeps moving the time back or forward because the server has the wrong time. Is that right?

Thu, 06/30/2011 - 23:24
andreychek

Hmm, so are you running ntpdate? That can happen if you're running that, but not running it frequently enough.

Using the NTP daemon rather than ntpdate can help, as it makes more gradual changes. You could also run ntpdate more frequently, perhaps once an hour or more, depending on how much your clock drifts.
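
A minimal sketch of why a gradual correction is gentler than a one-shot one, assuming a toy clock that ticks once per second. The function names and the `max_slew_per_tick` value are hypothetical and chosen only for illustration; this is not how ntpd or Dovecot are implemented.

```python
def simulate_clock(offset_seconds, ticks, mode, max_slew_per_tick=0.0005):
    """Return the wall-clock readings seen over `ticks` one-second ticks.

    The clock starts `offset_seconds` ahead of true time (positive = fast).
    mode="step": the whole error is removed at once after the first tick.
    mode="slew": at most `max_slew_per_tick` seconds are removed per tick
                 (an illustrative cap, not a value from the thread).
    """
    if mode not in ("step", "slew"):
        raise ValueError("mode must be 'step' or 'slew'")
    readings = []
    error = float(offset_seconds)
    true_time = 0.0
    for _ in range(ticks):
        readings.append(true_time + error)
        true_time += 1.0
        if mode == "step":
            error = 0.0
        else:
            correction = min(abs(error), max_slew_per_tick)
            error -= correction if error > 0 else -correction
    return readings


def moved_backwards(readings):
    """True if any reading is earlier than the one before it."""
    return any(later < earlier for earlier, later in zip(readings, readings[1:]))
```

With a clock that is 299 seconds fast, `moved_backwards(simulate_clock(299, 5, "step"))` is true because the second reading lands before the first, while the `"slew"` variant keeps every reading later than the previous one.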

-Eric

Fri, 07/01/2011 - 01:03
kyle787

Hey, servers aren't really my thing. I am a front-end developer, and I am running everything exactly as the Virtualmin GPL package installed it. I am fairly sure it is ntpdate, though. How can I check to know for sure? I think Dovecot recommended using NTP to fix it. How do I install it and set it up correctly to work with Virtualmin?

Thanks so much.

Fri, 07/01/2011 - 09:14
andreychek

Hmm, do you know who set up your server?

The ntpdate program isn't something that would be set up by default on a typical Linux distribution... that would normally be set up manually, and called by a cron job.

You could try going into Webmin -> System -> Scheduled Cron Jobs, and try searching for "ntpdate", and see if it finds anything there.
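
For anyone who prefers a script to the Webmin search, here is a rough sketch of the same check. It assumes you already have the crontab contents as text (for example, pasted from the output of `crontab -l`); the function name is hypothetical.

```python
def find_cron_entries(crontab_text, needle="ntpdate"):
    """Return the non-comment crontab lines that mention `needle`."""
    matches = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if needle in stripped:
            matches.append(stripped)
    return matches
```

An empty list only means the text you passed in has no such entry; jobs can also live in other users' crontabs or in system cron directories, which this sketch does not look at.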

-Eric

Fri, 07/01/2011 - 09:43
kyle787

I just got the VPS, then reloaded Ubuntu onto the server and set up Virtualmin. How can I set up ntpdate? Why didn't I have to do this with my other server?

Fri, 07/01/2011 - 09:51
tpjthomson
Fri, 07/01/2011 - 10:01
andreychek

It's possible that your VPS provider had automatically set up ntpdate for you, and that's what is causing the problem. You may want to look in the existing Cron jobs mentioned above to see if there's an existing entry for that.

Either that, or your system clock is so inaccurate that it's causing Dovecot to get confused :-)

If you can't find any existing ntpdate setup, you could try the link provided by tpjthomson above to get that set up.

-Eric

Fri, 07/01/2011 - 12:53
kyle787

I checked for cron jobs and didn't see anything.

I don't know if the following will be helpful, but it leads me to believe something is wrong.

root@server2:~# ntpdate
1 Jul 17:50:12 ntpdate[]: no servers can be used, exiting
root@server2:~# ntpdate pool.ntp.org
1 Jul 17:50:30 ntpdate[]: step-systime: Operation not permitted

I have set the iptables rule "If protocol is UDP and destination port is 123", but it still didn't help. I tried messing around with NTP and the guide provided by tpjthomson, but that didn't fix it.

With ntp, whenever I tried to get the peers it would say "No association id's returned", and I had added 4 new ones in the ntp configuration file.

Fri, 07/01/2011 - 14:18
tpjthomson

Maybe you need to change this setting to OFF or Detect Automatically?

Webmin -> Hardware -> System Time -> Module Config -> System Configuration -> System supports hardware time

I had trouble getting the System Time/Sync working AT ALL until I had done that on my VPS, which affected other stuff on the box. It was set to yes after I chose my OS and the provider deployed the base install.

Toby

Fri, 07/01/2011 - 18:28
kyle787

Toby, the even weirder thing is that under Hardware the only thing I have is Printer Administration.

Fri, 07/01/2011 - 18:40
tpjthomson

If it is a recent install and you haven't done much to it, I think if I were in your shoes I'd consider re-installing from scratch. It sounds like it may be a dodgy install, as there ought to be at least the basics:

Hardware
  GRUB Boot Loader
  Logical Volume Management
  Partitions on Local Disks
  Printer Administration
  System Time

I don't use LVM, but apparently it's needed by the weird VPS disk layout

Fri, 07/01/2011 - 18:54
kyle787

Yeah, I have reloaded it before; I will try again later tonight. However, my other working VPS doesn't have time settings under Hardware either.

Sat, 07/02/2011 - 23:41
kyle787

Reloading it worked! Thanks all! I have another problem, though. If I should open another topic, let me know.


Here is a quick brief. I have one domain, let's call it a.com. I started off with server.a.com, ns1.a.com, and ns2.a.com, all pointing to one IP. I recently got server2.a.com, ns3.a.com, and ns4.a.com. Right now, server.a.com is set up correctly, or so I think; in Virtualmin I have created A records for all of the name servers and servers. However, I recently added b.com to server2.a.com. The site b.com points to ns3.a.com and ns4.a.com and is hosted on server2.a.com. Server2.a.com and ns3.a.com have the same IP address, and ns4.a.com has a different one, differing only in the last number. All of these are set as A records for a.com on server.a.com.

When I use http://www.squish.net/dnscheck/ it says "100.0% of queries will end in failure at IP (ns3.a.com) - returned REFUSED code". I don't understand why this is happening. If I run the check on a.com, which is hosted on server.a.com, everything checks out fine. Any ideas?

Sat, 07/02/2011 - 23:50
andreychek

Is BIND listening on your new IP address?

Try running "netstat -an | grep :53", and look at the UDP listings -- is your new IP address included there?

If not, you may need to add it. You can do that in Webmin -> Servers -> BIND DNS Server -> Addresses and Topology, and make sure that new IP address is listed under the Addresses section of "Ports and addresses".
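
If you would rather not eyeball the netstat listing, the following is a rough sketch of the same check. It assumes the usual Linux `netstat -an` column layout (protocol first, local address fourth); the function name and the 192.0.2.x addresses in the example are hypothetical placeholders.

```python
def listening_addresses(netstat_output, port=53, proto="udp"):
    """Return the set of local addresses bound to `port` for `proto`.

    Expects text in the layout printed by `netstat -an` on Linux:
    Proto Recv-Q Send-Q Local-Address Foreign-Address [State]
    """
    addresses = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, prompts, and other protocols ("udp" also matches "udp6").
        if len(fields) < 4 or not fields[0].startswith(proto):
            continue
        # Split on the last colon so IPv6 forms such as ":::53" still work.
        host, sep, local_port = fields[3].rpartition(":")
        if sep and local_port == str(port):
            addresses.add(host)
    return addresses


# Hypothetical example input, not taken from a real server.
SAMPLE = """\
tcp 0 0 192.0.2.34:53 0.0.0.0:* LISTEN
udp 0 0 192.0.2.34:53 0.0.0.0:*
udp 0 0 127.0.0.1:53 0.0.0.0:*
udp6 0 0 :::53 :::*
"""
```

Checking `"192.0.2.148" in listening_addresses(SAMPLE)` would then tell you whether the new address appears in the UDP listings. The output has to come from the server that actually holds that address.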

-Eric

Sat, 07/02/2011 - 23:52
kyle787

Which server should I check on?

Sat, 07/02/2011 - 23:56
andreychek

<cite>Which server should I check on?</cite>

Well, the error you got said that it was "ns3.a.com" that was failing -- so you'd need to make sure BIND is running on whatever server hosts that IP address, and that it's correctly listening for incoming connections.

-Eric

Sun, 07/03/2011 - 00:08
kyle787

Okay when I do "netstat -an | grep :53" I have localhost show up and the IP of the ns1, ns2, and the server, but nothing for ns3, ns4, or server2. This shows up:

root@server:~# netstat -an | grep :53
tcp 0 0 xxx.xx.xxx.34:53 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN
tcp6 0 0 :::53 :::* LISTEN
udp 0 0 xxx.xx.xxx.34:53 0.0.0.0:*
udp 0 0 127.0.0.1:53 0.0.0.0:*
udp6 0 0 :::53 :::*
So I added the IP of ns3 to it and restarted BIND. However, now when I do "netstat -an | grep :53" this shows up:
root@server:~# netstat -an | grep :53
tcp6 0 0 :::53 :::* LISTEN
udp6 0 0 :::53 :::*

Sun, 07/03/2011 - 00:23
kyle787

Alright, I am not sure what I did, or if I did anything, but it now reads: "100.0% of queries will end in failure at xxx.xx.xxx.34 (ns3.a.com) - query timed out"

Sun, 07/03/2011 - 00:28
kyle787

Also the IP for ns1, ns2, and server is xxx.xx.xxx.34 and the IP for ns3 and server2 is xxx.xx.xxx.148 and ns4 is xxx.xx.xxx.149. So why is it saying that the IP for ns3 is xxx.xx.xxx.34?

Sun, 07/03/2011 - 09:46
andreychek

Well, I'm starting to have a hard time following with the masked domain names and IP addresses :-)

However, it sounds like this is important:

<cite>Also the IP for ns1, ns2, and server is xxx.xx.xxx.34 and the IP for ns3 and server2 is xxx.xx.xxx.148 and ns4 is xxx.xx.xxx.149. So why is it saying that the IP for ns3 is xxx.xx.xxx.34?</cite>
 
I'm not sure why that would be, but that's something you're going to need to figure out :-)
 
Either your domain name registrar or the DNS on your server has that incorrect. You'd probably want to check both and make sure both have the IP address correct for that particular domain name.
 
But this also sounds important:
 
<cite>Okay when I do "netstat -an | grep :53" I have localhost show up and the IP of the ns1, ns2, and the server, but nothing for ns3, ns4, or server2.</cite>
 
If you ran that command on "server1", and if server1 hosts ns1 and ns2 and not ns3 and ns4, then that's expected... the above command shows what is listening on the server you run it on, not on other servers. You'd need to run that on server2.
 
Again though, I'm getting a bit confused, so I'm not sure I fully understand your setup :-)
 
That said -- I think the key lies in you verifying that the IP addresses for all your NS records are correct on both your server and at your registrar, as well as verifying that BIND is listening on each of those IP addresses on each server.
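
As a rough sketch of that verification, the snippet below compares the address you expect for each name server with what a lookup returns. The function name and the example hostnames and addresses are hypothetical. It uses Python's standard `socket.gethostbyname` by default, which only reports what your local resolver answers, so the records at the registrar still need to be checked in the registrar's own control panel.

```python
import socket


def find_mismatches(expected, resolve=socket.gethostbyname):
    """Compare expected IPs with resolved ones.

    `expected` maps hostname -> the IPv4 address it should have.
    Returns hostname -> (expected_ip, resolved_ip) for every disagreement;
    resolved_ip is None when the lookup fails.
    """
    mismatches = {}
    for hostname, want in expected.items():
        try:
            got = resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            got = None
        if got != want:
            mismatches[hostname] = (want, got)
    return mismatches
```

For example, `find_mismatches({"ns3.example.com": "192.0.2.148"})` returns an empty dict only if the lookup gives exactly that address. The `resolve` parameter is there so a different lookup function can be swapped in.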
 
-Eric

Sun, 07/03/2011 - 11:43
kyle787

I was just about to get rid of the domain masking on everything, but you were right! It was at the registrar that things were messed up. Thank you so much for all of your help.
