DNS problem

#1 Fri, 12/10/2010 - 08:31
loyalwhite

Hi all,

Until yesterday my server was running perfectly. Today I got a call from a colleague saying the site is down. Trying to surf to it, the browser returns the error "Could not connect: Unknown MySQL server host 'my-domain.co.uk'"

That looked like a DNS problem to me, since the server couldn't resolve a domain that points to itself in order to connect to MySQL.

I checked my domain on Pingability, and got a succession of DNS errors, which imply that I have no DNS server running, even though VirtualMin assures me that BIND is running.

"Warning my-domain.co.uk does not have an IP Address (A) record." "Error None of this zone's name servers responded on the request for 'my-domain.co.uk' records. Giving up." SOA record shows as Unknown "Warning Did not find any IP Address (A) records for the name server 'ns1.my-domain.co.uk'. Normally the parent name server will list them. These name server A records are also called 'host records' and are usually set by the domain name registrar." "Information No glue records found at parent name servers for my-domain.co.uk"

Until yesterday this site was running perfectly. Is this an issue at the domain registrar, not providing the glue records to point to my server? I am no expert on DNS - any help would be sincerely appreciated.

Fri, 12/10/2010 - 09:15
andreychek

Howdy,

You may want to review the nameservers being used for that particular domain, and make sure they're correct (and haven't changed for some reason).

Also, you could try running a DNS report using intodns.com, which may give you some additional insight into what's going on.

-Eric

Fri, 12/10/2010 - 09:44
loyalwhite

Hi Eric,

Here is a link to the intodns output. It finds NS records at the parent server, but it says there is no DNS server running at the IP addresses to which those records point. But according to VirtualMin, BIND is up and running. Any ideas?

http://www.intodns.com/waos-online.co.uk

Many thanks,

Adam

Fri, 12/10/2010 - 09:49
andreychek

Howdy,

First off -- thanks for the DNS report, that does help in troubleshooting.

Doing a lookup at your nameservers -- it does indeed appear that BIND is unavailable to the outside world.

Since it sounds like it's running locally -- you may want to verify that you don't have a firewall blocking UDP port 53. It seems to hang, rather than reject immediately, which is often a sign of a firewall.

-Eric
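
The "hangs rather than rejects" distinction can be checked from any machine with Python 3. The following is a minimal sketch, not a replacement for dig: the function names `build_dns_query` and `probe_dns` are made up for illustration, and the interpretation of each outcome in the comments is a rule of thumb rather than a guarantee.

```python
import socket
import struct


def build_dns_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe_dns(server, name, timeout=3.0):
    """Send one UDP query to port 53 and report how the server reacted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # A connected UDP socket lets the OS report ICMP "port unreachable".
            sock.connect((server, 53))
            sock.send(build_dns_query(name))
            sock.recv(512)
            return "answered"
        except socket.timeout:
            # No reply at all: often a firewall silently dropping packets.
            return "timed out"
        except ConnectionRefusedError:
            # Host reachable, but nothing accepted the packet on UDP port 53.
            return "refused"
        except OSError as exc:
            return f"error: {exc}"
```

Calling `probe_dns("127.0.0.1", "waos-online.co.uk")` on the server itself and then `probe_dns` against its public address from another machine shows whether the two views differ.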

Fri, 12/10/2010 - 10:06
loyalwhite

Hi Eric,

The Linux Firewall in Webmin is enabled, but third on the list is:

Accept If protocol is UDP and destination port is domain

Can I assume that "domain" means port 53?

Fri, 12/10/2010 - 10:09
andreychek

Howdy,

Yup! That firewall output all looks normal.

Could there be a router or firewall in front of your server causing the trouble?

Also, you may want to restart BIND, just in case it's running, but hung for some reason.

-Eric

Fri, 12/10/2010 - 10:24
loyalwhite

No, it's a dedicated server with a hosting provider. Nothing else in front of it.

I have restarted BIND several times from both Virtualmin and the command line, and indeed rebooted the whole machine, to no effect.

I am tearing my hair out!

Fri, 12/10/2010 - 10:59
andreychek

If you log into your server, and run "dig @localhost", do you receive a list of a bunch of root nameservers? Or do you see an error of some sort?

Also, when you restart BIND, take a peek in your logs in /var/log -- do any errors show up in there relating to BIND?

-Eric

Fri, 12/10/2010 - 11:14
loyalwhite

Hi Eric,

Sincere thanks for your time on this. Here is the output from dig @localhost:

; <<>> DiG 9.3.6-P1-RedHat-9.3.6-4.P1.el5_4.2 <<>> @localhost
; (1 server found)
;; global options: printcmd
;; connection timed out; no servers could be reached

I'm not quite sure where exactly within /var/log I should be looking - I don't see a file or folder for BIND. Can you advise?

Fri, 12/10/2010 - 11:18
andreychek

Well, where exactly depends on your distro... but from your dig output above, it looks like you may be on CentOS.

So, I'd take a peek in /var/log/messages.

First, restart BIND -- then afterwards, look in /var/log/messages for any errors that show up.

It does look like BIND isn't answering queries on your server, so the logs may very well explain why when you restart it.

-Eric

Fri, 12/10/2010 - 11:32
loyalwhite

Hi Eric,

Here is the content of /var/log/messages which is added when I restart BIND:

I am guessing the "not listening on any interfaces" might be the crux of the problem?

Dec 10 17:29:30 server55711 named[14570]: shutting down: flushing changes
Dec 10 17:29:30 server55711 named[14570]: stopping command channel on 127.0.0.1#953
Dec 10 17:29:30 server55711 named[14570]: stopping command channel on ::1#953
Dec 10 17:29:30 server55711 named[14570]: exiting
Dec 10 17:29:31 server55711 named[15611]: starting BIND 9.3.6-P1-RedHat-9.3.6-4.P1.el5_4.2 -u named
Dec 10 17:29:31 server55711 named[15611]: adjusted limit on open files from 1024 to 1048576
Dec 10 17:29:31 server55711 named[15611]: found 4 CPUs, using 4 worker threads
Dec 10 17:29:31 server55711 named[15611]: using up to 4096 sockets
Dec 10 17:29:31 server55711 named[15611]: loading configuration from '/etc/named.conf'
Dec 10 17:29:31 server55711 named[15611]: using default UDP/IPv4 port range: [1024, 65535]
Dec 10 17:29:31 server55711 named[15611]: using default UDP/IPv6 port range: [1024, 65535]
Dec 10 17:29:31 server55711 named[15611]: /etc/named.conf:9: undefined ACL '83.170.79.9,83.170.78.155,83.170.78.156'
Dec 10 17:29:31 server55711 named[15611]: not listening on any interfaces
Dec 10 17:29:31 server55711 named[15611]: command channel listening on 127.0.0.1#953
Dec 10 17:29:31 server55711 named[15611]: command channel listening on ::1#953
Dec 10 17:29:31 server55711 named[15611]: the working directory is not writable
Dec 10 17:29:31 server55711 named[15611]: zone waos-online.co.uk/IN: loaded serial 1289610790
Dec 10 17:29:31 server55711 named[15611]: zone registration.waos-online.co.uk/IN: loaded serial 1290852794
Dec 10 17:29:31 server55711 named[15611]: running
Dec 10 17:29:31 server55711 named[15611]: zone registration.waos-online.co.uk/IN: sending notifies (serial 1290852794)
Dec 10 17:29:31 server55711 named[15611]: zone waos-online.co.uk/IN: sending notifies (serial 1289610790)

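
In a restart log this long, the two lines that matter are easy to miss. As a rough sketch (the helper name `find_named_problems` and the short list of phrases are illustrative assumptions, not a complete catalogue of BIND errors), the named messages can be filtered like this:

```python
import re

# Message fragments worth flagging; an illustrative list, not an exhaustive one.
SUSPICIOUS = (
    "undefined ACL",
    "not listening on any interfaces",
)

# Matches syslog-style lines written by named, e.g. "... named[15611]: message".
NAMED_LINE = re.compile(r"named\[\d+\]: (?P<message>.*)$")


def find_named_problems(lines):
    """Return the named messages that contain one of the SUSPICIOUS fragments."""
    problems = []
    for line in lines:
        match = NAMED_LINE.search(line)
        if match and any(text in match.group("message") for text in SUSPICIOUS):
            problems.append(match.group("message"))
    return problems


# Three lines copied from the log above.
sample = [
    "Dec 10 17:29:31 server55711 named[15611]: /etc/named.conf:9: undefined ACL '83.170.79.9,83.170.78.155,83.170.78.156'",
    "Dec 10 17:29:31 server55711 named[15611]: not listening on any interfaces",
    "Dec 10 17:29:31 server55711 named[15611]: running",
]
for message in find_named_problems(sample):
    print(message)
```

On a live system the same function could be fed the lines of /var/log/messages read after a restart.
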
Fri, 12/10/2010 - 11:35
andreychek

Hmm, it looks like there may be a syntax error in your /etc/named.conf file.

Is there any chance you could post the contents of it?

In particular, line 9 appears to be the problem, but having the full context would help.

-Eric

Fri, 12/10/2010 - 11:51
loyalwhite

Hi Eric

Sure, it's below, but the plot thickens.

I went into the BIND settings in Webmin and into Addresses and Topology. Under ports and addresses to listen on, I cleared my three IP addresses out, saved, restarted, added them again, saved, restarted... and straight away BIND came up. Checked on intoDNS - all green, no problems.

HOWEVER... as soon as I surf to my site, I get an internal server error on a particular page, and that crashes BIND. Which is obviously what caused the problem in the first place. Where would I find the log to tell me more about internal server errors? (I am using CentOS, as you surmised.)

The named.conf file is as follows:

options {
    directory "/etc";
    pid-file "/var/run/named/named.pid";
    allow-recursion { localnets; 127.0.0.1; };
    listen-on { 83.170.79.9,83.170.78.155,83.170.78.156; };
};

zone "." {
    type hint;
    file "/etc/db.cache";
};

zone "waos-online.co.uk" {
    type master;
    file "/var/named/waos-online.co.uk.hosts";
    allow-transfer { 127.0.0.1; localnets; };
};

zone "registration.waos-online.co.uk" {
    type master;
    file "/var/named/registration.waos-online.co.uk.hosts";
    allow-transfer { 127.0.0.1; localnets; };
};

Fri, 12/10/2010 - 11:52
loyalwhite

The text of the internal server error echoed to the browser is:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Fri, 12/10/2010 - 12:10
andreychek

Hrm... in theory, a website shouldn't be causing BIND to crash. Unless it's somehow changing the BIND config and restarting it :-)

If you restart BIND again, do you see those same errors in /var/log/messages?

To see what error your website is producing, you can look in $HOME/logs/error_log.

-Eric

Fri, 12/10/2010 - 15:53
loyalwhite

OK, it seems that the DNS problem is now solved thanks to rectifying the problem in line 9.

However, the underlying server error turns out to be:

"(110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function"

Having searched around extensively for information on this, it seems that I need to increase the FcgidMaxProcessesPerClass setting to something well above what is a fairly low VirtualMin default.

But I cannot for the life of me figure out how! Eric, in a previous forum posting, you say "Another option as well would be to go into Administration Options -> Edit Resource Limits, and to set "Max Number of Processes" for any Virtual Servers you'd like to have limits for."

I do not see "Edit resource limits" under Admin Options - is that because I am running GPL not Pro? If so, how do I change this setting?

Fri, 12/10/2010 - 18:00
andreychek

Howdy,

It likely is a Pro vs GPL thing. It was only recently that Virtualmin GPL began enabling you to configure FCGID, and I suspect the resource limits didn't make it in there yet.

You would need to manually edit the Apache config, and set those values in the VirtualHost block for your domain.

Alternatively, you could always move away from FCGID to CGI, which may alleviate some of the problems you're seeing. To do that, you could go into Server Configuration -> Website Options, and set the PHP Execution Mode.

-Eric

Fri, 12/10/2010 - 18:28
loyalwhite

Eric,

I tried switching to CGI, and no error is generated but the PHP code still takes a ludicrous amount of time to process - up to a minute just to parse some RSS feeds. Ironically, until last week, I had this site on an older, slower server which was not running Virtualmin, and it parsed out this code in less than a second.

I will edit the Apache config, but can you tell me where? I tried going through Services/Configure Website/Edit Directives and adding FcgidMaxProcessesPerClass 100, but when I tried to restart Apache it returned the error

"Syntax error on line 1053 of /etc/httpd/conf/httpd.conf: Invalid command 'FcgidMaxProcessesPerClass', perhaps misspelled or defined by a module not included in the server configuration'

I just need to know exactly where to put that line, assuming that is the correct line.

Fri, 12/10/2010 - 19:24
helpmin

Did you check the settings in /home/yourdomain/fcgi-bin/php5.fcgi?

You could probably also check whether you have a swtune.conf file (if you are on a VPS)?

Fri, 12/10/2010 - 19:33
loyalwhite

Hey Snapmin. I did, but the problem I had there was that I could not mod that file, even when logged in as root. I'm sure I could find a way around that. I was using Transmit, which may have been part of the problem there. Is that where I should put the FcgidMaxProcessesPerClass directive?

It's not a VPS, it's a dedicated server.

Fri, 12/10/2010 - 19:57
loyalwhite

I just tried editing /home/yourdomain/fcgi-bin/php5.fcgi with nano as root, and couldn't save - I get a permission denied error. Anyone know why?

Fri, 12/10/2010 - 20:31
helpmin

In that case please check the attributes with lsattr and adjust accordingly.

Sat, 12/11/2010 - 05:06
Locutus

@loyal: The php5.fcgi file by default has the "immutable" bit set, which prevents even root from changing it. You can, as snapmin pointed out, list those attributes with lsattr and change them with chattr.

Also, to check out your PHP issue, it might be an idea to try the Apache mod_php variant in addition to CGI and FCGI, to see if the problems arise from the calling method, or from the PHP code itself.

Tue, 12/14/2010 - 09:32
loyalwhite

OK - I'm just doing a catchup here for anyone finding this in the future who might be confused by the meandering nature of this thread.

The problem here was not DNS - that was a separate issue, and I misinterpreted it as a symptom of the wider problem. The problem with BIND was solved by changing the comma-separated list of addresses on line 9 of the /etc/named.conf file to "any" and restarting BIND.
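
For anyone who wants to keep specific addresses rather than "any": elements of a BIND address list are separated by semicolons, and the "undefined ACL" log message earlier in the thread suggests named read the comma-joined string as a single ACL name. The sketch below flags that pattern in a config file's text. The function names are hypothetical and the regular expression only handles simple, un-nested listen-on clauses, so BIND's own named-checkconf remains the better check for a real config.

```python
import re

# Captures the body of each simple "listen-on { ... }" clause (optional "port N").
LISTEN_ON = re.compile(r"listen-on\s*(?:port\s+\d+\s*)?\{([^}]*)\}")


def comma_joined_elements(conf_text):
    """Return listen-on elements that use commas where semicolons belong."""
    bad = []
    for body in LISTEN_ON.findall(conf_text):
        for element in body.split(";"):
            element = element.strip()
            if "," in element:
                bad.append(element)
    return bad


def with_semicolons(element):
    """Rewrite 'a,b,c' as 'a; b; c;', the separator named.conf expects."""
    return " ".join(part.strip() + ";" for part in element.split(",") if part.strip())


# The listen-on clause as posted earlier in the thread.
conf = "options { listen-on { 83.170.79.9,83.170.78.155,83.170.78.156; }; };"
for element in comma_joined_elements(conf):
    print("listen-on { " + with_semicolons(element) + " };")
```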

Onto the mod_fcgid timing out issue. Using snapmin and locutus's tips about the immutable bit, I was able to add the line to /home/yourdomain/fcgi-bin/php5.fcgi to allow many more concurrent fcgi processes, and this did help. The page stopped timing out, but it still took about four seconds to execute and I decided that I was not happy with this from a user experience perspective.

The code in question looked at about four RSS feeds, parsed them, sorted them, checked whether the associated image link was good, grabbed the image if so, and returned a neatly formatted array containing the content we needed. We've now rewritten the code to cache this result every two hours, so every two hours one user will get the four-second delay, while everyone else will get the page instantly.
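
The site's actual caching code is not shown in this thread. Purely as an illustration of the idea, here is a minimal time-based cache in Python; the class name `TimedCache` and the `loader` callback are assumptions, and the value lives in process memory, so a site running under CGI or FCGI would more likely store it in a file or a shared cache instead.

```python
import time


class TimedCache:
    """Cache one computed value and rebuild it once it is older than max_age seconds."""

    def __init__(self, loader, max_age=2 * 60 * 60, clock=time.monotonic):
        self._loader = loader      # the slow function, e.g. fetch and parse the feeds
        self._max_age = max_age    # two hours by default, as described above
        self._clock = clock        # injectable so the expiry logic can be tested
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            # Only the caller that hits an expired cache pays for the slow rebuild.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

A page handler would create one `TimedCache` around the feed-parsing function and call `get()` on every request.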

The lesson I have learned here is that in the face of a serious server error, it's better to address the source code of your site rather than try to patch up your server to deliver code which is fundamentally too demanding to deliver.

Thanks for everyone's help and I hope anyone else encountering this can use this thread to solve their problems in the future.