Really high load, how to properly troubleshoot?

#1 Mon, 09/13/2010 - 16:05
lex

Hi,

I've got a server with Webmin and Virtualmin set up. It all works pretty well. Or "worked", I should say: yesterday the load started to climb and kept climbing, up to the point where all five sites on the server stopped responding.

I had been working on a particular site yesterday, upgrading a Joomla component to a recent version. Some things went wrong and I had to do it manually. When I saw the load getting higher and higher afterwards, I thought it might have to do with that. I disabled access to the component by removing the menu items that point to it, but to no avail.

It does have to do with that particular site, though. If I put the site 'offline' in Joomla, that doesn't help, but if I disable that 'server' in Virtualmin, my load goes back down to normal levels.

So I checked with 'top', but didn't see anything really weird, apart from a lot of connections to Apache. But if something doesn't respond, then that seems normal, no? Restarting mysql and apache2 does help for a minute or two, then it all goes back up again.

Normally, the load on the server is around 1 or 2. When it goes sky high, it really goes sky high: around 50.

I checked netstat, and it's quite a list, but it doesn't look like an attack from one point at least.
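For anyone reading along, a long netstat listing is easier to judge when the connections are tallied per remote address. Below is a minimal Python sketch of that idea, assuming output in the column layout of `netstat -tn`. The function name and the sample addresses are made up for illustration and are not taken from the attached output.

```python
from collections import Counter

def count_remote_hosts(netstat_output: str) -> Counter:
    """Count TCP connections per remote address in `netstat -tn` style output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote = fields[4]
        host = remote.rsplit(":", 1)[0]  # drop the trailing port
        counts[host] += 1
    return counts

# Hypothetical sample, using documentation-only IP ranges.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.7:51240      TIME_WAIT
tcp        0      0 192.0.2.10:80           203.0.113.9:40022       ESTABLISHED
"""

for host, n in count_remote_hosts(sample).most_common():
    print(host, n)
```

If one address dominates the tally, that points to a single source. An even spread, as described here, does not.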

So, in other words, it has something to do with site "A", even though other sites on the same server use MySQL and PHP as well. There's even another Joomla site on the server.

How do I know what exactly is bothering it? I didn't see anything really weird in the apache error log files either.

I'll attach the netstat output.

What strikes me as odd is that when the site is set to offline in Joomla, it still causes the high load. Only disabling it in Virtualmin really gets the load back to normal levels. So to me it seems that something is trying to connect to site A so desperately that even when it's closed down by Joomla, it still causes the server to go berserk.

The stats (webalizer) show no extra traffic since yesterday at all.

Anyway, any hints appreciated as a 'disabled' site isn't really good for business, is it?

Thanks!

Mon, 09/13/2010 - 17:23
andreychek

Howdy,

During these periods of high load, you aren't by chance seeing a lot of email going out, are you? You can determine that by running "mailq" from the command line. It's possible Postfix wouldn't show up at the top of your "top" list, even if there's a lot of mail causing a lot of disk I/O.

But I'm curious if the cause of your high loads is related to outgoing email, or simply due to traffic going to your website.

-Eric

Mon, 09/13/2010 - 17:29
Krienas

If you haven't done so yet, I would suggest setting Webmin's logging to the maximum and checking the logs. You can do this at Webmin -> Webmin -> Webmin Configuration -> Logging.

Log files can be found at /var/webmin/miniserv.log and /var/webmin/webmin.log

Mon, 09/13/2010 - 17:45
ronald

I suggest removing the component you were working on completely, not only the links to it.
If that doesn't help, put Joomla in debug mode.
If you see nothing odd, make a backup of your site and cut off access to the database by removing or commenting out its entries in your configuration file.
If that doesn't help, remove Joomla altogether.

If that doesn't help either, you know it isn't the Joomla site causing the trouble.

Putting the site in offline mode doesn't shut down the site. It only means the front page looks different to the visitor. When you are logged in, you can still see the site in offline mode...

Mon, 09/13/2010 - 18:37
lex

Thanks for all the answers people, will start working with all your suggestions now and report back.

Mon, 09/13/2010 - 19:02
lex

I don't know how to 'read' the output of 'mailq', but this is what I get:

-- 4464 Kbytes in 892 Requests.

I've attached it too.

Whenever the load climbs again (I've just re-enabled site "A" and put it in online mode), I see a lot more "apache2" entries in the command column than I normally do, and restarting Apache and MySQL really does help for a bit.
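A rough way to put a number on "a lot more apache2" is to count processes per command name. The Python sketch below assumes input with one command name per line, as printed by `ps -eo comm=`. The function name and the sample list are hypothetical.

```python
from collections import Counter

def count_processes(ps_output: str) -> Counter:
    """Count processes per command name, one name per input line."""
    return Counter(
        line.strip() for line in ps_output.splitlines() if line.strip()
    )

# Hypothetical sample standing in for real `ps -eo comm=` output.
sample = """\
init
apache2
apache2
mysqld
apache2
sshd
"""

for name, n in count_processes(sample).most_common(3):
    print(name, n)
```

Running this once while the load is normal and once while it is climbing gives two counts that can be compared directly.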

Mon, 09/13/2010 - 19:07
andreychek

Okay, yeah, that means you have 892 messages sitting in your mail queue, waiting to go out.
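As an aside on reading that line: the final line of the mailq output gives the total queue size in Kbytes, followed by the number of queued messages ("Requests"). A small Python sketch for pulling both numbers out, assuming the Postfix summary format quoted above (the function name is made up):

```python
import re

# Matches the closing summary line of Postfix's mailq output.
SUMMARY = re.compile(r"^-- (\d+) Kbytes in (\d+) Requests?\.$")

def parse_mailq_summary(line: str):
    """Return (kbytes, message_count), or None if the line is not a summary."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_mailq_summary("-- 4464 Kbytes in 892 Requests."))
```

An empty queue prints "Mail queue is empty" instead of a summary line, and the function returns None for that.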

Unless your site has a mailing list, I have a suspicion you're seeing a problem with spam :-)

That could happen if you're using an older Joomla version, or if there's a component installed that has some sort of vulnerability.

A security vulnerability in your web app that has spam going out could certainly explain the high CPU usage you're seeing.

-Eric

Mon, 09/13/2010 - 19:12
lex

Cutting off the Joomla database brings the load down to normal levels.

However, that effectively deletes the whole site too, as on all pages you now only see the db connection error.

If it is that particular MySQL db (I did repair and optimize it), then what should I do next?

Mon, 09/13/2010 - 19:16
lex

Hi Eric, is there any way I can see where these mails are sent from?

Hmmm. Joomla is up to date.

Mon, 09/13/2010 - 19:19
lex

The mail addresses I see with mailq are nothing that could be on a mailing list belonging to any of the sites on this server, so it does seem like spam.

If I knew what to do now, that'd be great ;)

Thanks people, I really appreciate it a lot that you provide all this help, really!

Mon, 09/13/2010 - 19:25
andreychek

If you go into Webmin -> Servers -> Postfix Mail Server -> Mail Queue, you can view any of the emails in the queue.

If you select "View all headers" when looking at a particular message, the first header should give you some additional info about the email, including what userid generated the email.

Beyond the userid, it's not always possible to determine where exactly the email came from... that may involve some digging through the various files and applications in your public_html dir.
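For mail submitted locally (for example by a PHP script), Postfix normally records the submitting userid in a Received header of the form "(Postfix, from userid N)". A minimal Python sketch for extracting it from a message's headers follows. The function name and the sample header are invented for illustration, with example.com as a placeholder host.

```python
import re

# Postfix adds this marker to the Received header of locally submitted mail.
USERID = re.compile(r"\(Postfix, from userid (\d+)\)")

def submitting_userid(headers: str):
    """Return the numeric userid that queued the message, or None if absent."""
    match = USERID.search(headers)
    return int(match.group(1)) if match else None

# Hypothetical header block; not taken from the queue discussed here.
sample = (
    "Received: by mail.example.com (Postfix, from userid 33)\n"
    "\tid 4F2A11C0B3; Mon, 13 Sep 2010 18:55:02 +0200 (CEST)\n"
    "To: someone@example.com\n"
    "Subject: test\n"
)

print(submitting_userid(sample))
```

On the server itself, `pwd.getpwuid(uid).pw_name` from Python's standard library turns the number into a username, which narrows the search to one account's files.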

-Eric

Mon, 09/13/2010 - 19:29
lex

I've stopped Postfix to see what it'd do. I can still drive the load up to the point where site A stops responding to page requests.

But I'm afraid I have to continue tomorrow as I literally can't keep my eyes open much longer. (Slept only 3 hours last night.)

Thanks people, I'm sure we'll figure it out.

Tue, 09/14/2010 - 06:17
lex

I checked the mail in the queue, and it was all garbage, so I deleted the lot.

I then enabled the specific website again and saw the load increasing and increasing. However, the mailq is still empty. So I think it might not be the spam thing. Also, the messages in the queue weren't all from yesterday; the dates ranged further back as well.

Yep, while I typed this load has been going up to 10, mailq still empty. Load is increasing still, so I'll have to disable the site again.

Tue, 09/14/2010 - 15:01
lex

I've checked the mails in the mailq (now 97, after 8 hours) and they're almost all of this kind:

#

This is the mail system at host ....

I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can delete your own text from the attached returned message.

               The mail system

mfl@deining.org: user unknown. Command output: Invalid user specified.

#

So to me that seems to be Postfix trying to inform a spammer that an address doesn't exist at one of the domains on the server. Nothing to worry about, am I right?

Wed, 09/15/2010 - 02:43
lex

I think it's sorted.

That component that I upgraded (JoomGallery) didn't work very well with Joomla 1.5+ before, so I had almost 100 'external links' in my menu to the different galleries (categories). These didn't work with the new version (I can now create links to the different galleries the Joomla way), and I think the old links were now creating loops. I'm not sure about that, but it seems logical (mod_rewrite).
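The loop theory was never confirmed in this thread, but it can be tested by following redirects one hop at a time and checking whether a URL repeats. Below is a minimal Python sketch under that assumption. The function name, the `next_hop` callback and the sample paths are all hypothetical.

```python
def find_redirect_loop(start, next_hop, max_hops=20):
    """Follow redirects from `start`, one hop at a time.

    `next_hop(url)` must return the redirect target, or None when the URL
    does not redirect. Returns the looping path if a URL repeats, and None
    if the chain ends or `max_hops` is reached without a repeat.
    """
    seen = []
    url = start
    while url is not None and len(seen) < max_hops:
        if url in seen:
            return seen[seen.index(url):] + [url]
        seen.append(url)
        url = next_hop(url)
    return None

# Hypothetical redirect table standing in for real HTTP responses.
redirects = {"/old-gallery": "/gallery", "/gallery": "/old-gallery"}

print(find_redirect_loop("/old-gallery", redirects.get))
```

Against a live site, `next_hop` would send a request with automatic redirect following turned off and return the Location header of any 3xx response.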

Anyway, I managed to change all those menu links last night and haven't had the problem since. And before, I had it within a minute more or less.

Thanks people, for all the suggestions and your time. It's good to know there's a place like this with people like you.
