Virtualmin going full alert mode when crucial services aren't up for more than 30 seconds

#1 Sat, 03/04/2017 - 02:44
OliverF

Hello,

I'm posting mainly because of an issue I had (I documented it here: https://virtualmin.com/comment/772386#comment-772386 ): while my virtual servers used the PHP-FPM execution mode, a bug in the current PHP-FPM implementation on my server left Apache2 unable to restart after Virtualmin updated Debian packages.

That morning, I let Virtualmin install a batch of updates, which completed successfully, and I moved on to other chores. I didn't expect it to affect web access, and it was only 3 hours later that I noticed all the websites hosted on my server were down because Apache2 would no longer start.
Sure enough, the Status section of Virtualmin showed Apache as down.
After some googling, I went back into Virtualmin, switched all virtual domains from PHP-FPM back to fcgid, and could finally start Apache2.
Case closed.
Still, I was slightly annoyed. I somehow expected Virtualmin to go bonkers when Apache stopped working; I felt it had a "moral obligation" of sorts to tell me more insistently. I plead guilty: my job isn't to monitor servers 24/7, and I don't have a paid third-party notification service calling my phone.

Here's the idea: I believe Virtualmin should go "OH MY GOD ALERT ALERT ALERT" when crucial services are down for more than a few seconds, and
- (0) automatically try, by itself, to restart the services several times, and in case of failure, proceed to:
- (1) show very clearly in the dashboard that an issue must be addressed immediately, with a notification visible at all times
- (2) send an email alert to the root admin's email address
- (3) make this "OMG ALERT" mode the default, as an opt-out feature rather than an opt-in feature that we might fail to notice once it's implemented

Taking a conservative approach, the crucial services to watch would be, for instance, BIND, Apache/Nginx, and MySQL.
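The escalation proposed above, steps (0) to (2), could be sketched as a single supervision pass over one service. This is a minimal illustration under stated assumptions, not how Virtualmin works: `is_up`, `restart` and `alert` are hypothetical hooks supplied by the caller, and the defaults (a 30-second grace period, 3 restart attempts) are arbitrary values chosen for the example.

```python
import time
from typing import Callable


def supervise(
    service: str,
    is_up: Callable[[str], bool],        # hypothetical health check hook
    restart: Callable[[str], None],      # hypothetical restart hook
    alert: Callable[[str, str], None],   # hypothetical alert hook (banner + email)
    max_restarts: int = 3,               # assumption: bounded number of attempts
    grace_seconds: float = 30.0,         # assumption: ignore blips shorter than this
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one supervision pass for `service`.

    Returns "up" (nothing to do), "recovered" (a restart fixed it)
    or "alerted" (restarts failed and the admin was notified).
    """
    if is_up(service):
        return "up"
    # Only act if the service is still down after the grace period.
    sleep(grace_seconds)
    if is_up(service):
        return "up"
    # Step (0): try restarting a bounded number of times.
    for _ in range(max_restarts):
        restart(service)
        if is_up(service):
            return "recovered"
    # Steps (1) and (2): give up and escalate to the admin.
    alert(service, f"{service} still down after {max_restarts} restart attempts")
    return "alerted"
```

In practice `is_up` might wrap something like `systemctl is-active --quiet apache2` through `subprocess`, but that is an assumption about the host, not an interface Virtualmin exposes. Keeping the hooks injectable also makes the policy easy to test without touching real services.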

Bullet list entry (1): if Virtualmin restarts one of the crucial services for whatever reason (package update, server config changes) and the service fails to restart, the Virtualmin pages should show an alert that has to be dismissed by clicking a little cross, so we can be sure the admin has seen it.

Bullet list entry (2): use Virtualmin's own engine to send an email alert to the root account's email address; hopefully the email will arrive faster than a third-party notification service would.
It's too bad that, in 2017, sending SMS notifications still costs money; that would have been an AWESOME feature too.

Well, that was it for my suggestion.
Have a good day everyone!

Tue, 03/07/2017 - 14:12
Joe

Hey Oliver,

I don't disagree, in principle. There are some logistical issues with some of it, but I think some of this is do-able.

As for text messages, it's possible to configure Webmin's System and Server Status module to send SMS via many mobile operators' email-to-SMS gateways. At least, it used to be possible; I'm not sure how it works these days, as I haven't used it in a while. You should probably take a poke around in that module anyway; I think you'll be pleasantly surprised at what is already built into Webmin. Virtualmin only presents a small piece of it in the dashboard. There are docs for that module here: http://doxfer.webmin.com/Webmin/System_and_Server_Status
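An email-to-SMS gateway is simply an email address of the form `<phone digits>@<carrier gateway domain>`. As a rough illustration only (this is not how the Webmin Status module implements it), an alert addressed to such a gateway could be built like this. The gateway domain is carrier-specific, so `sms.example.net` below is a placeholder, and the 160-character cap is an assumption based on the classic single-SMS length.

```python
from email.message import EmailMessage

SMS_MAX_CHARS = 160  # assumption: classic single-SMS length; real gateways may differ


def sms_gateway_email(phone: str, gateway_domain: str, text: str, sender: str) -> EmailMessage:
    """Build an email for a carrier's email-to-SMS gateway (<digits>@<gateway_domain>)."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{digits}@{gateway_domain}"
    # Keep the body short: anything past one SMS may be cut off by the gateway.
    msg.set_content(text[:SMS_MAX_CHARS])
    return msg
```

The resulting message can then be handed to whatever mail transport the server already uses.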

We're considering adding hosted monitoring services to our offerings, which could include SMS (for people who pay for the service or have a Pro subscription, anyway).

I'll ask Ilia about the logistics of making status monitors more obvious in the UI.

Will also chat with Jamie about making Virtualmin more insistent about letting you know when a service fails to come back up after changes.

"- (0) automatically try, by itself, to restart the services several times, and in case of failure, proceed to:"

I have concerns about this one. If it fails to come up, there's a reason for it. One recovery restart attempt is reasonable; beyond that, it probably just needs to notify and hope that the administrator responds quickly. I mean, what if the monitor itself is what's failing for some reason? Then we're just cycling the damned thing off and on for no reason, hurting service, etc.
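The "one recovery attempt, then notify" rule could be enforced with a small guard that remembers when each service was last restarted automatically. A minimal sketch, assuming a one-hour window (an arbitrary value for illustration) and a hypothetical caller that performs the actual restart or notification based on the returned decision:

```python
import time
from typing import Callable, Dict


class RestartGuard:
    """Allow one automatic recovery restart per service within a time window.

    After that, the caller should only notify the admin instead of
    cycling the service again.
    """

    def __init__(self, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        self._last_restart: Dict[str, float] = {}

    def decide(self, service: str) -> str:
        """Return "restart" if a recovery attempt is allowed now, else "notify"."""
        now = self.clock()
        last = self._last_restart.get(service)
        if last is not None and now - last < self.window_seconds:
            return "notify"  # already tried recently; don't flap the service
        self._last_restart[service] = now
        return "restart"
```

Because a "notify" decision does not reset the timer, a service that keeps failing gets at most one automatic restart per window, which avoids the off-and-on cycling described above.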

Anyway, yeah, this is reasonable, and do-able. It's already (mostly) possible with the Status module, but I can see a pretty reasonable case for most of this being enabled by default.

--

Check out the forum guidelines!

Tue, 03/07/2017 - 14:14
Joe

Also, PHP-FPM support is labeled beta for a reason! You should just assume it is going to break something, until we stop calling it beta and make it the default execution mode. ;-)

Tue, 03/28/2017 - 22:33
volk

What you are looking for is a watchdog. Here: https://linux.die.net/man/8/watchdog

Or you can install something like Monit on your server (it's very small and light).

Now, should this be built into Virtualmin? I think it may be a better fit for Webmin, in the section where you can already manage and kill processes; it could monitor them as well:

http://doxfer.webmin.com/Webmin/Running_Processes

But keep in mind that any watchdog will somewhat increase the resources the software uses, especially when you are monitoring many processes. Is this the job of a control panel? Some may say yes, others no. Personally, I think one of Virtualmin's strengths is that it's light on a server; if they try to bundle too much, it could become bloated over time.

Most companies already have monitoring systems; you can use Nagios (free). I use PRTG (commercial) for some things and Icinga2 (open source) for others, and I also have a paid hosted Pingdom account. Most providers I know monitor services from outside the server, and there are good reasons to do this. Scripts and daemons that monitor a service from inside the server tend to produce false positives in critical situations, such as when the server is unavailable, and they won't work when it's slow, hanging, or having issues (exactly when you need to be alerted). If your server or services are down (or the server freezes or the kernel panics), it can't send you a notification in the first place. Therefore, it's a bad idea to monitor critical things from inside the server with a script. There are also many false positives inside your own network, such as DNS issues, that require monitoring from outside.
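The outside-in approach can be sketched as a probe that runs on a separate machine plus a counter that only raises an alert after several consecutive failures, so that a single dropped request does not wake anyone up. A minimal sketch, not a replacement for Nagios, Icinga2 or Pingdom; the threshold of 3 and the URL in the usage note are assumptions for illustration.

```python
import http.client
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """One probe: True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        return False


class ConsecutiveFailureDetector:
    """Signal an alert only once a probe has failed `threshold` times in a row."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold
        self.failures = 0

    def check(self) -> bool:
        """Run one probe; return True exactly when the failure streak reaches the threshold."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold
```

Usage, run periodically from another host: `detector = ConsecutiveFailureDetector(lambda: http_ok("https://www.example.com/"), threshold=3)`, then alert whenever `detector.check()` returns True. Returning True only when the streak first reaches the threshold means one alert per outage rather than one per probe.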

Now, some companies offer monitoring to their customers, and some will never use this feature for that reason alone (most already monitor their servers in some way). So building something that uses extra CPU time now and then, even if it's 1 second every 5 minutes, for the very small percentage of users who will activate the feature may or may not make sense. Even people running a single server usually monitor at least one website (their main one), and that alone would alert them if Apache crashed. cPanel has this, for example: it sends you an email if a service is down and, if I remember correctly, tries to restart it 3 times; after that it just sends emails. (Yes, it does have false positives when it can't see the service.) While it does a good job restarting things that crash on their own, it's not something I use a lot.

I’m not saying this is a bad idea. A watchdog for some processes that can email the admin may be helpful, but it should be very light, very basic, and as unintrusive as possible (you don’t want Gmail marking your server as spam if alerts are sent in a loop, one after another). I don’t agree with implementing this as a hosted service in Virtualmin. Providers that offer managed services or monitoring themselves could see that as intruding on their business, similar to the backlash cPanel received when it tried to bundle hosted services that some companies were already offering. Not only would that involve a cost, it would also be subject to the SLA and level of service they can provide, and to be honest, there are already great hosted services that monitor from multiple locations all over the world (very hard to compete with, as some are even free for one or two checks).
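To keep a watchdog from mailing in a loop, outgoing alerts can be capped with a sliding-window limit. A minimal sketch, assuming a cap of 5 alerts per hour (an arbitrary choice) and a hypothetical caller that skips sending, or logs locally, whenever `allow()` returns False:

```python
import time
from collections import deque
from typing import Callable, Deque


class AlertRateLimiter:
    """Permit at most `max_alerts` outgoing alerts in any `window_seconds` period."""

    def __init__(self, max_alerts: int = 5, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_alerts = max_alerts
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of alerts already sent

    def allow(self) -> bool:
        """Return True (and record it) if another alert may be sent now."""
        now = self.clock()
        # Forget alerts that have fallen out of the sliding window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_alerts:
            self._sent.append(now)
            return True
        return False
```

A sliding window is used here instead of a fixed hourly counter so that a burst straddling the top of the hour cannot double the allowed volume.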

It’s actually unusual for someone not to notice their site is down for 3 hours, because most people have at least a basic ping or some other check on their sites. I don’t know anyone who doesn’t, at least for commercial sites.

There are also many scripts you can install to accomplish this today. But maybe a daemon built into Webmin or Virtualmin that can watch all or selected processes, try to restart them, and alert you by email would be useful. As for SMS, there are so many services offering it that it would be better to have just 2 options: send an email or make an HTTP push request. That way you can use the daemon to trigger whatever you want, including calling an external API to send an SMS, alerting another monitoring system, or anything else you like.
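The two suggested notification channels could be sketched as follows: one builder for an email, one for an HTTP POST carrying a small JSON payload, and a dispatcher that picks between them. This is a minimal illustration under assumptions: the local SMTP server on `localhost`, the JSON field names, and the webhook URL are all hypothetical, not part of Webmin or Virtualmin.

```python
import json
import smtplib
import urllib.request
from email.message import EmailMessage


def build_email(service: str, detail: str, to_addr: str, from_addr: str) -> EmailMessage:
    """Channel 1: a plain alert email."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {service} is down"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(detail)
    return msg


def build_push(service: str, detail: str, webhook_url: str) -> urllib.request.Request:
    """Channel 2: an HTTP POST with a JSON body (field names are hypothetical)."""
    body = json.dumps({"service": service, "status": "down", "detail": detail}).encode("utf-8")
    return urllib.request.Request(webhook_url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def notify(channel: str, service: str, detail: str, target: str,
           sender: str = "root@localhost") -> None:
    """Send one alert via "email" (target = address) or "http" (target = webhook URL)."""
    if channel == "email":
        # Assumption: a local MTA is listening on localhost:25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_email(service, detail, target, sender))
    elif channel == "http":
        with urllib.request.urlopen(build_push(service, detail, target), timeout=10):
            pass
    else:
        raise ValueError(f"unknown channel: {channel!r}")
```

The HTTP option is what makes the daemon open-ended: the receiving endpoint can forward to an SMS API, another monitoring system, or anything else.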

A local watchdog would be cool, but only if it’s very light, runs locally, and can push notifications to an external system (or email) where you can then confirm the incident or not. That way it would act as an internal daemon inside the server.