PHP websites throwing 503 Service Unavailable errors after updates + reboot (seems related to incorrect default for "sub processes" in Website Options)

Hi guys, I have been having this problem for a few weeks and I am not sure how to resolve it. When I run Virtualmin updates and then follow the prompt to restart the server (if there has been a kernel update), all of my PHP-driven websites stop working and throw "503 Service Temporarily Unavailable" errors in web browsers.

This is the log file from one of the virtual servers immediately after the update and reboot:

[Sat Aug 17 08:48:49.775404 2019] [proxy:error] [pid 1270:tid 140478783682304] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:8000 (*) failed
[Sat Aug 17 08:48:49.775840 2019] [proxy_fcgi:error] [pid 1270:tid 140478783682304] [client 66.249.79.202:36608] AH01079: failed to make connection to backend: localhost
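
The "(111)Connection refused" in the first line means nothing was accepting connections on 127.0.0.1:8000 when Apache tried to hand the request to the PHP backend. A minimal Python sketch for checking that from the server itself; the function name is made up for illustration, and the host and port are simply the ones from the log line above:

```python
import socket


def backend_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111 on Linux) and timeouts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8000 is the address Apache reported in AH00957.
    print(backend_is_listening("127.0.0.1", 8000))
```

If this prints False while Apache itself is up, the problem is most likely on the PHP backend side rather than in Apache.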

And here is the Webmin > Apache server error log for this morning (from a few hours before the update to just after it):

[Sat Aug 17 06:25:07.011851 2019] [core:notice] [pid 3222:tid 139886527545408] AH00094: Command line: '/usr/sbin/apache2'
[Sat Aug 17 08:46:15.450615 2019] [mpm_event:notice] [pid 3222:tid 139886527545408] AH00491: caught SIGTERM, shutting down
[Sat Aug 17 08:46:38.004321 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity for Apache/2.9.1 (http://www.modsecurity.org/) configured.
[Sat Aug 17 08:46:38.004336 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: APR compiled version="1.5.2"; loaded version="1.5.2"
[Sat Aug 17 08:46:38.004341 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: PCRE compiled version="8.39 "; loaded version="8.43 2019-02-23"
[Sat Aug 17 08:46:38.004345 2019] [:warn] [pid 639:tid 140479133544512] ModSecurity: Loaded PCRE do not match with compiled!
[Sat Aug 17 08:46:38.004348 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: LUA compiled version="Lua 5.1"
[Sat Aug 17 08:46:38.004350 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: YAJL compiled version="2.1.0"
[Sat Aug 17 08:46:38.004353 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: LIBXML compiled version="2.9.4"
[Sat Aug 17 08:46:38.004395 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: StatusEngine call: "2.9.1,Apache/2.4.25,1.5.2/1.5.2,8.39/8.43 2019-02-23,Lua 5.1,2.9.4,46"
[Sat Aug 17 08:46:39.662828 2019] [:notice] [pid 639:tid 140479133544512] ModSecurity: StatusEngine call successfully sent. For more information visit: http://status.modsecurity.org/
[Sat Aug 17 08:46:39.665076 2019] [suexec:notice] [pid 639:tid 140479133544512] AH01232: suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)
[Sat Aug 17 08:46:40.016534 2019] [mpm_event:notice] [pid 1241:tid 140479133544512] AH00489: Apache/2.4.25 (Debian) OpenSSL/1.0.2s mod_fcgid/2.3.9 configured -- resuming normal operations
[Sat Aug 17 08:46:40.017410 2019] [core:notice] [pid 1241:tid 140479133544512] AH00094: Command line: '/usr/sbin/apache2'
[Sat Aug 17 08:53:51.984576 2019] [mpm_event:notice] [pid 1241:tid 140479133544512] AH00493: SIGUSR1 received.  Doing graceful restart
[Sat Aug 17 08:53:53.006070 2019] [mpm_event:notice] [pid 1241:tid 140479133544512] AH00489: Apache/2.4.25 (Debian) OpenSSL/1.0.2s mod_fcgid/2.3.9 configured -- resuming normal operations
[Sat Aug 17 08:53:53.006096 2019] [core:notice] [pid 1241:tid 140479133544512] AH00094: Command line: '/usr/sbin/apache2'
[Sat Aug 17 08:54:41.332756 2019] [mpm_event:notice] [pid 1241:tid 140479133544512] AH00493: SIGUSR1 received.  Doing graceful restart
[Sat Aug 17 08:54:42.005455 2019] [mpm_event:notice] [pid 1241:tid 140479133544512] AH00489: Apache/2.4.25 (Debian) OpenSSL/1.0.2s mod_fcgid/2.3.9 configured -- resuming normal operations
[Sat Aug 17 08:54:42.005481 2019] [core:notice] [pid 1241:tid 140479133544512] AH00094: Command line: '/usr/sbin/apache2'

Here is the kernel log at the time of the update and restart:

Aug 17 08:46:35 server1 kernel: [    1.605255] [drm] vram aper at 0xFC000000
Aug 17 08:46:35 server1 kernel: [    1.605255] [drm] size 33554432
Aug 17 08:46:35 server1 kernel: [    1.605256] [drm] fb depth is 24
Aug 17 08:46:35 server1 kernel: [    1.605256] [drm]    pitch is 3072
Aug 17 08:46:35 server1 kernel: [    1.605300] fbcon: cirrusdrmfb (fb0) is primary device
Aug 17 08:46:35 server1 kernel: [    1.643026] Console: switching to colour frame buffer device 128x48
Aug 17 08:46:35 server1 kernel: [    1.663463] cirrus 0000:00:02.0: fb0: cirrusdrmfb frame buffer device
Aug 17 08:46:35 server1 kernel: [    1.664697] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input6
Aug 17 08:46:35 server1 kernel: [    1.664881] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
Aug 17 08:46:35 server1 kernel: [    1.681086] [drm] Initialized cirrus 1.0.0 20110418 for 0000:00:02.0 on minor 0
Aug 17 08:46:35 server1 kernel: [    1.692545] EDAC MC: Ver: 3.0.0
Aug 17 08:46:35 server1 kernel: [    1.733723] tsc: Refined TSC clocksource calibration: 2593.897 MHz
Aug 17 08:46:35 server1 kernel: [    1.733734] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2563b6b0c4a, max_idle_ns: 440795254555 ns
Aug 17 08:46:35 server1 kernel: [    1.744545] Process accounting resumed
Aug 17 08:46:36 server1 kernel: [    2.272155] ip6_tables: (C) 2000-2006 Netfilter Core Team
Aug 17 08:46:36 server1 kernel: [    2.305909] Ebtables v2.0 registered
Aug 17 08:46:36 server1 kernel: [    2.320159] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Aug 17 08:46:36 server1 kernel: [    2.625051] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
Aug 17 08:46:36 server1 kernel: [    2.711372] Netfilter messages via NETLINK v0.30.
Aug 17 08:46:36 server1 kernel: [    2.753716] ip_set: protocol 6

From the Virtualmin dashboard, it appears that all of the monitored servers are running fine (i.e. MySQL, Apache, BIND, etc.).

What I did was go into the first virtual server > Website Options and change PHP from FPM to FCGI. It threw an error saying that the number of processes must be between 1 and 20 (it had changed this automatically to 75). I reset that figure back to 20, clicked Save, then went back and changed to FPM again, and all is well again.

The funny thing is, I now cannot even see that "sub processes" item in Website Options anymore. Why does it pop up with a wrong entry, and then completely disappear from Website Options after I correct the entry and hit Save? (I can no longer find it at all.)

This must be some kind of bug.

Status: 
Active

Comments

What I don't understand is:

  1. Why does editing just one virtual server and resetting the sub processes value seem to fix all of the virtual servers running PHP websites?

  2. What does sub processes for FCGI even have to do with FPM? (I am running FPM on all virtual servers.)

  3. Is the incorrect sub processes value simply a coincidence, unrelated to the update problem, and do the websites work again after switching from FPM to FCGI and back only because that restarts FPM? If so, why isn't FPM being restarted as part of the server reboot process?

Actually, I have now found the problem on my server. Whenever I restart the system, Webmin is not restarting any of the php-fpm services. See the link below for an image showing none of them running immediately after a restart:

https://drive.google.com/file/d/1YjlBtK_GzTvbgpeNSf8_7jOsFULvk0dh/view?u...

I have to go into Webmin > System > Bootup and Shutdown and manually start php-fpm. Once I do that, everything is fine again.

Now the question is: why isn't php-fpm restarting with a server reboot?

EDIT...

I also notice (looking at the image in the link I provided) that php-fpm is not set to "automatically start at boot". I have changed this and imagine my issues should now go away. I am not sure how or why this happened in the first place. I would have thought that in Virtualmin, if one decides to change over to php-fpm, this option would automatically be set to "yes"; however, on my system at least, this has not happened.
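
For anyone who wants to check the same thing from the command line instead of the Webmin screen, here is a rough Python sketch. It assumes a systemd-based system, and "php7.0-fpm" is only a guess at the unit name; the `run` parameter is a hypothetical hook that exists only so the function can be tried without systemd.

```python
import subprocess


def enabled_at_boot(unit: str, run=subprocess.run) -> bool:
    """Return True if `systemctl is-enabled <unit>` prints 'enabled'."""
    result = run(
        ["systemctl", "is-enabled", unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip() == "enabled"


if __name__ == "__main__":
    # "php7.0-fpm" is an assumed unit name; list the real ones with
    # `systemctl list-unit-files | grep fpm`.
    unit = "php7.0-fpm"
    if not enabled_at_boot(unit):
        print("%s will not start at boot; try: systemctl enable %s" % (unit, unit))
```

Run as root after a reboot; if the message is printed, the unit is not enabled at boot.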

I think there is a problem with the FPM version detection and the 'port clash for PHP-FPM' test: any update or 'Re-check Configuration' changes the FPM port number of each website, so it no longer matches the website config. I have only one FPM version installed (7.2.3), but Virtualmin detects it as two versions ("The following PHP-FPM versions are available on this system : 7.2 7.2.3"). It then changes the perfectly good and correct port numbers to avoid a "clash" between the two versions, which actually breaks all the websites. My example is below.

I have a fresh vanilla install of Virtualmin 6.07 on Ubuntu 18.04 with just one version of PHP (7.2.19) with PHP-FPM (7.2.3). It works fine until I run 'Re-check Configuration' and after that all PHP-FPM sites are broken.

Initially all is fine, the website config says:

<FilesMatch \.php$>
SetHandler proxy:fcgi://localhost:8000
</FilesMatch>

and /etc/php/7.2/fpm/pool.d/12345678901234.conf says:

listen = 8000

The website works great. Life is good. But every time I run 'Re-Check Configuration' it says the following and all the PHP sites stop working:

The following PHP versions are available : 7.2.19 (/usr/bin/php-cgi7.2)

The following PHP-FPM versions are available on this system : 7.2 7.2.3

Fixing port clash for PHP-FPM version 7.2.3

Fixing port clash for PHP-FPM version 7.2.3

Restarting PHP-FPM server ..
.. done

Now the website config still says:

<FilesMatch \.php$>
SetHandler proxy:fcgi://localhost:8000
</FilesMatch>

But Virtualmin has changed /etc/php/7.2/fpm/pool.d/12345678901234.conf to a different port!

listen = 8001

If I run 'Re-Check Configuration' it changes it to port 8002, and so on. Every time it changes the PHP-FPM port but does not update the website config.

If I manually change 800X back to 8000 and restart php-fpm, then the website works again.

So it seems the FPM version detection and the 'port clash for PHP-FPM' check are buggy?
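
To spot this kind of mismatch quickly, something like the following Python sketch could compare the two files. The function names are made up for illustration, the regular expressions only cover the simple `SetHandler proxy:fcgi://host:port` and `listen = port` forms shown above, and the sample strings stand in for the real files.

```python
import re


def handler_port(vhost_text: str):
    """Port from a 'SetHandler proxy:fcgi://host:PORT' line, or None."""
    m = re.search(r"SetHandler\s+proxy:fcgi://[^:\s]+:(\d+)", vhost_text)
    return int(m.group(1)) if m else None


def pool_port(pool_text: str):
    """Port from a 'listen = PORT' or 'listen = host:PORT' line, or None."""
    m = re.search(r"^\s*listen\s*=\s*(?:[^\s:]+:)?(\d+)\s*$", pool_text, re.MULTILINE)
    return int(m.group(1)) if m else None


# Sample contents copied from the example above (after Re-Check Configuration).
vhost = """<FilesMatch \\.php$>
SetHandler proxy:fcgi://localhost:8000
</FilesMatch>"""
pool = "listen = 8001\n"

if handler_port(vhost) != pool_port(pool):
    print("mismatch: Apache uses", handler_port(vhost),
          "but the pool listens on", pool_port(pool))
```

In practice the two strings would be read from the virtual host file and the pool file under /etc/php/7.2/fpm/pool.d/.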

Oh god, I have a server where all the websites on it are down and this seems to be the problem (Re-check configuration throws up port clash warnings).

How do you get them back up?!

I found that restarting php-fpm resolved it for me. Haven't a clue if it will help you or not.

Thanks Adam,

In the end I got things back up, but it was long-winded and messy and I don't remember everything I had to do. Restarting php-fpm didn't work, as it complained about ports already being in use. But whatever it was I did, the sites finally came back.

Same problem here. Seven days with no reply from support, wow.

If an FPM restart reports a port clash, try SSHing into the system as root and running virtualmin check-config

An FPM crash is not the issue here; the service reports as running:

Active: active (running) since Sun 2020-06-28 00:58:50 EDT; 8h ago

Submitted by Ilia on Sun, 06/28/2020 - 18:24

We are sorry for not replying sooner.

At the moment we're finishing working on a new Virtualmin release that is going to address the issue you're having.

For now, you could use CGI execution mode by setting it up in Website Options.

Thanks for the reply.

Hi Ilia, thanks for the reply and the good news!

Will the new release move away from the problematic port-based pools to named pipes? That would be more efficient, more secure, and avoid the whole port collision issue. I understand file permissions need to be set correctly for pipes, but the file permissions are what provide the security that the port model lacks, so supporting permissioned named pipes offers several benefits over port-based pools.

Note that switching to CGI execution is not possible on servers running HTTP/2. You would have to disable HTTP/2 (assuming you can take the performance hit of the extra connections), reconfigure the Apache threading model, and then switch each website to CGI. (Probably the CGI execution model should be deprecated and removed these days.)

"Note that switching to CGI execution is not possible on servers running HTTP/2" - this. We run everything http2 because that's a meaningful improvement to end users and SEO. Stepping back to HTTP1 is not something we'd ever want to do.

Here's why I believe it's vital core functionality (and, frankly, a far more sensible default) to have FPM working instead of FCGI:

Apple's Jiten Mehta reports 79% of requests sent by Safari use HTTP/2, and it's 1.8x faster. Great numbers on IPv6 and TLS 1.3 too.

https://developer.apple.com/videos/play/wwdc2020/10111/

HTTP/2 requires FPM. Right now, by default, Virtualmin PHP sites are going to be slower than 79% of other websites.