collectinfo.pl using 100% memory after update

#1 Tue, 07/12/2016 - 16:17
maxslug

Hi Friendly Virtualminders,

After years of near-perfect uptime with Virtualmin I have a little hiccup. I just did a couple of things at once: deleted a few large virtual domains, and updated to the latest Virtualmin via apt-get (Ubuntu 14.x LTS).

Unfortunately, when the webmin service starts it kicks off a collectinfo.pl job that uses 100% CPU and slowly (over about 1-2 min) eats 100% of memory. Then the kernel OOM killer starts killing things like apache and mysql. A quick workaround is to disable webmin (service webmin stop).

Is there a verbose / trace / debug flag I can give the script to see what it's hanging on? I didn't see anything obvious in the script itself.

Thanks, -m

Sat, 07/16/2016 - 01:42
maxslug

Any ideas? Unfortunately I have had to disable webmin in the meantime and would like to restore it.

Thanks, -m

Mon, 07/18/2016 - 03:25
maxslug

I have one more clue -- the Linux OOM killer seems to be killing 'lookup-domain.pl'. Snippet from the kernel log:

[406668.960658] /usr/share/webm invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[407449.740591] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[407564.540507] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
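For anyone wanting to tally these quickly, here is a rough sketch (not part of Webmin or Virtualmin) that counts which process names show up as the oom-killer invoker in kernel log text. The function name `count_oom_invokers` and the regular expression are my own, and the pattern assumes the line shape quoted above. Note that "invoked oom-killer" names the process whose allocation triggered the killer; the victim is normally reported on separate "Killed process" lines, which this sketch does not parse.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log lines quoted in this post:
# [407449.740591] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, ...
OOM_LINE = re.compile(r"\[\s*\d+\.\d+\]\s+(?P<proc>.+?) invoked oom-killer:")


def count_oom_invokers(log_text):
    """Count how often each process name appears as the oom-killer invoker."""
    return Counter(m.group("proc") for m in OOM_LINE.finditer(log_text))


sample = """\
[406668.960658] /usr/share/webm invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[407449.740591] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[407564.540507] lookup-domain.p invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
"""

if __name__ == "__main__":
    # Print the most frequent invokers first.
    for proc, n in count_oom_invokers(sample).most_common():
        print(n, proc)
```

In practice the input would be the saved output of dmesg or a kernel log file rather than the inline sample.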

Ideas?

Sun, 07/31/2016 - 14:42
maxslug

Getting desperate here -- my Let's Encrypt certificates are now not renewing because they are installed through Virtualmin.

Thanks, -m

Sun, 07/31/2016 - 15:22
maxslug

OK, I found the problem, but could use some help root-causing it for others. First I saw this post from @andreychek: https://www.virtualmin.com/node/40877, which implied a possible problem with a file in /etc/webmin/virtual-server/domains/*. I didn't see any obviously bad ones, but by process of elimination I found the file that was causing problems. It's a subdomain that was ported over from cPanel to Virtualmin many years ago. It was disabled, so that is probably what was causing problems.

I made a little script to compare a good file and the bad one; here are the differences in the keys present in each domains/* file.

Bad missing backup_encpass
Bad missing bw_notify
Bad missing bw_usage_mail
Bad missing bw_usage_only_mail
Bad missing bw_usage_only_web
Bad missing bw_usage_web
Bad missing cgi_bin_correct
Bad missing db
Bad missing ftp
Bad missing limit_virtualmin-awstats
Bad missing limit_virtualmin-dav
Bad missing logrotate
Bad missing mysql
Bad missing mysql_enc_pass
Bad missing mysql_user
Bad missing no_mysql_db
Bad missing postgres
Bad missing spam
Bad missing stats_pass
Bad missing virtalready
Bad missing virtualmin-awstats
Bad missing virtualmin-dav
Bad missing virtualmin-mailman
Bad missing virus
Bad missing webalizer
Bad missing webmin
Good missing backup_parent_dom
Good missing backup_subdom_dom
Good missing disabled
Good missing disabled_oldpass
Good missing disabled_reason
Good missing disabled_time
Good missing disabled_why
Good missing dns_ip
Good missing dns_submode
Good missing reseller
Good missing ssl_cert
Good missing ssl_key
Good missing subdom
Good missing subprefix
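
In case it helps others repeat the comparison, here is a minimal sketch of that kind of key diff. It is a reconstruction, not the exact script used above, and it assumes each domains/* file is plain text with one key=value pair per line. The helper names `read_keys` and `diff_keys` are made up for illustration.

```python
import sys


def read_keys(path):
    """Return the set of keys in a key=value file, skipping blanks and comments."""
    keys = set()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            # Assumption: anything without '=' is not a setting line.
            if not line or line.startswith("#") or "=" not in line:
                continue
            keys.add(line.split("=", 1)[0])
    return keys


def diff_keys(good_path, bad_path):
    """List keys present in only one of the two files, in the format shown above."""
    good, bad = read_keys(good_path), read_keys(bad_path)
    lines = [f"Bad missing {k}" for k in sorted(good - bad)]
    lines += [f"Good missing {k}" for k in sorted(bad - good)]
    return lines


if __name__ == "__main__":
    # Usage: python keydiff.py GOOD_FILE BAD_FILE
    if len(sys.argv) == 3:
        print("\n".join(diff_keys(sys.argv[1], sys.argv[2])))
```

It only compares which keys exist, not their values, so a key that is present but holds a bad value would not show up.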

Any ideas? Thanks, -m

Sun, 07/31/2016 - 20:28
andreychek

Howdy,

Do you just have that one bad file? That is, if you temporarily remove it, do you no longer have problems with collectinfo running?

Either way, you can always go into System Settings -> Virtualmin Config -> Status Collection, where you can increase the time between status collection runs or disable it altogether.

-Eric
