We have seen that in some cases (we do not know exactly when) lookup-domain.pl causes a lot of I/O. It probably happens while it is processing email messages.
When this happens we see a few (no more than 5-6) groups of processes like this one:
\_ /usr/bin/procmail-wrapper -o -a s3.trafficplanethosting.com -d user.name
21778 ? S 0:00 | \_ /usr/bin/procmail-wrapper -o -a s3.trafficplanethosting.com -d
21779 ? RN 0:00 | \_ /usr/bin/perl /usr/libexec/webmin/virtual-server/lookup-domain.pl --exitcode 73 user.name
During this time I/O is over 50%. Once these processes have finished their work, I/O falls back to normal and the server load also returns to normal.
I have tried to restart lookup-domain-daemon.pl, but when I execute /etc/init.d/lookup-domain restart it returns:
Failed to bind to localhost port 11000 at /usr/libexec/webmin/virtual-server/lookup-domain-daemon.pl line 49.
I then tried to stop it with /etc/init.d/lookup-domain stop and checked whether it had actually stopped, but it still appears to be running:
/etc/init.d/lookup-domain stop
[root@s3 ~]# ps fax |grep lookup-domain
 9754 pts/1 S+ 0:00 | \_ grep lookup-domain
26657 ? SNs 0:10 /usr/libexec/webmin/virtual-server/lookup-domain-daemon.pl
The daemon still running is probably the cause of the bind error, but why is it not stopping?
In /var/webmin/lookup-domain-daemon.log, the last thousand errors are:
Too many child processes are running already
The daemon keeps filling the log with the same message.
This is already happening on 2 servers.
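For anyone who wants to check how quickly the message is piling up on their own server, here is a rough sketch. The function name count_child_limit_errors is made up for illustration, and it assumes the log is at the path mentioned above and contains the message exactly as quoted.

```shell
#!/bin/sh
# Hypothetical helper: count how many lines of a log file contain the
# "Too many child processes" message reported above.
count_child_limit_errors() {
    # -c prints the number of matching lines (0 if there are none);
    # -F treats the pattern as a fixed string rather than a regex.
    grep -cF 'Too many child processes are running already' "$1"
}

# Example (assumed log path, taken from this post):
#   count_child_limit_errors /var/webmin/lookup-domain-daemon.log
```

Running it twice, some seconds apart, and comparing the two numbers should give a rough idea of how fast the log is growing.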