My production server periodically has load average spikes of up to 100-200. These last for about 3-4 minutes, and then the system either goes completely south and requires a reboot, or it settles back down to its norm of about 1.
I've noticed that these events always start about 20-25 minutes past the hour, and that collectinfo.pl is always in the "run" state in a ps listing taken during this time.
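In case it helps anyone gather the same evidence, here is a minimal sketch of how such a snapshot could be logged during a spike. It assumes a Linux box with the procps version of ps; the function names busy_processes and snapshot are made up for illustration and are not part of collectinfo.pl or any existing tool. It records the 1-minute load average and lists processes in state R (running) or D (uninterruptible sleep), since those are the states Linux counts toward load average.

```python
import os
import subprocess
import time


def busy_processes(ps_output):
    """Return (pid, stat, command) for processes in state R or D.

    Expects the text produced by `ps -eo pid,stat,args`, header included.
    """
    busy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or malformed line
        pid, stat, command = fields
        if stat[0] in ("R", "D"):
            busy.append((int(pid), stat, command))
    return busy


def snapshot():
    """Take one timestamped sample of load average and busy processes."""
    load1, _load5, _load15 = os.getloadavg()
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,stat,args"],  # assumes procps-style ps
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return stamp, load1, busy_processes(ps_output)


if __name__ == "__main__":
    stamp, load1, busy = snapshot()
    print(f"{stamp} load1={load1:.2f}")
    for pid, stat, command in busy:
        print(f"  {pid} {stat} {command}")
```

Running something like this once a minute from cron, with the output appended to a file, would show what else is runnable alongside collectinfo.pl when the load starts to climb.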
So I guess I'm looking to know a few things:
1) Are there any known issues with collectinfo.pl that could cause it to spin out of control and/or spawn processes erroneously?
2) What would be the impact if I turned it off?
I'm looking to find the root cause, since this has devastating effects on my customers' ecommerce sites. Their customers start experiencing timeouts and the phones start ringing.
Please help if you can.