Affects Version/s: None
Fix Version/s: None
On our Linux machines, cf-execd randomly terminates shortly after it starts.
This is a serious problem because cf-execd is essential for CFEngine.
Our analysis shows that cf-execd kills itself.
Here is the setup to reproduce the problem:
- Add a promise which calls "/sbin/shutdown -P +1" (power down) at midnight (Hr00.Min00_05).
- All Linux machines with this promise will power down shortly after midnight.
- Start those machines in the morning (we use wake on LAN, but that does not matter).
- cf-execd, cf-serverd and cf-monitord will be started automatically.
- cf-execd sees an expired lock and kills the corresponding process using the pid recorded for that lock.
That pid was valid before the shutdown, but it is stale now because
the machine was restarted. Nevertheless, cf-execd uses that pid to
kill whatever process now happens to have it. Fairly often (roughly
1 in 50 cases, about 2 %) that process is cf-execd itself.
We notice this bug in our scenario because machines without cf-execd no longer
power off at midnight, so they have an uptime of more than a day. In those cases
we always see that cf-execd is no longer running, and the last entries in the
runlog look something like this:
/var/cfengine/cf3.lc16.runlog.4:Sun Mar 29 07:23:18 2015:Lock expired, process killed:pid=852:server_cfengine.-lc16:_var_cfengine_inputs_promises_cf
/var/cfengine/cf3.lc16.runlog.4:Sun Mar 29 07:23:18 2015:Lock expired, process killed:pid=879:monitor_cfengine.-lc16:the_monitor_daemon
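To find affected machines, runlog entries like the two above can be parsed mechanically. The following is only a sketch: the field layout is inferred from these two sample lines, and the names `KillEntry` and `parse_kill_entry` are hypothetical helpers, not part of CFEngine.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class KillEntry(NamedTuple):
    when: datetime  # timestamp written in the runlog entry
    pid: int        # pid that received the kill
    lock: str       # remaining fields of the entry, kept verbatim


# Assumed layout, inferred from the sample lines in this ticket:
#   <timestamp>:Lock expired, process killed:pid=<pid>:<remaining fields>
# A leading "<file>:" prefix (as printed by grep) is tolerated because
# the pattern is searched for anywhere in the line.
_KILL_RE = re.compile(
    r"(?P<when>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
    r":Lock expired, process killed"
    r":pid=(?P<pid>\d+):(?P<lock>.*)$"
)


def parse_kill_entry(line: str) -> Optional[KillEntry]:
    """Return the parsed entry, or None if the line is not a kill entry."""
    m = _KILL_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y")
    return KillEntry(when, int(m.group("pid")), m.group("lock"))
```

Comparing `when` with the machine's boot time shows whether a kill happened right after startup, which is the pattern described in this ticket.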
We believe that all versions of CFEngine show this bug, maybe with a higher probability on Linux because Windows uses more complex process identifiers.
Any process id that was recorded before the system's last boot ought to be invalidated.
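The proposed invalidation can be sketched as follows. This is an illustration under stated assumptions, not CFEngine's actual locking code: it assumes the time at which the lock was written is available as a Unix timestamp, and it takes the boot time from the `btime` line of `/proc/stat` on Linux. All function names are hypothetical.

```python
def parse_btime(proc_stat_text: str) -> int:
    """Extract the boot time (seconds since the epoch) from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")


def read_boot_time(path: str = "/proc/stat") -> int:
    """Read the boot time of the running Linux system."""
    with open(path) as f:
        return parse_btime(f.read())


def lock_predates_boot(lock_time: float, boot_time: float) -> bool:
    """A lock written before the last boot refers to a pid that is gone."""
    return lock_time < boot_time


def may_signal_lock_holder(lock_time: float, boot_time: float,
                           holder_pid: int, own_pid: int) -> bool:
    """Decide whether killing the recorded lock holder is safe.

    Refuses when the lock predates the last boot (the pid is stale and
    may have been reused) or when the recorded pid is our own.
    """
    if lock_predates_boot(lock_time, boot_time):
        return False
    if holder_pid == own_pid:
        return False
    return True
```

A caller would pass `read_boot_time()` and `os.getpid()`; when the function returns False, the stale lock would simply be removed without sending any signal. The own-pid check covers only the self-kill case; the boot-time check is what prevents killing an unrelated process.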
Partially fixed by #7244. In particular see comment number "36":https://dev.cfengine.com/issues/7075#note-36 in this ticket for what remains.