  1. CFEngine Community
  2. CFE-1871

cf-execd wrongly kills itself (and maybe other processes) after system restart



    • Type: Bug
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cf-execd
    • Labels:


      On our Linux machines, cf-execd occasionally terminates shortly after it starts.

      This is a serious problem because cf-execd is essential for CFEngine.
      Our analysis shows that cf-execd kills itself.

      Here is the setup to reproduce the problem:

      • Add a promise which calls "/sbin/shutdown -P +1" (power down) at midnight (Hr00.Min00_05).
      • All Linux machines with this promise will power down shortly after midnight.
      • Start those machines in the morning (we use wake on LAN, but that does not matter).
      • cf-execd, cf-serverd and cf-monitord will be started automatically.
      • cf-execd sees an expired lock and kills the corresponding process using the pid recorded with the lock.
        That pid was valid before the shutdown, but is no longer valid because
        the machine was restarted. Nevertheless, cf-execd uses that pid to
        kill whatever process now happens to have it. Fairly often (roughly
        1 in 50 cases, or 2 %), cf-execd kills itself.
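
      The failure mode in the last step can be modeled with a toy process table.
      This is only an illustrative sketch, not CFEngine code: the function name
      naive_expire and the process tables are hypothetical, and apart from 852 and
      879 (taken from the runlog in this report) the pids are made up.

```python
# Toy model of pid reuse across a reboot (illustration only, nothing is killed).

def naive_expire(lock_pid, process_table):
    """Return the name of the process that currently owns lock_pid.

    This is the process a naive "kill the pid stored in the lock" would hit,
    or None if no process has that pid right now.
    """
    return process_table.get(lock_pid)


# Hypothetical process table before the nightly shutdown.
before_reboot = {852: "cf-serverd", 879: "cf-monitord", 901: "cf-execd"}

# Hypothetical process table after the morning restart: pids were reassigned.
after_reboot = {852: "cf-execd", 861: "cf-serverd", 879: "sshd"}

# The lock still says pid 852, which used to be cf-serverd.
victim = naive_expire(852, after_reboot)
print("process hit by the stale lock:", victim)
```

      In this made-up table the stale pid 852 now belongs to cf-execd, so the
      naive kill hits cf-execd itself, and pid 879 would hit an unrelated process.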

      We notice this bug in our scenario because machines without cf-execd no longer
      power off at midnight, so they have an uptime of more than a day. In those cases
      we always find that cf-execd is no longer running, and the last entry in the
      runlog looks something like this:

      /var/cfengine/cf3.lc16.runlog.4:Sun Mar 29 07:23:18 2015:Lock expired, process killed:pid=852:server_cfengine.-lc16:_var_cfengine_inputs_promises_cf
      /var/cfengine/cf3.lc16.runlog.4:Sun Mar 29 07:23:18 2015:Lock expired, process killed:pid=879:monitor_cfengine.-lc16:the_monitor_daemon

      We believe that all versions of CFEngine show this bug, maybe with a higher probability on Linux because Windows uses more complex process identifiers.

      Any process id that was recorded before the last system boot (that is, one older than the system's uptime) ought to be invalidated.
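
      The proposed invalidation could look roughly like the sketch below. It is
      written in Python for brevity and is not the actual cf-execd logic; all
      function names are hypothetical. It assumes the lock stores the time at which
      the pid was recorded, and that on Linux the uptime can be read from the first
      field of /proc/uptime.

```python
import time


def read_linux_uptime(path="/proc/uptime"):
    """Return the system uptime in seconds (Linux only, first field of the file)."""
    with open(path) as f:
        return float(f.read().split()[0])


def boot_time_from_uptime(now, uptime_seconds):
    """Return the approximate boot time as a Unix timestamp."""
    return now - uptime_seconds


def lock_pid_is_stale(lock_time, boot_time):
    """A pid recorded before the last boot cannot refer to the same process."""
    return lock_time < boot_time


def should_kill_lock_holder(lock_pid, lock_time, boot_time, own_pid):
    """Decide whether killing the pid stored in an expired lock is safe."""
    if lock_pid_is_stale(lock_time, boot_time):
        return False  # pid predates the reboot: drop the lock, kill nothing
    if lock_pid == own_pid:
        return False  # extra guard (an assumption): never kill ourselves
    return True


if __name__ == "__main__":
    import os

    boot = boot_time_from_uptime(time.time(), read_linux_uptime())
    # Hypothetical lock recorded one day ago with pid 852.
    print(should_kill_lock_holder(852, time.time() - 86400, boot, os.getpid()))
```

      Comparing against the boot time only rules out pids from before the restart.
      It does not detect a pid that was reused within the same boot, which would
      need an additional check such as comparing the process start time.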


      Partially fixed by #7244. In particular, see comment number 36 in this ticket (https://dev.cfengine.com/issues/7075#note-36) for what remains.



              • Assignee:
                Aleksei Aleksei Shpakovskii
                stweil Stefan Weil


                  Time Tracking

                  Original Estimate - 2 days, 5 hours