CFEngine Community / CFE-1871

cf-execd wrongly kills itself (and maybe other processes) after system restart

    Details

    • Type: Bug
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cf-execd
    • Labels:
      None

      Description

      On our Linux machines, cf-execd randomly terminates shortly after it starts.

      This is a serious problem because cf-execd is essential for CFEngine.
      Our analysis shows that the program kills itself.

      Here is the setup to reproduce the problem:

      • Add a promise which calls "/sbin/shutdown -P +1" (power down) at midnight (Hr00.Min00_05).
      • All Linux machines with this promise will power down shortly after midnight.
      • Start those machines in the morning (we use wake on LAN, but that does not matter).
      • cf-execd, cf-serverd and cf-monitord will be started automatically.
      • cf-execd sees an expired lock and kills the corresponding process using the recorded pid.
        That pid was valid before the shutdown, but it is no longer valid because
        the machine was restarted. Nevertheless, cf-execd uses that pid to
        kill whatever process now happens to have it. Fairly often (in roughly
        1 of 50 cases, or 2 %), cf-execd kills itself.

      We notice this bug in our scenario because machines without a running cf-execd
      no longer power off at midnight, so their uptime exceeds one day. In those cases
      we always find that cf-execd is no longer running, and the last entries in the
      runlog look like this:

      <pre>
      /var/cfengine/cf3.lc16.runlog.4:Sun Mar 29 07:23:18 2015:Lock expired, process killed:pid=852:server_cfengine.-lc16:_var_cfengine_inputs_promises_cf
      /var/cfengine/cf3.lc16.runlog.4:Sun Mar 29 07:23:18 2015:Lock expired, process killed:pid=879:monitor_cfengine.-lc16:the_monitor_daemon
      </pre>
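
To find affected machines, entries like the two above can be extracted from the runlog files mechanically. The following is a minimal sketch, assuming the entry format is exactly as quoted (an optional `file:` prefix as added by grep, a ctime-style timestamp, the text "Lock expired, process killed", the pid, and the lock name). The function name `parse_lock_expired` is hypothetical and not part of CFEngine.

```python
import re
from datetime import datetime

# Pattern for the "Lock expired" entries quoted above (assumed format).
# The optional leading "<file>:" part is the prefix grep adds when it
# searches several files.
LOCK_EXPIRED = re.compile(
    r"^(?:(?P<file>/[^:]*):)?"
    r"(?P<when>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}):"
    r"Lock expired, process killed:"
    r"pid=(?P<pid>\d+):"
    r"(?P<lock>.*)$"
)


def parse_lock_expired(line):
    """Return a dict describing a 'Lock expired' entry, or None if the
    line is not such an entry."""
    m = LOCK_EXPIRED.match(line.strip())
    if m is None:
        return None
    return {
        "file": m.group("file"),  # None when there is no grep prefix
        "when": datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y"),
        "pid": int(m.group("pid")),
        "lock": m.group("lock"),
    }
```

Comparing the extracted pid with the pid of the cf-execd instance that wrote the entry would show whether the daemon killed itself; where that pid is recorded is not covered by this sketch.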

      We believe that all versions of CFEngine show this bug, possibly with a higher probability on Linux because Windows uses more complex process identifiers.

      Any process IDs that are older than the system's uptime ought to be invalidated.
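
The proposed invalidation can be sketched as a comparison between the time a lock was taken and the time the system booted. This is only an illustration under stated assumptions, not CFEngine's actual code: it assumes a Linux system where `/proc/stat` contains a `btime` line (boot time in seconds since the epoch), and the names `parse_btime`, `read_boot_time`, `lock_predates_boot` and `may_kill_lock_holder` are hypothetical.

```python
def parse_btime(stat_text):
    """Extract the boot time (seconds since the epoch) from the text of
    /proc/stat. Raises ValueError if there is no 'btime' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")


def read_boot_time(stat_path="/proc/stat"):
    """Read the boot time from /proc/stat (Linux only)."""
    with open(stat_path) as f:
        return parse_btime(f.read())


def lock_predates_boot(lock_time, boot_time):
    """True if the lock was taken before the current boot, so the pid
    stored with it cannot refer to a process that is still running."""
    return lock_time < boot_time


def may_kill_lock_holder(lock_time, boot_time):
    """Only a lock taken during the current boot may lead to a kill;
    an older lock should simply be discarded."""
    return not lock_predates_boot(lock_time, boot_time)
```

With this check, a lock written before the midnight shutdown would be discarded after the morning restart instead of triggering a kill. A lock taken during the current boot can still carry a reused pid, so the check narrows the problem rather than removing it entirely.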

      Update:

      Partially fixed by #7244. In particular see comment number "36":https://dev.cfengine.com/issues/7075#note-36 in this ticket for what remains.

              People

              • Assignee:
                Aleksei Aleksei Shpakovskii
                Reporter:
                stweil Stefan Weil

                  Time Tracking

                  Estimated: 2d 5h (original estimate: 2 days, 5 hours)
                  Remaining: Not Specified
                  Logged: Not Specified