CFEngine Community / CFE-1707

promise type services broken

    Details

    • Platform:
      Linux
    • Found in version (details):
      CFEngine Core 3.6.1

      Description

      Starting with 3.6.1, the services promise type is broken, at least on Ubuntu >= 12.04.

      Error description:
      To ensure a service is running, I used a services promise with service_policy => "start".

      • In CFEngine 3.6.0 this worked perfectly.
      • In 3.6.1 an error was produced for some services (see below) that were already running.
        If the service was not running, it was started and no error was reported.
      • In 3.6.2 no error is reported, but the service is NOT started when it is not running.

      Here is the policy to test this on 3.6.0 and 3.6.1.
      To use it on 3.6.2, adjust the inputs list to match the library files shipped with that version.

      <pre>
      body common control
      {
        bundlesequence => { "main" };

        inputs => {
          "${sys.inputdir}/lib/3.6/common.cf",
          "${sys.inputdir}/lib/3.6/edit_xml.cf",
          "${sys.inputdir}/lib/3.6/monitor.cf",
          "${sys.inputdir}/lib/3.6/packages.cf",
          "${sys.inputdir}/lib/3.6/files.cf",
          "${sys.inputdir}/lib/3.6/guest_environments.cf",
          "${sys.inputdir}/lib/3.6/examples.cf",
          "${sys.inputdir}/lib/3.6/storage.cf",
          "${sys.inputdir}/lib/3.6/reports.cf",
          "${sys.inputdir}/lib/3.6/bundles.cf",
          "${sys.inputdir}/lib/3.6/services.cf",
          "${sys.inputdir}/lib/3.6/feature.cf",
          "${sys.inputdir}/lib/3.6/commands.cf",
          "${sys.inputdir}/lib/3.6/processes.cf",
          "${sys.inputdir}/lib/3.6/stdlib.cf",
          "${sys.inputdir}/lib/3.6/paths.cf",
        };
      }

      bundle agent main
      {
        methods:
          "any" usebundle => test;
      }

      bundle agent test
      {
        services:
          any::
            "ssh"
              service_policy => "start",
              comment => "ensure ssh is running";
      }
      </pre>

      Running the policy with -I gives the following output.

      In 3.6.0:
      <pre>
      2014-10-01T14:57:19+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Executing 'no timeout' ... '/usr/sbin/service ssh start'
      2014-10-01T14:57:19+0200 notice: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Q: "...in/service ssh ": start: Job is already running: ssh

      2014-10-01T14:57:19+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Last 1 quoted lines were generated by promiser '/usr/sbin/service ssh start'
      2014-10-01T14:57:19+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Completed execution of '/usr/sbin/service ssh start'
      R: standard_services: using System V service / Upstart layer to start ssh
      </pre>

      In 3.6.1:
      <pre>
      2014-10-01T14:44:13+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Executing 'no timeout' ... '/usr/sbin/service ssh start'
      2014-10-01T14:44:13+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Command related to promiser '/usr/sbin/service ssh start' returned code not defined as promise kept, not kept or repaired; setting to failed: 1
      2014-10-01T14:44:13+0200 notice: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Q: "...in/service ssh ": start: Job is already running: ssh

      2014-10-01T14:44:13+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Last 1 quoted lines were generated by promiser '/usr/sbin/service ssh start'
      2014-10-01T14:44:13+0200 info: /default/main/methods/'any'/default/test/services/'ssh'/default/standard_services/commands/'/usr/sbin/service ssh start'[0]: Completed execution of '/usr/sbin/service ssh start'
      R: standard_services: using System V service / Upstart layer to start ssh
      2014-10-01T14:44:13+0200 error: /default/main/methods/'any'/default/test/services/'ssh'[0]: Method 'standard_services' failed in some repairs
      2014-10-01T14:44:13+0200 error: /default/main/methods/'any'[0]: Method 'test' failed in some repairs
      </pre>

      The source of the problem (as far as I understand the 3.6 services.cf)
      is that the services promise seems to rely on the return value of the system's service start command
      (kept_returncodes).
      At least on Ubuntu that is not reliable, because the return values are inconsistent, as this shows:

      <pre>

      ~: all="acpid apparmor apt-cacher atop bacula-fd cfengine3 cron nagios-nrpe-server nfs-kernel-server nscd ntp postfix resolvconf rpcbind rsyslog snmpd sysstat tftpd-hpa vnstat xinetd"
      ~: for i in $all; do echo -e "--------------\n$i"; service $i start; echo $?; done

      ---------------
      acpid
      start: Job is already running: acpid
      1
      ---------------
      apparmor
       * Starting AppArmor profiles [OK ]
      0
      ---------------
      apt-cacher
      0
      ---------------
      atop
      0
      ---------------
      bacula-fd
       * Starting Bacula File daemon... bacula-fd [ OK ]
      0
      ---------------
      cfengine3
      Starting cf-execd: /var/cfengine/bin/cf-execd already running.
      1
      ---------------
      cron
      start: Job is already running: cron
      1
      ---------------
      nagios-nrpe-server
       * Starting nagios-nrpe nagios-nrpe [ OK ]
      0
      ---------------
      nfs-kernel-server
       * Exporting directories for NFS kernel daemon... [ OK ]
       * Starting NFS kernel daemon [ OK ]
      0
      ---------------
      nscd
       * Starting Name Service Cache Daemon nscd [ OK ]
      0
      ---------------
      ntp
       * Starting NTP server ntpd [ OK ]
      0
      ---------------
      postfix
       * Starting Postfix Mail Transport Agent postfix [ OK ]
      0
      ---------------
      resolvconf
      start: Job is already running: resolvconf
      1
      ---------------
      rpcbind
      start: Job is already running: rpcbind
      1
      ---------------
      rsyslog
      start: Job is already running: rsyslog
      1
      ---------------
      snmpd
       * Starting network management services:
      0
      ---------------
      sysstat
       * Starting the system activity data collector sadc [ OK ]
      0
      ---------------
      tftpd-hpa
      start: Job is already running: tftpd-hpa
      1
      ---------------
      vnstat
       * Starting vnStat daemon vnstatd [ OK ]
      0
      ---------------
      xinetd
      start: Job is already running: xinetd
      1
        </pre>
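
      The 3.6.1 log message "returned code not defined as promise kept, not kept or repaired; setting to failed" can be illustrated with a minimal sketch. The bundle, body and class names below are hypothetical, and this is not the actual stdlib code; it only assumes a commands promise whose classes body lists 0 as the sole kept return code, so that exit code 1 from an already running Upstart job is classified as failed:

      <pre>
      # Hypothetical illustration, not the code in lib/3.6/services.cf
      bundle agent returncode_demo
      {
        commands:
            "/usr/sbin/service ssh start"
              classes => start_codes("ssh_start");

        reports:
          ssh_start_failed::
            "service ssh start returned a code that is not listed as kept";
      }

      body classes start_codes(prefix)
      {
            # Only exit code 0 counts as "promise kept"; any other code
            # (such as 1 for "Job is already running") falls through to failed.
            kept_returncodes => { "0" };
            promise_kept     => { "$(prefix)_kept" };
            repair_failed    => { "$(prefix)_failed" };
      }
      </pre>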

      Proposed solution:
      Change the evaluation (and possibly the action) of the services promise outcome
      so that it depends on the process list rather than on return values (e.g. ps ax | egrep <service_name>).
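
      As a rough sketch of this idea at the policy level (a possible workaround, not a fix to the services promise itself), a processes promise can check the process table and define a class only when the daemon is missing, so the start command runs only in that case. The bundle name, the class name and the path /usr/sbin/sshd are assumptions for illustration:

      <pre>
      # Hypothetical workaround sketch: decide from the process table, not the exit code
      bundle agent ensure_ssh_running
      {
        processes:
            # The promiser is a regular expression matched against the process table
            "/usr/sbin/sshd"
              restart_class => "ssh_not_running",
              comment => "define ssh_not_running only when no matching process is found";

        commands:
          ssh_not_running::
            "/usr/sbin/service ssh start"
              comment => "start ssh only when it is absent from the process table";
      }
      </pre>

      Because the start command is only run when the process is absent, the "Job is already running" exit code 1 should not occur in the normal case.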

            People

            • Assignee:
              a10042 Nick Anderson
              Reporter:
              robur314@gmail.com tim robur taler