CFEngine Community / CFE-2638

cf-execd does not start the agent if it takes too long to check networks


    Details

    • Type: Bug
    • Status: Need more Info
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: 3.6.5, 3.10.1
    • Fix Version/s: None
    • Component/s: cf-execd
    • Labels:
      None
    • Environment:
      Found on an AIX system, but it could be reproduced on any system with network latency

      Description

      I had an issue where the agent periodically stops running.

      The agent is scheduled to run every two hours, at minute 0 (with a splay time of less than 10 minutes), but after a few runs (4 to 6) following an agent start, no runs happen for several days.

      If I restart the agent during that period, it starts running again, but it stops again after a few runs.

      To understand what was happening, I ran cf-execd in verbose mode and found what was preventing the runs from starting. The cf-execd preliminary checks took 15 seconds (mostly spent checking a network interface). This creates a delay between the moment cf-execd wakes up and the moment it finally checks whether it has to run the agent, at the end of those checks. In the failing run below, cf-execd wakes up at 02:00:49 and finishes its checks at 02:01:04; that is no longer minute 00, so it does not start the agent.
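
The timing problem can be reproduced with a small model. This is a sketch under stated assumptions, not cf-execd's code: it assumes the schedule only needs the Min00 class, reduces CFEngine's time classes to HrNN and MinNN, and uses the hypothetical helpers time_classes and agent_started. The wake-up times and the 15-second delay come from the two verbose logs in this report.

```python
from datetime import datetime, timedelta


def time_classes(t: datetime) -> set[str]:
    """Simplified subset of CFEngine time classes (assumption: only HrNN and MinNN)."""
    return {f"Hr{t.hour:02d}", f"Min{t.minute:02d}"}


def agent_started(woke_up: datetime, startup_delay_s: int) -> bool:
    # As described in this report, the schedule is evaluated only after the
    # startup checks, so the classes come from the time the checks finish.
    checked_at = woke_up + timedelta(seconds=startup_delay_s)
    return "Min00" in time_classes(checked_at)


# The two runs from the logs: both wake up in minute 00, both spend 15 s on checks.
print(agent_started(datetime(2017, 5, 12, 0, 0, 19), 15))  # True  (checked at 00:00:34)
print(agent_started(datetime(2017, 5, 12, 2, 0, 49), 15))  # False (checked at 02:01:04)

# With a 15 s delay, the first wake-up second in minute 00 that misses the run:
first_missed = next(
    s for s in range(60) if not agent_started(datetime(2017, 5, 12, 2, 0, s), 15)
)
print(first_missed)  # 45
```

In this model, any wake-up at second 45 or later of minute 00 loses the run, which matches the observed behaviour that only some scheduled runs are skipped.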

      Here is an extract from a working run (note the 15-second delay between interfaces 4 and 5):

      2017-05-12T00:00:19DFT  verbose: CFEngine Core 3.6.5
      2017-05-12T00:00:19DFT  verbose: Host name is: <my_hostname>
      2017-05-12T00:00:19DFT  verbose: Operating System Type is aix
      2017-05-12T00:00:19DFT  verbose: Operating System Release is 6.1
      2017-05-12T00:00:19DFT  verbose: Architecture = powerpc
      2017-05-12T00:00:19DFT  verbose: Using internal soft-class aix for host UP0TC022
      2017-05-12T00:00:19DFT  verbose: The time is now Fri May 12 00:00:19 2017
      2017-05-12T00:00:19DFT  verbose: Additional hard class defined as: 32_bit
      2017-05-12T00:00:19DFT  verbose: Additional hard class defined as: aix_6_1
      2017-05-12T00:00:19DFT  verbose: Additional hard class defined as: aix_powerpc
      2017-05-12T00:00:19DFT  verbose: Additional hard class defined as: aix_powerpc_6_1
      2017-05-12T00:00:19DFT  verbose: GNU autoconf class from compile time: compiled_on_aix5_3
      2017-05-12T00:00:19DFT  verbose: Address given by nameserver: xxx
      2017-05-12T00:00:19DFT  verbose: No interface exception file /var/rudder/cfengine-community/inputs/ignore_interfaces.rx
      2017-05-12T00:00:19DFT  verbose: Interface 1: en1
      2017-05-12T00:00:19DFT  verbose: Interface 2: en1
      2017-05-12T00:00:19DFT  verbose: IP address of host set to xxx
      2017-05-12T00:00:19DFT  verbose: Interface 3: en0
      2017-05-12T00:00:19DFT  verbose: Interface 4: en0
      2017-05-12T00:00:34DFT  verbose: Interface 5: lo0
      2017-05-12T00:00:34DFT  verbose: Interface 6: lo0
      2017-05-12T00:00:34DFT  verbose: Interface 7: lo0
      2017-05-12T00:00:34DFT  verbose: Trying to locate my IPv6 address
      2017-05-12T00:00:34DFT  verbose: Looking for environment from cf-monitord...
      2017-05-12T00:00:34DFT  verbose: Unable to detect environment from cf-monitord
      2017-05-12T00:00:34DFT  verbose: Found 16 processors
      2017-05-12T00:00:34DFT  verbose: Reference time set to 'Fri May 12 00:00:34 2017'
      2017-05-12T00:00:34DFT  verbose: Waking up the agent at Fri May 12 00:00:34 2017 ~ Hr00.Min00
      2017-05-12T00:00:34DFT  verbose: Sleeping for splaytime 554 seconds

      Two hours later, the agent was not started:

      2017-05-12T02:00:49DFT  verbose: CFEngine Core 3.6.5
      2017-05-12T02:00:49DFT  verbose: Host name is: <my_hostname>
      2017-05-12T02:00:49DFT  verbose: Operating System Type is aix
      2017-05-12T02:00:49DFT  verbose: Operating System Release is 6.1
      2017-05-12T02:00:49DFT  verbose: Architecture = powerpc
      2017-05-12T02:00:49DFT  verbose: Using internal soft-class aix for host UP0TC022
      2017-05-12T02:00:49DFT  verbose: The time is now Fri May 12 02:00:49 2017
      2017-05-12T02:00:49DFT  verbose: Additional hard class defined as: 32_bit
      2017-05-12T02:00:49DFT  verbose: Additional hard class defined as: aix_6_1
      2017-05-12T02:00:49DFT  verbose: Additional hard class defined as: aix_powerpc
      2017-05-12T02:00:49DFT  verbose: Additional hard class defined as: aix_powerpc_6_1
      2017-05-12T02:00:49DFT  verbose: GNU autoconf class from compile time: compiled_on_aix5_3
      2017-05-12T02:00:49DFT  verbose: Address given by nameserver: xxx
      2017-05-12T02:00:49DFT  verbose: No interface exception file /var/rudder/cfengine-community/inputs/ignore_interfaces.rx
      2017-05-12T02:00:49DFT  verbose: Interface 1: en1
      2017-05-12T02:00:49DFT  verbose: Interface 2: en1
      2017-05-12T02:00:49DFT  verbose: IP address of host set to xxx
      2017-05-12T02:00:49DFT  verbose: Interface 3: en0
      2017-05-12T02:00:49DFT  verbose: Interface 4: en0
      2017-05-12T02:01:04DFT  verbose: Interface 5: lo0
      2017-05-12T02:01:04DFT  verbose: Interface 6: lo0
      2017-05-12T02:01:04DFT  verbose: Interface 7: lo0
      2017-05-12T02:01:04DFT  verbose: Trying to locate my IPv6 address
      2017-05-12T02:01:04DFT  verbose: Looking for environment from cf-monitord...
      2017-05-12T02:01:04DFT  verbose: Unable to detect environment from cf-monitord
      2017-05-12T02:01:04DFT  verbose: Found 16 processors
      2017-05-12T02:01:04DFT  verbose: Reference time set to 'Fri May 12 02:01:04 2017'
      2017-05-12T02:01:04DFT  verbose: Nothing to do at Fri May 12 02:01:04 2017
      2017-05-12T02:01:04DFT  verbose: Sleeping for pulse time 60 seconds...

      This happened on only one of my hundreds of agents, but I guess it could happen anywhere at any time. It is quite important, as it makes the agent unreliable, and it is quite hard to understand!

      Maybe the schedule check should use the time at which cf-execd woke up instead of the current time. What do you think, and would that be possible?
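
A minimal sketch of the suggested change follows. It assumes a hypothetical schedule model (every two hours at minute 00), and the function names are illustrative, not cf-execd internals. The only difference between the two behaviours is which timestamp is used as the reference when the schedule is evaluated.

```python
from datetime import datetime, timedelta


def schedule_matches(t: datetime) -> bool:
    """Hypothetical schedule model: every two hours, at minute 00."""
    return t.hour % 2 == 0 and t.minute == 0


def should_run(woke_up: datetime, startup_delay_s: int, use_wakeup_time: bool) -> bool:
    checks_done = woke_up + timedelta(seconds=startup_delay_s)
    # Current behaviour (as described in this report): schedule checked at checks_done.
    # Suggested behaviour: schedule checked against the time cf-execd woke up.
    reference = woke_up if use_wakeup_time else checks_done
    return schedule_matches(reference)


failing_wakeup = datetime(2017, 5, 12, 2, 0, 49)  # wake-up time from the failing log
print(should_run(failing_wakeup, 15, use_wakeup_time=False))  # False
print(should_run(failing_wakeup, 15, use_wakeup_time=True))   # True

# With the suggested behaviour, the decision no longer depends on the startup delay.
print(all(should_run(failing_wakeup, d, use_wakeup_time=True) for d in range(600)))  # True
```

The trade-off in this model is that the agent may be launched up to the startup delay after the scheduled minute has passed. That delay seems small compared with a splay time of several minutes.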

      I worked around it by adding the slow interfaces to ignore_interfaces.rx.
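
The workaround relies on an exclusion file that lists interface names to skip. The sketch below shows that idea. It assumes the file holds one regular expression per line, matched against the whole interface name, with blank lines and lines starting with '#' ignored. That format is an assumption, not taken from the CFEngine sources. The helper names and the "lo0" rule are purely illustrative.

```python
import re


def load_ignore_patterns(text: str) -> list[re.Pattern]:
    """Parse exclusion rules (assumed format: one regex per line, '#' comments)."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def interfaces_to_check(interfaces: list[str], patterns: list[re.Pattern]) -> list[str]:
    # Keep only interfaces that no exclusion pattern fully matches.
    return [name for name in interfaces if not any(p.fullmatch(name) for p in patterns)]


rules = "# hypothetical ignore_interfaces.rx content\nlo0\n"
print(interfaces_to_check(["en1", "en0", "lo0"], load_ignore_patterns(rules)))
# ['en1', 'en0']
```

Fewer interfaces to inspect means a shorter startup, so the schedule check is less likely to slip into the next minute.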

      (On a side note, cf-execd only performs these checks on every other wake-up. I guess this is because of classes that persist for one minute, which is exactly the time between two cf-execd wake-ups. Should I open a separate bug for this?)


            People

            Assignee: Eystein Maloy Stenberg
            Reporter: Vincent Membré
            Votes: 0
            Watchers: 2
