CFEngine Community / CFE-2306

Possible performance issue with many variables

    Details

    • Type: Bug
    • Status: Done
    • Priority: (None)
    • Resolution: Fixed
    • Affects Version/s: 3.7.2
    • Fix Version/s: None
    • Component/s: Evaluation
    • Labels:
      None

      Description

      Hello,

      In the latest LTS version (3.7.2) at least, I believe there is a performance issue as soon as thousands of variables are used with arrays.
      I'm not sure whether the culprit is the number of variables or the use of arrays.

      Both cf-promises and cf-agent seem affected, because the variables are parsed for both validation and execution steps.

      This is a real concern for us, because the overall CFEngine agent run (cf-promises validation, then cf-agent parsing over several passes) takes nearly 2 minutes in our setup.

      I wrote a small Python script to generate a policy file with many variables. The generated policy file looks like this:

      <pre>

      body common control
      {
      bundlesequence => { "test" };
      }

      bundle agent test
      {
      vars:

      "repo[array_key0][array_key0]" string => "array_value0";
      "repo[array_key0][array_key1]" string => "array_value1";
      "repo[array_key0][array_key2]" string => "array_value2";
      "repo[array_key0][array_key3]" string => "array_value3";
      "repo[array_key0][array_key4]" string => "array_value4";

      "repo[array_key1][array_key0]" string => "array_value1";
      "repo[array_key1][array_key1]" string => "array_value2";
      "repo[array_key1][array_key2]" string => "array_value3";
      "repo[array_key1][array_key3]" string => "array_value4";
      "repo[array_key1][array_key4]" string => "array_value5";

      (...)
      </pre>

      Below are the timings on a low-end server for several values of COUNT (each COUNT unit generates a block of 5 variables):

      <pre>
      $ for C in $(seq 100 100 1000); do echo -e "\nCOUNT=${C}" && ./array.py $C && time ~/tmp/cfe372/bin/cf-promises -f ./array.cf ;done

      COUNT=100

      real 0m1.059s
      user 0m1.041s
      sys 0m0.019s

      COUNT=200

      real 0m3.880s
      user 0m3.850s
      sys 0m0.029s

      COUNT=300

      real 0m8.963s
      user 0m8.923s
      sys 0m0.039s

      COUNT=400

      real 0m17.354s
      user 0m17.306s
      sys 0m0.049s

      COUNT=500

      real 0m28.650s
      user 0m28.604s
      sys 0m0.045s

      COUNT=600

      real 0m44.907s
      user 0m44.863s
      sys 0m0.039s

      COUNT=700

      real 1m5.090s
      user 1m5.022s
      sys 0m0.063s

      COUNT=800

      real 1m29.016s
      user 1m28.952s
      sys 0m0.061s

      COUNT=900

      real 2m6.468s
      user 2m6.394s
      sys 0m0.065s

      COUNT=1000

      real 2m46.527s
      user 2m46.414s
      sys 0m0.105s
      </pre>

      <pre>
      #!/usr/bin/python3
      # coding: utf-8

      import sys


      CF_FILE = "array.cf"

      # Number of outer array keys; falls back to 100 without a valid argument.
      try:
          COUNT = int(sys.argv[1])
      except (IndexError, ValueError):
          COUNT = 100


      def write_header(fh):
          fh.write("""
      body common control
      {
      bundlesequence => { "test" };
      }

      bundle agent test
      {
      vars:
      """)


      def write_footer(fh):
          fh.write("""
      }
      """)


      def write_content(fh, count):
          """Write count blocks of 5 variable definitions each."""
          for i in range(count):
              # Disabled: optionally put each group of 10 blocks under a class.
              # if i % 10 == 0:
              #     fh.write("classe{}::\n".format(i))
              for j in range(5):
                  line = '\t"repo[array_key{}][array_key{}]" string => "array_value{}";\n'.format(i, j, i + j)
                  fh.write(line)
              fh.write("\n")


      if __name__ == '__main__':
          with open(CF_FILE, 'w') as fh:
              write_header(fh)
              write_content(fh, COUNT)
              write_footer(fh)

      </pre>

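      It looks like the run time grows much faster than linearly: doubling COUNT multiplies the run time by roughly 4 to 6, which points to at least quadratic growth.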
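
      As a rough check of the growth rate, the "real" timings listed above can be fitted to a power law (time proportional to COUNT^k) with a least-squares fit on a log-log scale. This is only a sketch: the names TIMINGS and fit_power_law_exponent are made up for illustration, and ten data points from a single machine are not a rigorous benchmark. An exponent k near 2 would indicate roughly quadratic growth.

```python
import math

# Wall-clock ("real") times in seconds, copied from the cf-promises runs above.
TIMINGS = {
    100: 1.059, 200: 3.880, 300: 8.963, 400: 17.354, 500: 28.650,
    600: 44.907, 700: 65.090, 800: 89.016, 900: 126.468, 1000: 166.527,
}


def fit_power_law_exponent(timings):
    """Return the least-squares slope of log(time) against log(COUNT)."""
    xs = [math.log(n) for n in timings]
    ys = [math.log(t) for t in timings.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


exponent = fit_power_law_exponent(TIMINGS)
print("estimated exponent: {:.2f}".format(exponent))
```

      On these numbers the fitted exponent comes out a little above 2, so the growth looks polynomial (quadratic or slightly worse).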

      I'm not sure, but a quick analysis with Valgrind (callgrind) suggests that malloc()/free() account for a large share of the cost.
      Perhaps a memory pool would help?

      If this is expected behavior, maybe cf-promises or cf-agent should warn the user about the cost of processing a huge number of variables?

      Thanks

        Attachments

        1. array.py
          0.8 kB
        2. callgrind.out.9406
          454 kB
        3. valgrind.jpg
          41 kB

              People

              • Assignee:
                a10042 Nick Anderson
                Reporter:
                loic06 Loic Pefferkorn