CFEngine Community / CFE-1965

cf-monitord spams syslog with errors about missing thermal devices

    Details

    • Type: Bug
    • Status: Done
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.7.0
    • Component/s: cf-monitord
    • Labels:
      None

      Description

      On my client VM for testing, I see blocks like this, all with the same timestamp, once every two and a half minutes: <pre>
      Jun 1 17:29:41 deb64client cf-monitord[27645]: CFEngine(monitor) cf-monitord Couldn't open '/sys/devices/virtual/thermal/thermal_zone0/temp'
      Jun 1 17:29:41 deb64client cf-monitord[27645]: CFEngine(monitor) cf-monitord Couldn't open '/sys/devices/virtual/thermal/thermal_zone1/temp'
      Jun 1 17:29:41 deb64client cf-monitord[27645]: CFEngine(monitor) cf-monitord Couldn't open '/sys/devices/virtual/thermal/thermal_zone2/temp'
      Jun 1 17:29:41 deb64client cf-monitord[27645]: CFEngine(monitor) cf-monitord Couldn't open '/sys/devices/virtual/thermal/thermal_zone3/temp'
      </pre> Each line is for a different thermal_zone%d/.
      I've got default logging (no -v or similar) so I must suppose this was logged as error or worse.
      I observe: <pre>
      root@deb64client:~/# list /sys/devices/virtual/thermal/
      cooling_device0/ cooling_device1/
      </pre>
      The file's name is surely short for "temperature" but it's called "temp" which leads to a quite different first impression about the significance of the file.
      All the same, it's only reasonable to demand the file's presence if its directory actually exists. On a machine with fewer than four thermal zones (e.g. a VM with none), there will be fewer than four of these directories, and we shouldn't spam syslog about the lack of temp(erature) files in them!
      (My non-virtual laptop has one, thermal_zone0/, which does indeed contain a temp file.)
      It might make some sense to log at INFO level that the directory is missing, but even that feels like VERBOSE to me.

      The message comes from <pre>
      static bool GetSysThermal(double *cf_this)
      </pre> in core/cf-monitord/mon_temp.c, added by Stefan Weil in commit ca7ae0ef2ce this January.
      (I am baffled that we fgets() a line into a buffer and then sscanf() a number, when we could just fscanf() the number straight from the file and avoid potential problems with the buffer.)
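
To illustrate the fscanf() point, here is a minimal sketch of reading the number straight from the file. ReadNumberFromFile() is a hypothetical helper name of mine, not the actual code in mon_temp.c, and it makes no assumption about the units of the value it reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not the CFEngine code): read one number directly
 * from a sysfs-style file, with no intermediate line buffer. */
static bool ReadNumberFromFile(const char *path, double *out)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
    {
        return false;   /* missing file: the caller decides how loudly to log */
    }

    double value;
    /* fscanf() returns the number of successful conversions;
     * anything other than 1 means no number was read. */
    bool ok = (fscanf(fp, "%lf", &value) == 1);
    fclose(fp);

    if (ok)
    {
        *out = value;   /* only touch the output on success */
    }
    return ok;
}
```

The helper leaves *out untouched on failure, so a caller that pre-initialises its value keeps that default when the file is absent or does not start with a number.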
      In cf3.defs.h, enum observables hard-codes the fact that there are four thermal devices.
      The related GetLMSensors() sets the four observables in question to 0.0 before it starts. However, all other code paths through MonTempGatherData(), notably its non-Linux no-op stub, leave these data uninitialised, so I hope the caller does its own initialisation. MonTempGatherData() has no return value, so the caller has no way of knowing whether it set these observables.
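
One possible shape for a gatherer that avoids both problems, as a sketch only: it zeroes every slot before doing anything else and returns how many slots it actually set. All the names here (GatherThermalSketch, SlotReader, OnlyZoneZero, SKETCH_THERMAL_SLOTS) are hypothetical and do not appear in CFEngine; the reader callback stands in for whatever per-platform probing is available, and NULL stands in for the non-Linux no-op stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_THERMAL_SLOTS 4   /* mirrors the four hard-coded observables */

/* Hypothetical callback: try to read one slot, return true on success. */
typedef bool (*SlotReader)(int slot, double *out);

/* Stand-in reader for demonstration: only slot 0 has a sensor. */
static bool OnlyZoneZero(int slot, double *out)
{
    if (slot != 0)
    {
        return false;
    }
    *out = 41.5;
    return true;
}

/* Zero every slot first, then fill in what the reader can supply.
 * Returns the number of slots that were actually set. */
static int GatherThermalSketch(double slots[SKETCH_THERMAL_SLOTS], SlotReader reader)
{
    assert(slots != NULL);

    for (int i = 0; i < SKETCH_THERMAL_SLOTS; i++)
    {
        slots[i] = 0.0;          /* defined value on every code path */
    }

    if (reader == NULL)
    {
        return 0;                /* the "no-op stub" case */
    }

    int set = 0;
    for (int i = 0; i < SKETCH_THERMAL_SLOTS; i++)
    {
        double value;
        if (reader(i, &value))
        {
            slots[i] = value;
            set++;
        }
    }
    return set;
}
```

With the return value, the caller can distinguish "nothing was measured" from "everything measured 0.0", which the current void signature cannot express.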

      I suggest that we should test for the directory's existence and not spam syslog with errors about missing temp files when the directory doesn't exist at all.
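
A rough sketch of that check, assuming POSIX stat(): classify each zone as absent, present but unreadable, or fine, so that only the middle case would justify an error-level message. ZoneStatus and ProbeThermalZone() are hypothetical names of mine, and the base directory is passed in rather than hard-coded so the logic can be exercised against a scratch directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

typedef enum
{
    ZONE_ABSENT,       /* no thermal_zoneN/ directory: nothing worth an error */
    ZONE_UNREADABLE,   /* directory exists but its temp file cannot be opened */
    ZONE_OK
} ZoneStatus;

static bool IsDirectory(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/* Hypothetical probe: look for <base>/thermal_zone<zone>/temp. */
static ZoneStatus ProbeThermalZone(const char *base, int zone)
{
    assert(base != NULL);

    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/thermal_zone%d", base, zone);
    if (n < 0 || (size_t) n >= sizeof(path) || !IsDirectory(path))
    {
        return ZONE_ABSENT;
    }

    n = snprintf(path, sizeof(path), "%s/thermal_zone%d/temp", base, zone);
    if (n < 0 || (size_t) n >= sizeof(path))
    {
        return ZONE_UNREADABLE;
    }

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
    {
        return ZONE_UNREADABLE;   /* the only case that merits an error log */
    }
    fclose(fp);
    return ZONE_OK;
}
```

On the VM described above, where only cooling_device0/ and cooling_device1/ exist, every zone would come back as ZONE_ABSENT and could be logged at verbose level or not at all.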
      Rather than looping count up to 4, we could loop count upwards until the directory doesn't exist (and maybe a few beyond that, if there's any reason to believe there might be gaps in numbering); however, that would need some more flexible arrangement for storing the answer than the enum observables and a CF_THIS array indexed by it permit.
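
The open-ended loop could look roughly like this. It is a sketch under my own assumptions: FindThermalZones() and ZoneDirExists() are hypothetical names, the caller supplies the result array (which is exactly the more flexible storage that enum observables does not currently offer), and max_gap is the number of consecutive missing directories tolerated before giving up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

static bool ZoneDirExists(const char *base, int zone)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/thermal_zone%d", base, zone);
    if (n < 0 || (size_t) n >= sizeof(path))
    {
        return false;
    }
    struct stat sb;
    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/* Scan zone numbers upwards from 0, recording each one whose directory
 * exists. Stops after more than max_gap consecutive misses, or when the
 * caller's array is full. Returns the number of zones recorded. */
static int FindThermalZones(const char *base, int max_gap, int *zones, int capacity)
{
    assert(base != NULL && zones != NULL && max_gap >= 0 && capacity >= 0);

    int found = 0;
    int misses = 0;
    for (int zone = 0; found < capacity && misses <= max_gap; zone++)
    {
        if (ZoneDirExists(base, zone))
        {
            zones[found++] = zone;
            misses = 0;          /* a hit resets the gap counter */
        }
        else
        {
            misses++;
        }
    }
    return found;
}
```

With max_gap set to 0 the scan stops at the first missing directory; a small positive value covers the "few beyond that" case if numbering gaps turn out to be possible.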

            People

            • Assignee:
              a10038 jimis (Dimitrios Apostolou)
              Reporter:
              a10050 Edward Welbourne (Inactive)

            Time Tracking

            • Original Estimate: 2h
            • Remaining: Not Specified
            • Logged: Not Specified