  1. CFEngine Community
  2. CFE-1043

cf-serverd broken behavior in mixed IPv4/IPv6 environments

    Details

    • Type: Bug
    • Status: Open
    • Priority: High
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Network
    • Labels: None
    • Found in version (details): 3.5.3

      Description

      Related to #2419 and the known issue https://cfengine.com/docs/3.5/getting-started-known-issues.html#on-windows-platforms-cf-serverd-listens-only-to-ipv6-interface. The tested environment is a mix of FreeBSD 8.x-REL, 9.2-REL, and 10-BETA4 using CFEngine 3.5.2 and 3.5.3. (Not related to binding to multiple interfaces, #2922.) This report is distinct in that it demonstrates cases where a configuration still passes tests when it should not.

      With a mixed v4/v6 environment, the following breakage is present:

      cf-serverd:

      • Can and does pass DNS tests in misconfigured environments, e.g. a v4 listener with no IPv4 DNS record passes as long as a v6 DNS record is present.
      • Does not test DNS against the correct listener stack, e.g. a v6 listener passes with no AAAA if an IPv4 A record is present.
      • Incorrectly treats an IPv6 link-local address as a usable IPv6 address even when no other IPv6 addresses are present.
      • $(sys.policy_hub) returns IPv4 or IPv6 inconsistently, which may break promises that use it.
      • Reports an IPv6 address even with an IPv4 bindtointerface explicitly configured.

      It is also difficult, and can be substantially risky, to fully disable IPv6 on FreeBSD 9.0 and later; it requires rebuilding the kernel with IPv6 support removed. (And force-disabling link-local has serious ramifications for OS stability, which is to say, it removes all hope of having any.) So I would classify these issues as 'no workaround available.' A related cf-agent issue is at #3872, with a fix submitted.

      To confirm that $(sys.policy_hub) is also misbehaving, I set up a fairly simple promise:
      <pre>
      bundle agent test_copy
      {
        files:
          # secure_cp is the copy_from body from the CFEngine standard library
          "/tmp/testcopy"
            comment => "Testing sys.policy_hub",
            copy_from => secure_cp("/tmp/testcopy","$(sys.policy_hub)");
        reports:
          "$(sys.policy_hub) ran testfile on $(this.host)";
      }
      </pre>

      Result from Policy Hub and Client
      <pre>
      root@teldrassil ~ # getalladdr teldrassil.dn.INT
      192.168.1.35, ffff:fff:ff:f::aa:35

      root@teldrassil ~ # netstat -al |grep cfeng
      tcp6 0 0 *.cfengine *.* LISTEN

      2013-12-14T20:35:38-0500 notice: R: --> I'm a policy hub.
      2013-12-14T20:35:38-0500 error: Unable to establish any connection with server.
      2013-12-14T20:35:38-0500 notice: R: 192.168.1.35 ran testfile on teldrassil
      </pre>
      Note above how the policy hub has gone ahead and decided to connect using the IPv4 address (and appears to be succeeding), even though netstat -al and telnet both confirm that it is absolutely not listening on any IPv4 address whatsoever. It also didn't bother to use DNS, even though it's configured as teldrassil.dn.INT and the A/AAAA records line up.

      Client 1:
      <pre>
      root@dawnstrider ~ # findptr /etc/named/slave/* "teldrassil"
      teldrassil.dn.INT / 192.168.1.35
      teldrassil.dn.INT / ffff:fff:ff:f::aa:35
      hb.teldrassil.dn.INT / 10.0.100.35

      2013-12-14T20:36:21-0500 notice: R: --> ffff:fff:ff:f::aa:35 is my policy hub.
      2013-12-14T20:36:21-0500 notice: R: --> CFEngine is running on dawnstrider.dn.INT
      2013-12-14T20:36:21-0500 notice: R: ffff:fff:ff:f::aa:35 ran testfile on dawnstrider
      </pre>

      Now, a misconfigured environment that doesn't fail where it absolutely should.
      <pre>
      root@rivermane ~ # netstat -al |grep cfeng
      Active Internet connections (including servers)
      Proto Recv-Q Send-Q Local Address Foreign Address (state)
      tcp6 0 0 *.cfengine *.* LISTEN

      root@rivermane ~ # grep -R bindtoint /var/cfengine/masterfiles/* |wc -l
      0

      root@rivermane ~ # ifconfig -a |grep "::"
      inet6 fe80::20c:29ff:AAAA:AAAA%vmx3f0 prefixlen 64 scopeid 0x1
      inet6 ::1 prefixlen 128
      inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
      (Obviously link-local only, and CFEngine should be totally unreachable via IPv4.)

      root@rivermane ~ # cf-agent --dry-run -B rivermane.dn.INT
      2013-12-14T20:49:53-0500 error: Unable to establish any connection with server.
      ...
      2013-12-14T20:49:53-0500 notice: Bootstrap to 'ffff:fff:ff:f::aa:22' completed successfully!
      </pre>
      Yes, ffff:fff:ff:f::aa:22 is the correct AAAA for rivermane.dn.INT, but as you can see, it is not configured on the host and that network is not reachable from the host. cf-serverd cannot possibly be bound to an address that is not even configured on the host.
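
One way to catch this case would be to confirm that an address is actually configured locally before reporting success. The following is a minimal sketch in Python for illustration only, not CFEngine code; `is_locally_bindable` is a hypothetical helper that simply tries to bind a TCP socket to the address on an ephemeral port.

```python
import socket


def is_locally_bindable(address: str) -> bool:
    """Return True if a TCP socket can be bound to `address` on this host.

    Hypothetical helper for illustration; it is not part of CFEngine.
    Binding to an address that is not configured on any local interface
    normally fails with EADDRNOTAVAIL (unless the OS is set up to allow
    non-local binds).
    """
    # Assumption: a colon in the text means an IPv6 literal.
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, 0))  # port 0 lets the OS pick a free port
    except OSError:
        return False
    return True
```

On a host like rivermane above, where the AAAA address is not configured on any interface, a check of this kind would be expected to return False for that address.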

      Now I'm really going to break it: I'm going to take away DNS for the IPv6 address (remove the AAAA) but preserve the IPv4 address, and I'm going to use bindtointerface for the v6 address...
      <pre>
      root@rivermane ~ # getalladdr rivermane.dn.INT
      192.168.1.22

      root@rivermane ~ # rgrep "bindtoint" /var/cfengine/masterfiles/*
      /var/cfengine/masterfiles/controls/cf_serverd.cf:
      bindtointerface => "ffff:fff:ff:f::aa:22";

      root@rivermane ~ # netstat -al |grep cfeng
      tcp6 0 0 rivermane.dn.INT.cfengine *.* LISTEN

      root@rivermane ~ # cf-agent --dry-run -B rivermane.dn.INT
      ...
      2013-12-14T23:35:34-0500 notice: Bootstrap to '192.168.1.22' completed successfully!
      </pre>
      Now we flip it the other way: take away the IPv4 address (remove the A), remove bindtointerface, and test it...
      <pre>
      root@rivermane ~ # getalladdr rivermane.dn.INT
      ffff:fff:ff:f::aa:22

      root@rivermane ~ # cf-agent --dry-run -B rivermane.dn.INT
      ...
      2013-12-14T23:38:34-0500 notice: Bootstrap to '2001:470:8:1281::aa:22' completed successfully!
      </pre>

      And now we add bindtointerface => "192.168.1.22"; to it, which sets up a configuration that should fail a DNS check: there is no A for rivermane.dn.INT, only an AAAA, and it's explicitly using IPv4.
      <pre>
      root@rivermane ~ # rgrep "bindtoint" /var/cfengine/masterfiles/*
      /var/cfengine/masterfiles/controls/cf_serverd.cf:
      bindtointerface => "192.168.1.22";

      root@rivermane ~ # netstat -al |grep cfeng
      tcp4 0 0 rivermane.dn.INT.cfengine *.* LISTEN

      root@rivermane ~ # getalladdr rivermane.dn.INT
      ffff:fff:ff:f::aa:22

      root@rivermane ~ # getaddr rivermane
      AF_UNSPEC: ffff:fff:ff:f::aa:22
      AF_INET: EAI_NONAME

      root@rivermane ~ # cf-agent --dry-run -B rivermane.dn.INT
      ...
      2013-12-15T01:08:34-0500 notice: Bootstrap to '2001:470:8:1281::aa:22' completed successfully!
      </pre>
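
For anyone who wants to reproduce the per-family lookups shown in the getaddr output, here is a minimal sketch in Python. It is an illustration, not the tool used above; `lookup` is a hypothetical name, and it only wraps the standard `socket.getaddrinfo()` call with an explicit address family.

```python
import socket
from typing import List, Optional, Tuple


def lookup(host: str, family: int) -> Tuple[List[str], Optional[str]]:
    """Resolve `host` for a single address family.

    Returns (addresses, error_name). error_name is the symbolic EAI_*
    name (for example "EAI_NONAME") when the lookup fails, else None.
    """
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Map the numeric getaddrinfo() error code back to its constant name.
        names = [name for name in dir(socket)
                 if name.startswith("EAI_") and getattr(socket, name) == exc.errno]
        return [], (names[0] if names else "gaierror %s" % exc.errno)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the textual address for both AF_INET and AF_INET6.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses, None
```

Calling it once with `socket.AF_INET` and once with `socket.AF_INET6` for the same name shows directly whether the A and AAAA records exist, which is the distinction the bootstrap check above fails to make.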

      This is obviously pretty badly broken behavior.

      So, about cf-serverd/cf-serverd-functions.c, OpenReceiverChannel():
      There's been discussion of this over on #2419 and, honestly, I'd consider it just plain broken at present: absent an explicit override, it automatically binds to any IPv6 address and doesn't actually fail on getaddrinfo() at line 490. That check can't work as written. See above, where the host binds to link-local and doesn't error out on the AAAA/PTR mismatch or on the fact that it's not bound to the address.

      As far as technical arguments against the use of AF_UNSPEC go: in a mixed-stack environment, this usage should not be expected to hand back an IPv4 address as the first result. That's entirely normal and expected behavior; if you want the IPv4 you need to set ai_family = AF_INET to get it. Otherwise the first entry is IPv6, although the result list as a whole will contain both IPv4 and IPv6 information.
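
To make the AF_UNSPEC point concrete, here is a small Python sketch (illustration only, not the cf-serverd C code). It works on a hand-written result list in the shape `getaddrinfo()` returns; the addresses are documentation prefixes and the helper names are hypothetical. The IPv6-first ordering in the sample is an assumption that matches what the hosts in this report showed; actual ordering depends on the resolver's address selection policy.

```python
import socket

# Hand-written sample in the shape getaddrinfo() returns:
# (family, type, proto, canonname, sockaddr).
# Addresses are documentation prefixes, not the hosts from this report;
# 5308 is the CFEngine port.
SAMPLE_RESULTS = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::35", 5308, 0, 0)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.35", 5308)),
]


def first_family(infos):
    """Family of the first entry, which is all that naive code looks at."""
    return infos[0][0] if infos else None


def families_present(infos):
    """Every address family that appears anywhere in the result list."""
    return {info[0] for info in infos}
```

With the sample above, `first_family` reports only AF_INET6 while `families_present` reports both families, so a non-empty result says nothing about whether the family you are listening on actually resolved.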

      So part of the reason it isn't throwing an error is that CFEngine is making bad use of AF_UNSPEC here. With AF_UNSPEC, getaddrinfo() passes even though cf-serverd is binding AF_INET6, because the PTR matches an available interface and the A matches the PTR. So even if the v6 fails, the v4 can pass, which the user doesn't see because AF_INET is not set and so only v6 data is returned.
      Thus when it hits line 497, because ai_family = AF_UNSPEC, the v4 is not visible, but ap != NULL. So you get the v6 data for a v4 check that passes. (Linux does the same with a test program.) That also means that the test at line 490 will and does pass even in misconfigured environments: a v6 listener with only v4 DNS and a v4 listener with only v6 DNS both passed. Because it's not actually testing against the listening address, this test will almost never fail, even when it should.
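
A check that does test against the listening address could look roughly like the Python sketch below. This is an illustration of the idea, not a proposed patch; `dns_matches_listener` is a hypothetical helper that takes the bound address and the list of addresses DNS returned for the host name.

```python
import ipaddress
from typing import Iterable


def dns_matches_listener(bound_address: str, resolved: Iterable[str]) -> bool:
    """Return True only if the bound address itself appears in DNS.

    Addresses are parsed before comparison, so different spellings of
    the same IPv6 address match, and an IPv4 address never matches an
    IPv6 one.
    """
    bound = ipaddress.ip_address(bound_address)
    return any(ipaddress.ip_address(addr) == bound for addr in resolved)
```

Applied to the last scenario above, a listener on 192.168.1.22 with only ffff:fff:ff:f::aa:22 in DNS fails this check, which is the failure the current test never produces.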

      There is also no test for a non-link-local IPv6 address. If the only IPv6 address available is link-local (IN6_IS_ADDR_LINKLOCAL, RFC 3493), this should be treated as an error condition as far as IPv6 is concerned, and ai_family should be set to AF_INET as though force_ipv4 were true. #2922 should render this moot, but at present the absence of this test guarantees that a misconfigured host (AAAA but no A, IPv4 bindtointerface, OS has IPv6 enabled but the address is not configured or SLAAC/DHCPv6 has not completed) will pass all tests successfully but actually be broken.
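
The link-local test described here can be sketched as follows, in Python for illustration (the real change would be in C using IN6_IS_ADDR_LINKLOCAL). The helper names are hypothetical, and excluding loopback as well as link-local is my own assumption about what counts as "usable".

```python
import ipaddress
import socket
from typing import Iterable, List


def usable_ipv6(addresses: Iterable[str]) -> List[str]:
    """Return the IPv6 addresses that are neither link-local nor loopback."""
    usable = []
    for text in addresses:
        # Drop any %scope suffix (e.g. fe80::1%lo0) before parsing.
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        if addr.version == 6 and not (addr.is_link_local or addr.is_loopback):
            usable.append(str(addr))
    return usable


def choose_listen_family(addresses: Iterable[str]) -> int:
    """Fall back to AF_INET when no usable IPv6 address is configured,
    as though force_ipv4 were true."""
    return socket.AF_INET6 if usable_ipv6(addresses) else socket.AF_INET
```

Fed the three inet6 lines from the rivermane ifconfig output above, `usable_ipv6` returns an empty list and `choose_listen_family` picks AF_INET.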

              People

              • Assignee: a10038 jimis (Dimitrios Apostolou)
              • Reporter: rootwyrm Phillip Jaenke