CFEngine Community / CFE-1966

Client can't bootstrap on master (and hub reports errors while doing so)


    Details

    • Type: Bug
    • Status: Rejected
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Bootstrap
    • Labels:
      None

      Description

      I installed a master package from
      http://10.100.250.45/packages/testing-enterprise-master/jenkins-bootstrap-enterprise-master-374/
      on my local test-hub VM (having first cleaned it out, as usual)
      and bootstrapped as usual: <pre>
      root@deb64hub:~/# cf-agent --bootstrap 192.168.122.198
      notice: Q: ".../cf-execd"": error: Can't stat file '/var/cfengine/inputs/lib/3.8/stdlib.cf' for parsing. (stat: No such file or directory)
      Q: ".../cf-execd"": error: Policy failed validation with command '"/var/cfengine/bin/cf-promises" -c "/var/cfengine/inputs/promises.cf"'
      Q: ".../cf-execd"": error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
      Q: ".../cf-execd"":
      notice: Q: "...f-serverd"": error: Can't stat file '/var/cfengine/inputs/lib/3.8/stdlib.cf' for parsing. (stat: No such file or directory)
      Q: "...f-serverd"": error: Policy failed validation with command '"/var/cfengine/bin/cf-promises" -c "/var/cfengine/inputs/promises.cf"'
      Q: "...f-serverd"": error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
      Q: "...f-serverd"":
      R: This host assumes the role of policy server
      R: Updated local policy from policy server
      R: Started the server
      R: Started the scheduler
      notice: Bootstrap to '192.168.122.198' completed successfully!
      root@deb64hub:~/# ls -ld /var/cfengine/masterfiles/lib/3.8/stdlib.cf
      -rw------- 1 root root 3254 Jun 2 17:12 /var/cfengine/masterfiles/lib/3.8/stdlib.cf
      </pre> As you can see, the file reported missing from inputs/ is present in masterfiles.
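      To see whether any other files are present in masterfiles but have not reached inputs, a check along these lines could be run on the hub. This is only a sketch: the function name missing_in_dst is made up for illustration, and the paths in the example comment are the default locations quoted above.

```shell
#!/bin/sh
# List files that exist under SRC but not under DST,
# printed as paths relative to the root of SRC.
missing_in_dst() {
    src=$1
    dst=$2
    # Run find from inside SRC so that it prints relative paths.
    (cd "$src" && find . -type f) | sort | while IFS= read -r rel; do
        [ -e "$dst/$rel" ] || printf '%s\n' "${rel#./}"
    done
}

# Example, on the hub (assumed default locations):
#   missing_in_dst /var/cfengine/masterfiles /var/cfengine/inputs
```

      An empty result would mean every file in masterfiles has a counterpart in inputs; it says nothing about whether the contents match.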
      All the usual expected processes were running, despite the errors, so I carried on with trying to bootstrap clients: <pre>
      root@deb64client:~/# cf-agent --bootstrap 192.168.122.198
      notice: Bootstrap mode: implicitly trusting server, use --trust-server=no if server trust is already established
      notice: Trusting new key: SHA=ce62d7f6a314bc6331a41027506431e451d4a8a48f74d63ca1799b2f47f3ed56
      error: Connection unexpectedly closed. SSL_read: socket closed
      error: Connection was hung up while receiving line:
      error: Connection was hung up during identification! (3)
      error: No suitable server found
      error: Connection unexpectedly closed. SSL_read: socket closed
      error: Connection was hung up while receiving line:
      error: Connection was hung up during identification! (3)
      error: No suitable server found
      R: This autonomous node assumes the role of voluntary client
      R: Failed to copy policy from policy server at 192.168.122.198:/var/cfengine/masterfiles
      Please check
      * cf-serverd is running on 192.168.122.198
      * CFEngine version on the policy hub is 3.6.0 or latest - otherwise you need to tweak the protocol_version setting
      * network connectivity to 192.168.122.198 on port 5308
      * masterfiles 'body server control' - in particular allowconnects, trustkeysfrom and skipverify
      * masterfiles 'bundle server' -> access: -> masterfiles -> admit/deny
      It is often useful to restart cf-serverd in verbose mode (cf-serverd -v) on 192.168.122.198 to diagnose connection issues.
      When updating masterfiles, wait (usually 5 minutes) for files to propagate to inputs on 192.168.122.198 before retrying.
      R: Did not start the scheduler
        notice: Q: ".../cf-agent" -f u": error: There is no readable input file at '/var/cfengine/inputs/update.cf'. (stat: No such file or directory)
        Q: ".../cf-agent" -f u": error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
        Q: ".../cf-agent" -f u": error: Connection unexpectedly closed. SSL_read: socket closed
        Q: ".../cf-agent" -f u": error: Connection was hung up while receiving line:
        Q: ".../cf-agent" -f u": error: Connection was hung up during identification! (3)
        Q: ".../cf-agent" -f u": error: No suitable server found
        Q: ".../cf-agent" -f u": error: Connection unexpectedly closed. SSL_read: socket closed
        Q: ".../cf-agent" -f u": error: Connection was hung up while receiving line:
        Q: ".../cf-agent" -f u": error: Connection was hung up during identification! (3)
        Q: ".../cf-agent" -f u": error: No suitable server found
        Q: ".../cf-agent" -f u": R: Failed to copy policy from policy server at 192.168.122.198:/var/cfengine/masterfiles
        Q: ".../cf-agent" -f u": Please check
        Q: ".../cf-agent" -f u": * cf-serverd is running on 192.168.122.198
        Q: ".../cf-agent" -f u": * CFEngine version on the policy hub is 3.6.0 or latest - otherwise you need to tweak the protocol_version setting
        Q: ".../cf-agent" -f u": * network connectivity to 192.168.122.198 on port 5308
        Q: ".../cf-agent" -f u": * masterfiles 'body server control' - in particular allowconnects, trustkeysfrom and skipverify
        Q: ".../cf-agent" -f u": * masterfiles 'bundle server' -> access: -> masterfiles -> admit/deny
        Q: ".../cf-agent" -f u": It is often useful to restart cf-serverd in verbose mode (cf-serverd -v) on 192.168.122.198 to diagnose connection issues.
        Q: ".../cf-agent" -f u": When updating masterfiles, wait (usually 5 minutes) for files to propagate to inputs on 192.168.122.198 before retrying.
        Q: ".../cf-agent" -f u": R: Did not start the scheduler
        Q: ".../cf-agent" -f u": notice: Q: ".../cf-agent" -f u": error: There is no readable input file at '/var/cfengine/inputs/update.cf'. (stat: No such file or directory)
        Q: ".../cf-agent" -f u": Q: ".../cf-agent" -f u": error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
        Q: ".../cf-agent" -f u": Q: ".../cf-agent" -f u":
        Q: ".../cf-agent" -f u":
        error: Bootstrapping failed, no input file at '/var/cfengine/inputs/promises.cf' after bootstrap
        </pre>
        Which is Bad.

      Let's run through the things it told me to check.
      First, cf-serverd is running (and not wedged) on 192.168.122.198: <pre>
      root@deb64hub:~/# ifconfig eth0
      eth0 Link encap:Ethernet HWaddr 52:54:00:d4:2a:22
      inet addr:192.168.122.198 Bcast:192.168.122.255 Mask:255.255.255.0
      inet6 addr: fe80::5054:ff:fed4:2a22/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:242449 errors:0 dropped:6 overruns:0 frame:0
      TX packets:127868 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:233697840 (222.8 MiB) TX bytes:21645057 (20.6 MiB)

      root@deb64hub:~/# psg cf-serverd
      USER PID PPID PGID %CPU %MEM NI SZ VSZ RSS NLWP STIME ELAPSED TIME COMMAND
      root 28317 1 28317 0.0 0.8 0 36032 144128 8332 1 17:15 06:13 00:00:00 /var/cfengine/bin/cf-serverd
      root@deb64hub:~/# gdb --batch -se /var/cfengine/bin/cf-serverd -p 28317 -f -ex 'info thread' -ex dis
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      0x00007fdffa30e2b3 in select () at ../sysdeps/unix/syscall-template.S:81
      81 ../sysdeps/unix/syscall-template.S: No such file or directory.
      Id Target Id Frame
      * 1 Thread 0x7fdffc472700 (LWP 28317) "cf-serverd" 0x00007fdffa30e2b3 in select () at ../sysdeps/unix/syscall-template.S:81
        root@deb64hub:~/# gdb --batch -se /var/cfengine/bin/cf-serverd -p 28317 -f -ex 'bt' -ex dis
        [Thread debugging using libthread_db enabled]
        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
        0x00007fdffa30e2b3 in select () at ../sysdeps/unix/syscall-template.S:81
        81 ../sysdeps/unix/syscall-template.S: No such file or directory.
        #0 0x00007fdffa30e2b3 in select () at ../sysdeps/unix/syscall-template.S:81
        #1 0x00000000004088fd in WaitForIncoming (sd=<optimized out>) at cf-serverd-functions.c:761
        #2 StartServer (ctx=<optimized out>, policy=<optimized out>, config=<optimized out>) at cf-serverd-functions.c:875
        #3 0x000000000040786b in main (argc=<optimized out>, argv=<optimized out>) at cf-serverd.c:68
        </pre> Yup.

      Second: CFEngine version on the policy hub is 3.6.0 or latest (I guess that should say "later"!): <pre>
      root@deb64hub:~/# cf-agent --version
      CFEngine Core 3.8.0a1.008c0a0
      CFEngine Enterprise 3.8.0a1.98499c7
      </pre> Yup; looks like latest to me.
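      The same comparison can be done mechanically. A minimal sketch, assuming the first line of cf-agent --version always has the "CFEngine Core X.Y.Z..." shape shown above; core_version and version_ge are hypothetical helper names, and the pre-release suffix ("a1" here) is ignored:

```shell
#!/bin/sh
# Extract the leading numeric X.Y.Z from a line such as
# "CFEngine Core 3.8.0a1.008c0a0" (format as quoted in this report).
core_version() {
    printf '%s\n' "$1" |
        sed -n 's/^CFEngine Core \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed if version $1 (X.Y.Z) is greater than or equal to version $2,
# comparing the three dot-separated fields numerically, left to right.
version_ge() {
    a=$1
    b=$2
    for i in 1 2 3; do
        x=${a%%.*}
        y=${b%%.*}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Fields are equal: drop them and compare the next pair.
        a=${a#*.}
        b=${b#*.}
    done
    return 0
}

# Example:
#   v=$(core_version "$(cf-agent --version | head -n 1)")
#   version_ge "$v" 3.6.0 && echo "hub is 3.6.0 or later"
```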

      Third: network connectivity to 192.168.122.198 on port 5308 <pre>
      root@deb64client:~/# ping 192.168.122.198
      PING 192.168.122.198 (192.168.122.198) 56(84) bytes of data.
      64 bytes from 192.168.122.198: icmp_seq=1 ttl=64 time=0.209 ms
      64 bytes from 192.168.122.198: icmp_seq=2 ttl=64 time=0.218 ms
      64 bytes from 192.168.122.198: icmp_seq=3 ttl=64 time=0.188 ms
      64 bytes from 192.168.122.198: icmp_seq=4 ttl=64 time=0.184 ms
      ^C
      --- 192.168.122.198 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 2999ms
      rtt min/avg/max/mdev = 0.184/0.199/0.218/0.022 ms
      root@deb64client:~/# telnet 192.168.122.198 5308
      Trying 192.168.122.198...
      Connected to 192.168.122.198.
      Escape character is '^]'.
      ^]
      telnet> Connection closed.
      </pre> Yup.

      Fourth: masterfiles 'body server control' - in particular allowconnects, trustkeysfrom and skipverify
      Hmm, it should probably tell me which file to look in for that, but I found it in masterfiles/controls/3.8/cf_serverd.cf. I'm using the default, of course, but let's check anyway:

      • allowconnects includes @def.acl, which includes "$(sys.policy_hub)/16"; my client is 192.168.122.165 and the policy hub is 192.168.122.198, so I think we should be good, unless getvalues("override_data_acl") got used instead, in which case I have no clue (and I doubt a user would be able to find out);
      • trustkeysfrom has "0.0.0.0/0" as an entry, unless getvalues("override_data_trustkeysfrom") prevented that;
      • skipverify is nowhere to be seen. Is that good or bad? The help message didn't tell me.
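      To double-check the arithmetic behind "I think we should be good" for the ACL, here is a small sketch that tests whether an address falls inside a network given as address/prefix. It assumes CFEngine reads "$(sys.policy_hub)/16" as ordinary IPv4 CIDR notation, which I have not verified; ip_to_int and ip_in_cidr are hypothetical helper names.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    rest=$1
    a=${rest%%.*}; rest=${rest#*.}
    b=${rest%%.*}; rest=${rest#*.}
    c=${rest%%.*}; d=${rest#*.}
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed if address $1 lies inside the network $2 (address/prefix-length).
# The address part of $2 need not be the network base address.
ip_in_cidr() {
    ip=$1
    net=${2%/*}
    bits=${2#*/}
    # Netmask with the top $bits bits set, e.g. /16 -> 0xFFFF0000.
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example with the addresses from this report:
#   ip_in_cidr 192.168.122.165 192.168.122.198/16 && echo "client is inside the ACL range"
```

      Under that assumption the client and the hub share the 192.168.0.0/16 network, so allowconnects should not be what rejects the connection.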

      Fifth: masterfiles 'bundle server' -> access: -> masterfiles -> admit/deny
      The same file has a bundle server, so let's go with that.
      Its access: -> masterfiles has admit => { @(def.acl) } and no deny.
      So I'm left with no idea why the client can't bootstrap.

      The server and client quoted above both run Debian 8, using the Debian 6 and Debian 4 nova packages respectively.
      I've also tried bootstrapping a CentOS 6.5 client using the RedHat 4 nova package, with similar results.


            People

            • Assignee:
              a10038 jimis (Dimitrios Apostolou)
              Reporter:
              a10050 Edward Welbourne (Inactive)