  1. Mender
  2. MEN-5340

Could not reboot host: signal: terminated

    Details

    • Story Points:
      1
    • Days in progress:
      0

      Description

      While rebooting the device, the Mender client gets terminated by systemd. However, if the reboot command itself hasn't finished yet, that process also gets terminated by systemd, because it runs in the same control group (cgroup) as the Mender client. This leads to a race condition causing builds to sometimes fail with this log:

      Dec 21 12:52:49 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e mender[5272]: time="2021-12-21T12:52:49Z" level=info msg="Rebooting device(s)"
      Dec 21 12:52:49 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e mender[5272]: Failed to fire hook: Unix syslog delivery error
      Dec 21 12:52:49 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e mender[5272]: time="2021-12-21T12:52:49Z" level=error msg="transient error: Could not reboot host: signal: terminated"
      Dec 21 12:52:49 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e systemd[1]: Stopping Mender OTA update service...
      Dec 21 12:52:54 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e mender[5272]: time="2021-12-21T12:52:49Z" level=info msg="State transition: reboot [ArtifactReboot_Enter] -> rollback [ArtifactRollback]"
      Dec 21 12:52:56 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e mender[5272]: time="2021-12-21T12:52:49Z" level=info msg="Daemon terminated with SIGTERM"
      Dec 21 12:52:56 nobi-19d97246-1d2c-459c-b9ff-f4529298e25e systemd[1]: mender-client.service: Succeeded.

      What happens is the following:

      1. The update triggers a reboot
      2. systemd sends SIGTERM to the reboot process before it has finished
      3. Mender notices that the reboot process got killed and enters rollback mode
      4. Mender is killed
      5. The device reboots to the new version
      6. After the reboot, Mender reports the update as failed, even though the device is running the new version

      We see this happening on 10-20% of our updates.

      Possible fixes:

      • Make sure systemd doesn't kill the reboot command using SIGTERM. This could be done by setting KillMode=mixed in the systemd service, so that SIGTERM is sent only to the main process.
      • Make sure Mender waits a while before treating a terminated 'reboot' command as a failure.

       

            People

            Assignee:
            a10040 Kristian Amlie
            Reporter:
            nielsavonds Niels Avonds