Mender / MEN-2061

Occasional kernel panic and test failure in test_reboot_recovery[simulate_powerloss_artifact_install_enter-test_set0]

    Details

    • Type: Bug
    • Status: Open
    • Priority: (None)
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:

      Description

      This may be one of those bugs we will never get to the bottom of, but the log was interesting so I thought a report was in order.

      It looks like the kernel occasionally panics instead of booting correctly when we are simulating a powerloss. I have attached the log from the incident in Jenkins; notice how the client log abruptly stops at the panic.

      ...
      mender-client_1         | [    2.206994] EXT4-fs (mmcblk0p2): couldn't mount as ext3 due to feature incompatibilities
      mender-client_1         | [    2.224294] random: fast init done
      mender-client_1         | [    2.233068] EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
      mender-client_1         | [    2.238877] EXT4-fs (mmcblk0p2): write access will be enabled during recovery
      mender-client_1         | [    2.440121] JBD2: Invalid checksum recovering block 2 in log
      mender-client_1         | [    2.516621] JBD2: recovery failed
      mender-client_1         | [    2.520742] EXT4-fs (mmcblk0p2): error loading journal
      mender-client_1         | [    2.529380] VFS: Cannot open root device "mmcblk0p2" or unknown-block(179,2): error -5
      mender-client_1         | [    2.533118] Please append a correct "root=" boot option; here are the available partitions:
      mender-client_1         | [    2.537779] 1f00          131072 mtdblock0 
      mender-client_1         | [    2.537823]  (driver?)
      mender-client_1         | [    2.545085] 1f01           32768 mtdblock1 
      mender-client_1         | [    2.545103]  (driver?)
      mender-client_1         | [    2.551792] b300          614400 mmcblk0 
      mender-client_1         | [    2.551827]  driver: mmcblk
      mender-client_1         | [    2.558827]   b301           16384 mmcblk0p1 b4329424-01
      mender-client_1         | [    2.558860] 
      mender-client_1         | [    2.565948]   b302          221184 mmcblk0p2 b4329424-02
      mender-client_1         | [    2.565962] 
      mender-client_1         | [    2.573318]   b303          221184 mmcblk0p3 b4329424-03
      mender-client_1         | [    2.573339] 
      mender-client_1         | [    2.584947]   b304          131072 mmcblk0p4 b4329424-04
      mender-client_1         | [    2.584964] 
      mender-client_1         | [    2.592335] VFS: Unable to mount root fs on unknown-block(179,2)
      mender-client_1         | [    2.595921] User configuration error - no valid root filesystem found
      mender-client_1         | [    2.599878] Kernel panic - not syncing: Invalid configuration from end user prevents continuing
      mender-client_1         | [    2.603996] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.26-yocto-standard #1
      mender-client_1         | [    2.607742] Hardware name: ARM-Versatile Express
      mender-client_1         | [    2.612761] [<c0015da8>] (unwind_backtrace) from [<c00129ac>] (show_stack+0x10/0x14)
      mender-client_1         | [    2.616731] [<c00129ac>] (show_stack) from [<c0261e4c>] (dump_stack+0x88/0x9c)
      mender-client_1         | [    2.621124] [<c0261e4c>] (dump_stack) from [<c00aa780>] (panic+0xdc/0x248)
      mender-client_1         | [    2.625375] [<c00aa780>] (panic) from [<c065d390>] (mount_block_root+0x288/0x294)
      mender-client_1         | [    2.629343] [<c065d390>] (mount_block_root) from [<c065d49c>] (mount_root+0x100/0x108)
      mender-client_1         | [    2.633196] [<c065d49c>] (mount_root) from [<c065d5f4>] (prepare_namespace+0x150/0x198)
      mender-client_1         | [    2.637161] [<c065d5f4>] (prepare_namespace) from [<c065cec0>] (kernel_init_freeable+0x284/0x294)
      mender-client_1         | [    2.641055] [<c065cec0>] (kernel_init_freeable) from [<c04f1630>] (kernel_init+0x8/0xf0)
      mender-client_1         | [    2.644874] [<c04f1630>] (kernel_init) from [<c000f818>] (ret_from_fork+0x14/0x3c)
      mender-client_1         | [    2.649161] ---[ end Kernel panic - not syncing: Invalid configuration from end user prevents continuing
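
Since the failure is intermittent, it may help to scan saved client logs for the same signature so that occurrences can be counted. The following is a minimal sketch, not part of the test suite: the function names `classify_log` and `is_journal_panic` are hypothetical, and the patterns are copied from the log lines above on the assumption that future occurrences print the same messages.

```python
import re

# Signatures taken from the kernel log in this ticket (assumed to be stable
# across occurrences; adjust if the messages differ).
SIGNATURES = {
    "journal_checksum": re.compile(r"JBD2: Invalid checksum recovering block \d+ in log"),
    "journal_recovery_failed": re.compile(r"JBD2: recovery failed"),
    "root_mount_failed": re.compile(r"VFS: Unable to mount root fs on unknown-block\(\d+,\d+\)"),
    "kernel_panic": re.compile(r"Kernel panic - not syncing"),
}


def classify_log(text):
    """Return the set of signature names that appear anywhere in a client log."""
    found = set()
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(name)
    return found


def is_journal_panic(text):
    """True if the log shows both a failed journal recovery and a kernel panic."""
    return {"journal_recovery_failed", "kernel_panic"} <= classify_log(text)
```

Running `is_journal_panic` over each archived log from the failing test would give a rough count of how often this specific failure mode occurs, as opposed to other causes of the same test failing.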
      

      If the filesystem driver is not behaving correctly, there is not much we can do, but this seems unlikely given how widely used it is. I can think of a few alternative explanations:

      1. We are not using a good method for simulating powerloss.
        • I think it's /proc../reboot-something we are using, right? I think it should be the best one, but maybe not?
      2. There is an actual bug in our implementation, and we are not handling powerloss correctly and corrupting something.
        • Not sure what it would be, but can't be ruled out.
      3. We are somehow corrupting the partition table, which would explain why mmcblk0p2 would also be corrupted.
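
One way to check explanation 3 against a given log is to parse the partition listing that the kernel prints before the panic and compare it with the layout the image is supposed to have. The sketch below is illustrative only: `parse_partitions` and `check_layout` are hypothetical helpers, and the expected layout has to be supplied by whoever knows the intended image layout. In the log above the kernel still lists all four `mmcblk0` partitions, so such a check would mainly be useful for comparing several failing runs.

```python
import re

# Matches listing lines such as:
#   "[    2.565948]   b302          221184 mmcblk0p2 b4329424-02"
# Groups: device number (hex), size in blocks, name, optional PARTUUID.
PARTITION_LINE = re.compile(
    r"\]\s+([0-9a-f]{4})\s+(\d+)\s+(\S+)(?:\s+(\S+))?\s*$"
)


def parse_partitions(text):
    """Map device name -> (devt_hex, size_in_blocks, partuuid_or_None)."""
    parts = {}
    for line in text.splitlines():
        match = PARTITION_LINE.search(line)
        if match:
            devt, size, name, uuid = match.groups()
            parts[name] = (devt, int(size), uuid)
    return parts


def check_layout(parts, expected_sizes):
    """Return human-readable mismatches between parsed and expected sizes.

    expected_sizes is a caller-supplied dict of name -> size in blocks;
    it is an assumption of this sketch, not something the log provides.
    """
    problems = []
    for name, size in expected_sizes.items():
        if name not in parts:
            problems.append(f"{name}: missing from partition listing")
        elif parts[name][1] != size:
            problems.append(f"{name}: size {parts[name][1]}, expected {size}")
    return problems
```

An empty result from `check_layout` would suggest that the partition table was read as expected in that run, which would point back towards the journal itself rather than the table.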

      This problem is happening semi-frequently, so it is worth keeping an eye on it and tracking any findings in this ticket.

              People

              • Assignee:
                oleorhagen Ole Petter Orhagen
                Reporter:
                a10040 Kristian Amlie