  1. Mender
  2. MEN-5421

[mender-client] gets rate-limited by the mender-server

    Details

    • Type: Bug
    • Status: Done
    • Priority: (None)
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.1
    • Labels:
    • Story Points:
      13
    • Backlog:
      yes
    • Days in progress:
      0

      Description

      Currently, the client gets rate-limited by hosted-mender because of its calls to deployments/next.

      TL;DR: The client has to gracefully handle HTTP 429 responses, which the server returns when it rate-limits a client that is moving too quickly.

      Note: This happens only when the client is rate-limited, which right now is hosted-mender only (unless the on-prem customer has configured this themselves).

      The issue stems from the changes in https://tracker.mender.io/browse/MEN-5096 which changed the client to poll for update-control-maps from the server upon each opportunity.

      With this change, the client can run into the rate limit: hosted-mender limits the endpoint to 5/s, so when the client is too quick to go from one state to another (for example from download_leave -> install), the server responds with an HTTP 429 to the client's POST deployments/next. The client then falls back to the POST v1 endpoint and gets a 204, which it reports as `deployment aborted from the server`.

      Relevant log lines from a failing deployment:

      2022-01-28 11:51:16 +0000 UTC info: State transition: update-after-store [Download_Leave] -> mender-update-control-refresh-maps [none]

      2022-01-28 11:51:17 +0000 UTC info: State transition: update-install [ArtifactInstall] -> mender-update-control-refresh-maps [none]

      2022-01-28 11:51:18 +0000 UTC debug: request not accepted by the server: (POST https://hosted.mender.io/api/devices/v2/deployments/device/deployments/next): Response code: 429
      2022-01-28 11:51:18 +0000 UTC debug: Connecting to server http://localhost:46331
      2022-01-28 11:51:18 +0000 UTC debug: Request: "" "" "https" "hosted.mender.io" "/api/devices/v1/deployments/device/deployments/next"
      2022-01-28 11:51:19 +0000 UTC debug: Successful (authorized) request: (POST https://hosted.mender.io/api/devices/v1/deployments/device/deployments/next): Response code: 204
      2022-01-28 11:51:19 +0000 UTC debug: Received response:204 No Content
      2022-01-28 11:51:19 +0000 UTC debug: No update available
      2022-01-28 11:51:19 +0000 UTC error: transient error: The deployment was aborted from the server
      2022-01-28 11:51:19 +0000 UTC info: State transition: mender-update-control-refresh-maps [none] -> rollback [ArtifactRollback]
      2022-01-28 11:51:19 +0000 UTC debug: Transitioning to error state

      Reference discussion on slack: https://northern-tech.slack.com/archives/C0XM0KX9C/p1643374074791979

      The client will have to be changed in two ways:

      • Handle being rate-limited explicitly, rather than treating a 429 like any other error code.
      • Add special handling for when the update is running with update-control-maps. There is no point in falling back to the v1 POST endpoint when the client is using control-maps, because the resulting 204 is read as an aborted deployment.
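
The two changes above can be sketched as a small decision function. This is only an illustration, not the client's actual code: the names `nextAction`, `decide` and `parseRetryAfter` are hypothetical, the sketch assumes that the fallback to v1 is triggered by any non-2xx status other than 429, and it assumes the server may send the standard `Retry-After` header in its seconds form (the ticket does not say whether hosted-mender does).

```go
package main

import (
	"net/http"
	"strconv"
	"time"
)

// nextAction is a hypothetical enum describing what to do after a poll of
// the v2 deployments/next endpoint. None of these names exist in the client.
type nextAction int

const (
	proceed      nextAction = iota // 2xx: use the response as-is
	retryLater                     // 429: wait, then poll the same endpoint again
	fallbackToV1                   // other failure, no control maps in use
	reportError                    // other failure while control maps are in use
)

// parseRetryAfter interprets a Retry-After header value given in seconds.
// It returns def when the value is empty, malformed or negative.
// (Assumption: the HTTP-date form of the header is not handled here.)
func parseRetryAfter(value string, def time.Duration) time.Duration {
	secs, err := strconv.Atoi(value)
	if err != nil || secs < 0 {
		return def
	}
	return time.Duration(secs) * time.Second
}

// decide maps the v2 status code to an action. A 429 never triggers the
// v1 fallback, and neither does any failure while control maps are in use.
func decide(status int, usingControlMaps bool) nextAction {
	switch {
	case status >= 200 && status < 300:
		return proceed
	case status == http.StatusTooManyRequests:
		return retryLater
	case usingControlMaps:
		return reportError
	default:
		return fallbackToV1
	}
}
```

With this shape, the failing sequence from the log would end in `retryLater` instead of a POST to the v1 endpoint, and the caller could sleep for `parseRetryAfter(resp.Header.Get("Retry-After"), someDefault)` before polling again.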

      Acceptance criteria:

      • client must gracefully handle 429's
      • client must have special handling of update polling when already going through an update with update control maps.
      • integration tests for the new functionality.
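
As a rough idea of what a test for the 429 handling could exercise, here is an in-process sketch built on Go's `net/http/httptest`. It is not written against the project's integration test framework, and `newRateLimitedServer` and `pollWithRetry` are hypothetical helpers: the fake server answers the first few requests with 429 and then 200, and the poll function is expected to retry the same URL instead of giving up or switching endpoints.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// newRateLimitedServer returns a test server that answers the first
// `limited` requests with 429 and every later request with 200, together
// with a pointer to its request counter.
func newRateLimitedServer(limited int32) (*httptest.Server, *int32) {
	var hits int32
	handler := func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&hits, 1) <= limited {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
	return httptest.NewServer(http.HandlerFunc(handler)), &hits
}

// pollWithRetry is a hypothetical stand-in for the client's poll. It POSTs
// to url and, on a 429, waits and tries the same URL again, up to
// maxAttempts times. It returns the last status code seen.
func pollWithRetry(url string, maxAttempts int, wait time.Duration) (int, error) {
	status := 0
	for i := 0; i < maxAttempts; i++ {
		resp, err := http.Post(url, "application/json", nil)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		status = resp.StatusCode
		if status != http.StatusTooManyRequests {
			return status, nil
		}
		time.Sleep(wait)
	}
	return status, nil
}
```

A test would start the server with, say, two rate-limited responses, call `pollWithRetry` against `srv.URL`, and check both the final status and the number of requests the server counted.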

              People

              Assignee:
              oleorhagen Ole Petter Orhagen
              Reporter:
              oleorhagen Ole Petter Orhagen