While each state in the client is executing, the expiration timer for the Update Control Maps keeps running, but the maps are not refreshed. If the whole expiration interval elapses before the state finishes, the map expires instead of being refreshed, even if a refresh would have succeeded.
This affects the demo setup more severely than production, since the demo uses short intervals.
Steps to reproduce:
- Start a rootfs update that takes at least a few minutes (the download must take longer than UpdateControlMapExpirationTimeSeconds).
- Set a pause in ArtifactCommit.
- The update fails in ArtifactCommit instead of pausing.
Unfortunately, fixing this is going to be a bit tricky. Right now we refresh the map using explicit states, which can only run between other states, that is, never in parallel with operations taking place inside a state. The expiration timer, however, runs in a goroutine, so it can fire while an operation is still in progress, and after that the map is never refreshed again.
I can think of a couple of solutions:
- Instead of using a timer, store a timestamp and check it only when needed, in particular after the refresh.
- Run the state operations in a goroutine (which should be relatively safe, since they are fairly isolated from the rest of the code), and run the refresh in the main goroutine while waiting for the state operations to finish.
Of the two approaches, I suspect the first one is a bit simpler, but I'm not certain, particularly about the part where the refresh and the expiration check have to be reordered.