Description
Describe the bug
When the Async IO Engine is used for the rootfs drive and there is heavy I/O at the time of pause and snapshot creation, some operation completions (read/write completions) may still be pending between the Firecracker pause and the snapshot (pending ops). When the VM is resumed later, the guest kernel freezes: it is waiting for I/O that never finishes.
This issue doesn't seem to happen when using the Sync IO Engine.
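For intuition, here is a toy Python model (not Firecracker code; all names are invented, and a thread pool stands in for io_uring) of why pausing with in-flight async I/O is risky: operations submitted before the pause can complete after it, so a snapshot taken at the pause point records a guest that is still waiting on completions the saved state may never deliver.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncEngine:
    """Stand-in for an async I/O engine: submissions complete on
    background threads some time after they are issued."""

    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=4)
        self.in_flight = []

    def submit(self, op_id):
        self.in_flight.append(self.pool.submit(self._do_io, op_id))

    def _do_io(self, op_id):
        time.sleep(0.01)  # simulated device latency
        return op_id

    def pending_ops(self):
        return sum(1 for f in self.in_flight if not f.done())

    def drain(self):
        # What a safe pause must guarantee: every submitted op has
        # completed and its completion has been delivered before the
        # device state is saved.
        for f in self.in_flight:
            f.result()


engine = ToyAsyncEngine()
for i in range(32):
    engine.submit(i)

# "Pause" immediately: some ops are usually still in flight, i.e. a
# snapshot taken here would capture a guest waiting on completions.
pending_at_pause = engine.pending_ops()
print(f"DRAIN: pending_ops={pending_at_pause}")

engine.drain()
# After draining, there is nothing outstanding to lose in a snapshot.
print(f"after drain: pending_ops={engine.pending_ops()}")
```

This only models the invariant (drain before saving state), not how Firecracker's io_uring engine actually handles pause.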
To Reproduce
Here is a test case that reproduces the issue most of the time with added debug messages for the pending ops: https://github.com/e2b-dev/firecracker/pull/6/files#diff-d960bea365831acfb0eb3b1b548e6d22293710c9ed558f5dfbf68e016457870dR595
Example error output:
Starting iteration 1/100 - Testing for non-zero async I/O drain
================================================================================
Free space on sandbox start: 18G
DRAIN: pending_ops=17
Restoring from snapshot...
(frozen, nothing happens after)
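For reference, the pause/snapshot/restore cycle the test exercises maps onto Firecracker's HTTP API over its unix socket. Below is a minimal sketch of that request sequence; the socket path and file paths are assumptions, and in a real restore the `/snapshot/load` call goes to a freshly started Firecracker process, not the original one.

```python
import json
import os
import socket

API_SOCK = "/tmp/firecracker.socket"  # assumed API socket path


def build_request(method, path, body):
    """Serialize one Firecracker API call as a raw HTTP/1.1 request."""
    payload = json.dumps(body)
    return (
        f"{method} {path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n\r\n"
        f"{payload}"
    ).encode()


# The sequence behind "pause -> snapshot -> restore":
requests = [
    build_request("PATCH", "/vm", {"state": "Paused"}),
    build_request("PUT", "/snapshot/create", {
        "snapshot_type": "Full",
        "snapshot_path": "/tmp/snapshot_file",  # assumed path
        "mem_file_path": "/tmp/mem_file",       # assumed path
    }),
    # Sent to a *new* Firecracker process's socket in practice:
    build_request("PUT", "/snapshot/load", {
        "snapshot_path": "/tmp/snapshot_file",
        "mem_backend": {"backend_type": "File",
                        "backend_path": "/tmp/mem_file"},
        "resume_vm": True,
    }),
]

if os.path.exists(API_SOCK):  # only send if a VMM is actually listening
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(API_SOCK)
        for req in requests:
            s.sendall(req)
            print(s.recv(4096).decode(errors="replace"))
```

With the Async engine, it is the last step that hangs in the reproduction above: the restored guest blocks on I/O that was in flight at pause time.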
Expected behavior
The resume should succeed even when there are pending ops during the Firecracker pause/snapshot.
Environment
- Firecracker version: 1.13.1
- Host and guest kernel versions: Guest kernels provided by the test suite; Host: Linux codespaces-2de7d3 6.8.0-1030-azure #35~22.04.1-Ubuntu SMP Mon May 26 18:08:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Rootfs used: Provided by the test suite (size extended to fit the operations)
- Architecture: x86_64
- Any other relevant software versions: -
Additional context
How has this bug affected you? The resume is occasionally failing.
What are you trying to achieve? Resume a VM that has been paused previously.
Do you have any idea of what the solution might be? Not yet. My guess would be that the completions are not properly acknowledged to the guest OS.
Checks
- Have you searched the Firecracker Issues database for similar problems?
- Have you read the existing relevant Firecracker documentation?
- Are you certain the bug being reported is a Firecracker issue? Not fully sure. It might be related to the io_uring.