[PATCH v3 0/8] drm: Introduce sparse GEM shmem
Alyssa Rosenzweig
alyssa at rosenzweig.io
Fri Apr 11 13:52:30 UTC 2025
> 2. Device Lost
> --------------
>
> At this point we're left with no other choice than to kill the context.
> And userspace should be able to cope with VK_DEVICE_LOST (hopefully zink
> does), but it will probably not cope well with an entire storm of these
> just to get the first frame out.
>
> Here comes the horrible trick:
>
> We'll keep rendering the entire frame by just smashing one single reserve
> page (per context) into the PTE every time there's a fault. It will result
> in total garbage, and we probably want to shoot the context the moment the
> VS stages have finished, but it allows us to collect an accurate estimate
> of how much memory we'd have needed. We need to pass that to the Vulkan
> driver as part of the device lost processing, so that it can keep that as
> the starting point for the userspace dynamic memory requirement
> guesstimate as a lower bound. Together with the (scaled to that
> requirement) gpu driver memory pool and the core mm watermarks, that
> should allow us to not hit a device lost again hopefully.
This doesn't work if vertex stages are allowed to have side effects
(which is required for adult-level APIs and can effectively get hit with
streamout on panfrost). Once anything involves side effects, you can't
replay the work, and there's no way to cope with that. There's no magic
Zink can do either.
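
To make the replay problem concrete, here is a toy sketch in C. Nothing in
it is real driver or hardware code: toy_state, run_vertex_stage and
replay_after_fault are made-up names, and the "vertex stage" is just a loop.
It only assumes that the stage does one idempotent write (positions) and one
append-style write (streamout), and that a naive recovery reruns the stage
from the start.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VERTS 8

/* Hypothetical toy state, not a real driver or hardware structure. */
struct toy_state {
    float positions[NUM_VERTS];      /* plain output, overwritten on replay */
    float streamout[2 * NUM_VERTS];  /* append-only buffer */
    size_t streamout_count;          /* side effect: advances on every run */
};

/* Process vertices [0, stop_at), pretending a fault killed the job there. */
static void run_vertex_stage(struct toy_state *s, size_t stop_at)
{
    for (size_t i = 0; i < stop_at; i++) {
        float pos = (float)i * 2.0f;

        s->positions[i] = pos;  /* idempotent: same slot, same value */

        /* Not idempotent: each run appends after the previous entries. */
        assert(s->streamout_count < 2 * NUM_VERTS);
        s->streamout[s->streamout_count++] = pos;
    }
}

/* Run until a fault at fault_at, then naively replay from the start. */
static size_t replay_after_fault(size_t fault_at)
{
    struct toy_state s = {0};

    run_vertex_stage(&s, fault_at);   /* first attempt, killed at the fault */
    run_vertex_stage(&s, NUM_VERTS);  /* replay of the whole stage */

    return s.streamout_count;
}
```

With a fault after three vertices, replay_after_fault(3) returns 11 where a
clean run would have produced 8 streamout entries. The positions array comes
out fine, the appended data does not, and nothing after the fact can tell
the duplicated entries apart from real ones.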