[PATCH] drm/doc: Start documenting aspects specific to tile-based renderers
Alyssa Rosenzweig
alyssa at rosenzweig.io
Mon Apr 28 13:45:26 UTC 2025
> BTW, is there a piece of doc explaining the rationale behind this
> dma_fence contract, or is it just the usual informal knowledge shared
> among DRM devs over IRC/email threads :-) ?
>
> To be honest, I'm a bit unhappy with this "it's part of the dma_fence
> contract" explanation, because I have a hard time remembering all the
> details that led to this set of rules myself, so I suspect it's even
> harder for newcomers to reason about this. To me, it's one of the
> reasons people fail to understand/tend to forget what the
> problems/limitations are, and end up ignoring them (intentionally or
> not).
>
> FWIW, this is what I remember, but I'm sure there's more:
>
> 1. dma_fence must signal in finite time, so unbounded waits in the
> fence signalling path are not good, and that's what happens with
> GFP_KERNEL allocations
> 2. if you're blocked in your GPU fault handler, that means you can't
> process further faults happening on other contexts
> 3. GPU drivers are actively participating in the memory reclaim
> process, which leads to deadlocks if the memory allocation in the
> fault handler is waiting on the very same GPU job fence that's
> waiting for its memory allocation to be satisfied
>
> I'd really love it if someone (Sima, Alyssa and/or Christian?) could sum it
> up, so I can put the outcome of this discussion in some kernel doc
> entry (or maybe it'd be better if this was one of you submitting a
> patch for that ;-)). If it's already documented somewhere, I'll just
> have to eat my hat and accept your RTFM answer :-).
https://www.kernel.org/doc/html/next/driver-api/dma-buf.html#dma-fence-cross-driver-contract
Specifically:
Drivers are allowed to call dma_fence_wait() from their shrinker
callbacks. This means any code required for fence completion cannot
allocate memory with GFP_KERNEL.
Concretely:
* Job requires memory allocation to signal a fence
* We're in a low memory situation, so the shrinker is invoked
* The shrinker waits on the job's fence, so it can't free memory until
  the job finishes, and the job can't finish until its allocation succeeds
* Deadlock!
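The cycle above can be sketched as a tiny wait-for graph. This is a userspace model only, not kernel code: the node names, build_graph() and can_deadlock() are hypothetical and exist purely for illustration. It assumes the only difference between the two cases is whether the allocation is allowed to block on reclaim (GFP_KERNEL-like) or not (for example GFP_NOWAIT, or memory preallocated before the job is submitted).

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these names are illustrative, not kernel API. */
enum node { JOB_FENCE, ALLOCATION, SHRINKER, N_NODES };

/* waits_on[a][b] is true when a cannot make progress until b does. */
struct wait_graph {
    bool waits_on[N_NODES][N_NODES];
};

static struct wait_graph build_graph(bool alloc_may_block)
{
    struct wait_graph g = {0};

    /* The job must allocate memory before it can signal its fence. */
    g.waits_on[JOB_FENCE][ALLOCATION] = true;

    /* A blocking (GFP_KERNEL-like) allocation may enter reclaim and
     * therefore wait for the shrinker to free memory. */
    if (alloc_may_block)
        g.waits_on[ALLOCATION][SHRINKER] = true;

    /* The shrinker is allowed to wait on the job's fence. */
    g.waits_on[SHRINKER][JOB_FENCE] = true;

    return g;
}

/* Depth-first search: can 'from' reach 'target' along wait edges? */
static bool reaches(const struct wait_graph *g, enum node from,
                    enum node target, bool *seen)
{
    assert(from < N_NODES && target < N_NODES);

    for (int next = 0; next < N_NODES; next++) {
        if (!g->waits_on[from][next])
            continue;
        if (next == (int)target)
            return true;
        if (!seen[next]) {
            seen[next] = true;
            if (reaches(g, (enum node)next, target, seen))
                return true;
        }
    }
    return false;
}

/* A deadlock is possible when some node transitively waits on itself. */
static bool can_deadlock(bool alloc_may_block)
{
    struct wait_graph g = build_graph(alloc_may_block);

    for (int n = 0; n < N_NODES; n++) {
        bool seen[N_NODES] = { false };

        if (reaches(&g, (enum node)n, (enum node)n, seen))
            return true;
    }
    return false;
}
```

In this model can_deadlock(true) finds the fence -> allocation -> shrinker -> fence cycle, while can_deadlock(false) does not, because a non-blocking allocation never waits on the shrinker. The cost of the non-blocking case is that the allocation can fail, so the fence signalling path has to handle that failure.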
Possibly we could relax the contract to let us reclaim non-graphics
memory, but that's not my department.