[Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea

Wed Jul 22 13:12:05 UTC 2020

On 2020-07-22 14:41, Daniel Vetter wrote:
>
> Ah I think I misunderstood which options you want to compare here. I'm
> not sure how much pain fixing up "dma-fence as memory fence" really
> is. That's kinda why I want a lot more testing on my annotation
> patches, to figure that out. Not much feedback aside from amdgpu and
> intel, and those two drivers pretty much need to sort out their memory
> fence issues anyway (because of userptr and stuff like that).
>
> The only other issues outside of these two drivers I'm aware of:
> - various scheduler drivers doing allocations in the drm/scheduler
> critical section. Since all arm-soc drivers have a mildly shoddy
> memory model of "we just pin everything" they don't really have to
> deal with this. So we might just declare arm as a platform broken and
> not taint the dma-fence critical sections with fs_reclaim. Otoh we
> need to fix this for drm/scheduler anyway, I think best option would
> be to have a mempool for hw fences in the scheduler itself, and at
> that point fixing the other drivers shouldn't be too onerous.
>
> - vmwgfx doing a dma_resv in the atomic commit tail. Entirely
> orthogonal to the entire memory fence discussion.

With vmwgfx there is another issue that is hit when the gpu signals an 
error. At that point the batch might be restarted with a new meta 
command buffer that needs to be allocated out of a dma pool. in the 
fence critical section. That's probably a bit nasty to fix, but not 
impossible.

>
> I'm pretty sure there's more bugs, I just haven't heard from them yet.
> Also due to the opt-in nature of dma-fence we can limit the scope of
> what we fix fairly naturally, just don't put them where no one cares
> :-) Of course that also hides general locking issues in dma_fence
> signalling code, but well *shrug*.
Hmm, yes. Another potential big problem would be drivers that want to 
use gpu page faults in the dma-fence critical sections with the 
batch-based programming model.

/Thomas