[PATCH] nouveau/fence: handle cross device fences properly.

Wed Jan 8 01:49:03 UTC 2025

On 8/1/25 11:04, Dave Airlie wrote:

> On Wed, 8 Jan 2025 at 02:02, Danilo Krummrich <dakr at kernel.org> wrote:
>> On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote:
>>> From: Dave Airlie <airlied at redhat.com>
>>>
>>> If we have two nouveau controlled devices and one passes a dma-fence
>>> to the other, when we hit the sync path it can cause the second device
>>> to try and put a sync wait in it's pushbuf for the seqno of the context
>>> on the first device.
>>>
>>> Since fence contexts are vmm bound, check the if vmm's match between
>>> both users, this should ensure that fence seqnos don't get used wrongly
>>> on incorrect channels.
>> The fence sequence number is global, i.e. per device, hence checking the vmm
>> context seems too restrictive.
>>
>> Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`?
> Can you prove that? I thought the same and I've gone around a few
> times yesterday/today and convinced myself what I wrote is right.
I think Danilo is right.  Using the VMM would prevent synchronisation 
between clients on the same device, which was one of the intended purposes.
>
> dma_fence_init gets passed the seqno which comes from fctx->sequence,
> which is nouveau_fence_chan, which gets allocated for each channel.

All this code is really old and horrible, especially after not receiving 
much attention through many many DRM changes over the years.  But - all 
channels share the semaphore buffer, each with their own (fixed, based 
on channel id) offset.  There are indeed per-channel GPU VA mappings of 
the buffer in the fctx, but they all point at the same underlying memory.

The "new" exec submission path doesn't use nouveau_fence_sync() at all.  
This isn't the worst idea in the world, given various shortcomings in 
how it's currently implemented, but I've never felt confident 
*something* wouldn't regress by removing its use in the older paths (or 
buffer moves).

>
> So we should hit this path if we have 2 userspace submits, one with
> say graphics, the one with copy engine contexts, otherwise we should
> wait on the CPU.
>
>>>   drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> index ee5e9d40c166f..5743c82f4094b 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>>>
>>>                                rcu_read_lock();
>>>                                prev = rcu_dereference(f->channel);
>>> -                             if (prev && (prev == chan ||
>>> +                             if (prev && (prev->vmm == chan->vmm) &&
>>> +                                 (prev == chan ||
>> Maybe better break it down a bit, e.g.
>>
>> bool local = prev && (prev->... == chan->...);
>>
>> if (local && ...) {
>> ...
>> }
> I'll update that once we resolve the above.
>
> Dave.