[PATCH] drm/amdgpu: Attach exclusive fence to prime exported bo's. (v3)

Mon Nov 7 19:20:53 UTC 2016

On 11/07/2016 08:55 AM, Christian König wrote:
> Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
>> On 07/11/16 11:47 AM, Mario Kleiner wrote:
>>> External clients which import our bo's wait only
>>> for exclusive dmabuf-fences, not on shared ones,
>>> so attach fences on exported buffers as exclusive
>>> ones, if the buffers get imported by an external
>>> client.
>>>
>>> See discussion in thread:
>>> https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html
>>>
>>>
>>> Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
>>> Prime render offload, and with the Tonga standalone as
>>> primary gpu.
>>>
>>> v2: Add a wait for all shared fences before prime export,
>>>      as suggested by Christian Koenig.
>>>
>>> v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
>>>      so we only use the exclusive fence when exporting a
>>>      bo to external clients like a separate iGPU, but not
>>>      when exporting/importing from/to ourselves as part of
>>>      regular DRI3 fd passing.
>>>
>>>      - Propagate failure of reservation_object_wait_rcu back
>>>      to caller.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
>>> (v1) Tested-by: Mike Lothian <mike at fireburn.co.uk>
>> FWIW, I think the (v1) is usually added at the end of the line, not at
>> the beginning.

Ok.

>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> index 7700dc2..c4494d0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
>>> @@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
>>>   {
>>>       struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>>>       int ret = 0;
>>> +    long lret;
>>> +
>>> +    /*
>>> +     * Wait for all shared fences to complete before we switch to
>>> future
>>> +     * use of exclusive fence on this prime_exported bo.
>>> +     */
>>> +    lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true,
>>> false,
>>> +                           MAX_SCHEDULE_TIMEOUT);
>>> +    if (unlikely(lret < 0)) {
>>> +        DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
>>> +        return lret;
>>> +    }
>>> +
>>> +    bo->prime_exported = true;
>> We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.
>>
>> Also, I think we should set bo->prime_exported (prime_shared?) in
>> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
>> exclusive fences on BOs imported from other GPUs.
> Yes, really good idea.
>

Ok. I guess we don't need to wait for fences there before setting the 
flag/counter, as we can't have done anything yet with the bo just 
created from the imported sg_table?

> Additional to that are we sure that amdgpu_gem_prime_pin() is only
> called once for each GEM object? Could in theory be that somebody
> exports a BO to another GFX device as well as a V4L device for example.
>
> In this case we would need a counter, but I need to double check that as
> well.
>

If we want to clear the prime_exported flag in amdgpu_gem_prime_unpin() 
as an optimization if all clients detach from the buffer, then we need a 
prime_shared_counter instead of a prime_exported flag afaics. The dmabuf 
api allows multiple clients to import and attach to an exported dmabuf, 
each one calling dma_buf_attach which will call amdgpu_gem_prime_pin.

> In general I would protected the flag/counter by the BO being reserved
> to avoid any races. In other word move those lines a bit down after the
> amdgpu_bo_reserve().

Ok.

-mario

>
> Regards,
> Christian.