[PATCH] drm/amdgpu: fix amdgpu_amdkfd_remove_eviction_fence

Christian König ckoenig.leichtzumerken at gmail.com
Thu Aug 16 18:18:56 UTC 2018


On 16.08.2018 at 18:50, Felix Kuehling wrote:
> On 2018-08-16 02:43 AM, Christian König wrote:
> [SNIP]
>> I mean it could be that in the worst case we race and stop a KFD
>> process for no good reason.
> Right. For a more practical example, a KFD BO can get evicted just
> before the application decides to unmap it. The preemption happens
> asynchronously, handled by an SDMA job in the GPU scheduler. That job
> will have an amdgpu_sync object with the eviction fence in it.
>
> While that SDMA job is pending or in progress, the application decides
> to unmap the BO. That removes the eviction fence from that BO's
> reservation. But it can't remove the fence from all the sync objects
> that were previously created and are still in flight. So the preemption
> will be triggered, and the fence will eventually signal when the KFD
> preemption is complete.
>
> I don't think that's something we can prevent. The worst case is that a
> preemption happens unnecessarily if an eviction gets triggered just
> before removing the fence. But removing the fence will prevent future
> evictions of the BO from triggering a KFD process preemption. That's the
> best we can do.

It's true that you can't drop the SDMA job that wants to evict the BO,
but at that point the fence signaling is already underway and can no
longer be stopped.

Replacing the fence with a new one would just be much cleaner and
would fix quite a number of corner cases where the KFD process would
be preempted without good reason.

It probably adds quite a bit more CPU overhead, but I think it would
still be the more fail-proof option.

Regards,
Christian.


>
> Regards,
>    Felix
>



More information about the amd-gfx mailing list