[PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails
Felix.Kuehling at amd.com
Wed Jun 26 07:04:07 UTC 2019
On 2019-06-26 2:54 a.m., Koenig, Christian wrote:
> Am 26.06.19 um 08:40 schrieb Kuehling, Felix:
>> Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate
>> placements and can lead to live-locks in amdgpu_cs, retrying
>> indefinitely and never succeeding.
>> Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10")
>> CC: Christian Koenig <Christian.Koenig at amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> Crap, I feared that this could live-lock under some circumstances, but
> hoped that this would be a rather rare case.
> How did you reproduce this?
kfdtest --gtest_filter=KFDEvictTest.* --gtest_repeat=10
It runs two processes, both of which submit graphics CS and use KFD compute
queues at the same time, with enough memory pressure to cause frequent
KFD evictions. It's meant to test KFD eviction code paths, but ended up
finding a problem in the graphics CS code path. :/
I was able to reproduce it right after your changes. With the latest
version of the branch I can't reproduce it any more. Some other commit
must have changed things enough to avoid the live-lock.
I also tried writing a test that reproduced it only with amdgpu_cs calls
(without KFD), but no luck yet.
> Anyway patch is Reviewed-by: Christian König <christian.koenig at amd.com>
> for now.
>> drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index c7de667d482a..58c403eda04e 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -827,7 +827,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
>> if (!r)
>> - return r == -EDEADLK ? -EAGAIN : r;
>> + return r == -EDEADLK ? -EBUSY : r;
>> static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
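A minimal, hypothetical C model of the failure described in the commit message. This is not TTM code: map_wait_busy_error(), mem_space_model(), submit_with_retry() and the evict_* helpers are invented names for illustration. It assumes that the placement loop in ttm_bo_mem_space moves on to the next placement only when eviction reports -EBUSY and propagates any other error, and that the amdgpu_cs submission path simply retries on -EAGAIN. The retry bound exists only so the model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, NOT the real TTM code. All names are invented.
 * A "placement" is just an index (think 0 = VRAM, 1 = GTT).
 */
typedef int (*evict_fn)(size_t placement);

/* Mirrors the diffed line: how a -EDEADLK from locking the busy BO's
 * reservation is reported, before (-EAGAIN) and after (-EBUSY) the patch. */
static int map_wait_busy_error(int r, bool patched)
{
	return r == -EDEADLK ? (patched ? -EBUSY : -EAGAIN) : r;
}

/* Placement 0 is contended: the lock attempt always hits -EDEADLK.
 * Placement 1 has room, so eviction there succeeds. */
static int evict_unpatched(size_t placement)
{
	return placement == 0 ? map_wait_busy_error(-EDEADLK, false) : 0;
}

static int evict_patched(size_t placement)
{
	return placement == 0 ? map_wait_busy_error(-EDEADLK, true) : 0;
}

/* Assumed placement-loop semantics: -EBUSY means "try the next
 * placement", any other error aborts the allocation. Returns the index
 * of the placement that succeeded, or a negative errno. */
static int mem_space_model(size_t num_placements, evict_fn try_evict)
{
	assert(try_evict != NULL);

	for (size_t i = 0; i < num_placements; i++) {
		int ret = try_evict(i);

		if (ret == 0)
			return (int)i;
		if (ret != -EBUSY)
			return ret;	/* e.g. -EAGAIN: alternates never tried */
	}
	return -ENOMEM;
}

/* Caller that resubmits whenever it gets -EAGAIN. max_tries only exists
 * so the model terminates; *tries reports how many attempts were made. */
static int submit_with_retry(evict_fn try_evict, int max_tries, int *tries)
{
	int ret = -EAGAIN;

	for (*tries = 0; *tries < max_tries && ret == -EAGAIN; (*tries)++)
		ret = mem_space_model(2, try_evict);
	return ret;
}
```

In this model, the unpatched mapping makes every attempt fail on placement 0 with -EAGAIN, so the retry loop exhausts its bound without ever trying placement 1. With -EBUSY, the first attempt falls through to placement 1 and succeeds.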