[PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails
Michel Dänzer
michel at daenzer.net
Wed Jun 26 10:03:18 UTC 2019
On 2019-06-26 9:04 a.m., Kuehling, Felix wrote:
> On 2019-06-26 2:54 a.m., Koenig, Christian wrote:
>> On 26.06.19 at 08:40, Kuehling, Felix wrote:
>>> Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate
>>> placements and can lead to live-locks in amdgpu_cs, retrying
>>> indefinitely and never succeeding.
>>>
>>> Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10")
>>> CC: Christian Koenig <Christian.Koenig at amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> Crap, I feared that this could live-lock under some circumstances, but
>> hoped that this would be a rather rare case.
>>
>> How did you reproduce this?
>
> kfdtest --gtest_filter=KFDEvictTest.* --gtest_repeat=10
>
> It runs two processes, both of which do graphics CS and KFD compute
> queues at the same time with enough memory pressure to cause frequent
> KFD evictions. It's meant to test KFD eviction code paths, but ended up
> finding a problem in the graphics CS code path. :/
>
> I was able to reproduce it right after your changes. With the latest
> version of the branch I can't reproduce it any more. Some other commit
> must have changed things enough to avoid the live lock.

Probably just luck, unless this was a very recent change. I'd also been
seeing live-locks between memory-heavy piglit tests, last time just this
Monday. But it didn't happen every time.

I'd been meaning to report this, but kept getting distracted by other
stuff. Thanks for beating me to it, and for even coming up with a solution!

--
Earthling Michel Dänzer | https://www.amd.com
Libre software enthusiast | Mesa and X developer
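
For readers following the thread, here is a minimal sketch of why the error code matters. It is a simplified model, not the actual TTM code: model_mem_space, try_place_fn and the two busy_reports_* helpers are hypothetical names, and the loop assumes that only -EBUSY lets the search move on to the next placement, while any other error (including -EAGAIN) is returned to the caller straight away.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one placement attempt:
 * returns 0 on success or a negative errno value. */
typedef int (*try_place_fn)(int placement);

/* Simplified model of a placement search (assumption, not kernel code):
 * -EBUSY means "this placement is contended, try the next one";
 * any other error aborts the search immediately. */
static int model_mem_space(const int *placements, size_t n,
                           try_place_fn try_place, int *tried)
{
    *tried = 0;
    for (size_t i = 0; i < n; i++) {
        int ret = try_place(placements[i]);

        (*tried)++;
        if (ret == 0)
            return 0;       /* found space */
        if (ret != -EBUSY)
            return ret;     /* e.g. -EAGAIN: caller has to retry everything */
    }
    return -ENOMEM;         /* every placement was busy */
}

/* Placement 0 is contended, placement 1 is free. */
static int busy_reports_ebusy(int placement)
{
    return placement == 0 ? -EBUSY : 0;
}

static int busy_reports_eagain(int placement)
{
    return placement == 0 ? -EAGAIN : 0;
}
```

With busy_reports_ebusy the model falls through to the second placement and succeeds. With busy_reports_eagain it stops at the first placement and hands -EAGAIN back, so a caller that simply retries hits the same contended placement again, which is the live-lock pattern described in the commit message.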