[PATCH] drm/ttm: partial revert "cleanup ttm_tt_(unbind|destroy)" v2
felix.kuehling at amd.com
Wed Jul 27 21:27:41 UTC 2016
We're also looking into a hang with a KFD unit test that allocates lots
of memory and fragments it deliberately, without mapping it all at once.
It's a new problem for us as we're rebasing onto amd-staging-4.6.
Something weird seems to be happening with evictions, but I haven't been
able to figure it out.
I was able to see that SDMA page table updates stop working at some
point, though SDMA fences are still signaling. If I let the test run
longer, SDMA and CP hang. I dumped the SDMA IBs and didn't see anything
suspicious. My guess was that either the SDMA IBs or the ring are getting
corrupted, or that the GART table entries for the IBs or ring are
corrupted. But I haven't been able to prove that or track it down to a
root cause. We're now trying to reimplement the test using libdrm-amdgpu
APIs so we can bisect on the amd-staging-4.6 branch without KFD.
On 16-07-26 10:26 PM, Michel Dänzer wrote:
> On 22.07.2016 22:10, Christian König wrote:
>> From: Christian König <christian.koenig at amd.com>
>> We still need to unbind explicitly during a move.
> This change fixed a hang for me when running the piglit test
> max-texture-size with the radeon driver on Kaveri.
> However, there's still a similar hang left when letting the piglit test
> tex3d-maxsize run concurrently with other tests (running tex3d-maxsize
> alone doesn't hang, but fails due to running out of GPU memory; that's a
> recent radeonsi regression). There are
> [TTM] Buffer eviction failed
> messages in dmesg shortly before the hang.
> I haven't seen such hangs with older kernels. Any ideas offhand what the
> problem could be? If not, I'll try bisecting.