[PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled

Christian König christian.koenig at amd.com
Fri Jun 1 10:09:35 UTC 2018


On 01.06.2018 at 11:29, Huang Rui wrote:
> On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
>> On 01.06.2018 at 08:41, Huang Rui wrote:
>>> After we deferred the execution of the gfx/compute ib tests, the gfx has
>>> already gone into a "mid state" of gfxoff by the time they run.
>>>
>>> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (2:1 bits)
>>> 0 = GFXOFF.
>>> 1 = Transition out of GFXOFF state.
>>> 2 = Not in GFXOFF.
>>> 3 = Transition into GFXOFF.
>>>
>>> If we hit a mid state (1 or 3), the doorbell write interrupt cannot wake the
>>> gfx up successfully. The field value is 1 when we issue the ib test at that
>>> time, so we get the hang. This is the root cause of the issue.
>>>
>>> Meanwhile, we cannot set clockgating of GFX once the gfx is already in the
>>> "off" state. So we should move the gfx powergating and gfxoff enabling to the
>>> end of initialization, after the ib test and clockgating.
>> Mhm, that still looks like only a half-baked solution:
>>
>> 1. What prevents this bug from happening during "normal" IB submission
>> from userspace?
>>
>> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
>> are not in any transition phase instead?
>>
> Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
> amdgpu_ring_commit() after set_wptr to confirm that the status is "0" or "2"?

You could add an end_use() callback for that, but I think we rather need 
to do this in gfx_v9_0_ring_set_wptr_gfx() before we write the doorbell.
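
Roughly what I have in mind, as a standalone sketch rather than driver code. 
Only the field layout (bits 2:1 of PWR_MISC_CNTL_STATUS) and the meaning of 
the four values come from this thread. The helper names, the reg_read_fn 
callback and the fake register are made up for illustration, and a real 
version would use the driver's own register accessors and a proper delay 
between reads:

```c
#include <stdbool.h>
#include <stdint.h>

/* From the thread: PWR_GFXOFF_STATUS is bits 2:1 of PWR_MISC_CNTL_STATUS. */
#define PWR_GFXOFF_STATUS_SHIFT 1
#define PWR_GFXOFF_STATUS_MASK  (0x3u << PWR_GFXOFF_STATUS_SHIFT)

/* Enumerator names are hypothetical; the values are the ones listed above. */
enum gfxoff_status {
	GFXOFF_OFF      = 0, /* GFXOFF */
	GFXOFF_LEAVING  = 1, /* transition out of GFXOFF */
	GFXOFF_ON       = 2, /* not in GFXOFF */
	GFXOFF_ENTERING = 3, /* transition into GFXOFF */
};

/* Hypothetical register accessor, so the sketch runs without hardware. */
typedef uint32_t (*reg_read_fn)(void *ctx);

static enum gfxoff_status gfxoff_status_decode(uint32_t reg)
{
	return (enum gfxoff_status)((reg & PWR_GFXOFF_STATUS_MASK) >>
				    PWR_GFXOFF_STATUS_SHIFT);
}

/* Only 0 and 2 are stable; 1 and 3 are the transition ("mid") states. */
static bool gfxoff_status_is_stable(enum gfxoff_status s)
{
	return s == GFXOFF_OFF || s == GFXOFF_ON;
}

/*
 * Read the status up to max_polls times and return true as soon as it is
 * stable. A driver would sleep or delay between reads; omitted here.
 */
static bool gfxoff_wait_stable(reg_read_fn read_reg, void *ctx,
			       unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (gfxoff_status_is_stable(gfxoff_status_decode(read_reg(ctx))))
			return true;
	}
	return false;
}

/* Fake register that replays a fixed sequence, repeating the last value. */
struct fake_reg {
	const uint32_t *values;
	unsigned int count;
	unsigned int pos;
};

static uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *r = ctx;
	uint32_t v = r->values[r->pos];

	if (r->pos + 1 < r->count)
		r->pos++;
	return v;
}
```

The idea would be to call something like gfxoff_wait_stable() in 
gfx_v9_0_ring_set_wptr_gfx() right before the doorbell write, and to treat a 
false return as an error instead of ringing the doorbell anyway.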

Christian.

>
> Thanks,
> Ray


