[PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV

Liu, Monk Monk.Liu at amd.com
Wed Mar 28 04:36:34 UTC 2018


> The SDMA is not directly connected to the GFXHUB, so even if the SDMA provided a single command for this, the write/wait would still be executed as two operations.

I don't understand this point. Could you give more details?

Starting with v148 ucode, SDMA ignores the PREEMPT command while it is executing SRBM_WRITE and POLL_MEM_REG on registers, so as long as SDMA is doing a VM invalidation, the world switch will not interrupt it.
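To make the ordering concrete, here is a minimal simulation of the invalidation sequence described above: a request write (the SRBM_WRITE step) followed by a poll on the ack (the POLL_MEM_REG step). Everything in it is hypothetical: struct inv_engine, its fields, and the helper functions are illustrative names, not amdgpu code, and the rule that a world switch landing between the two steps drops the pending request is an assumption made only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one invalidation engine (not real amdgpu code). */
struct inv_engine {
    uint32_t req;          /* last request written, one bit per VMID */
    uint32_t ack;          /* ack bits set by the "hardware" */
    bool     req_pending;  /* request latched but not yet acknowledged */
};

/* Step 1: the register write (SRBM_WRITE-style). */
static void engine_write_req(struct inv_engine *e, uint32_t vmid)
{
    e->req = 1u << vmid;
    e->req_pending = true;
}

/* ASSUMPTION for illustration: a world switch between write and poll
 * loses the pending request. */
static void world_switch(struct inv_engine *e)
{
    e->req_pending = false;
}

/* "Hardware" side: completes a pending request by setting its ack bit. */
static void engine_step(struct inv_engine *e)
{
    if (e->req_pending) {
        e->ack |= e->req;
        e->req_pending = false;
    }
}

/* Step 2: the wait (POLL_MEM_REG-style). Returns true if the ack bit for
 * vmid shows up within max_polls iterations. */
static bool engine_poll_ack(struct inv_engine *e, uint32_t vmid, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        engine_step(e);
        if (e->ack & (1u << vmid))
            return true;
    }
    return false;
}

/* Runs write + poll. If preemptible is true, a world switch is injected
 * between the two steps; if false, preemption is assumed to be held off
 * until the poll completes, as claimed for v148 ucode. */
static bool invalidate(struct inv_engine *e, uint32_t vmid, bool preemptible)
{
    engine_write_req(e, vmid);
    if (preemptible)
        world_switch(e);
    return engine_poll_ack(e, vmid, 16);
}
```

With preemptible set to false, invalidate() sees the ack; with it set to true, the simulated request is lost and the poll times out. The sketch models only the SDMA-side ordering; Christian's objection in the quoted message is that the GFXHUB still sees two separate operations.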

/Monk

-----Original Message-----
From: Koenig, Christian 
Sent: March 28, 2018 0:30
To: Alex Deucher <alexdeucher at gmail.com>
Cc: Deng, Emily <Emily.Deng at amd.com>; Liu, Monk <Monk.Liu at amd.com>; amd-gfx list <amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV

On 27.03.2018 at 17:52, Alex Deucher wrote:
> [SNIP]
>>> 2. add the new callback implementation to gfx9 and gfx8 (I think 
>>> gfx8 will need this as well since we support sr-iov there too)
>>
>> gfx8 doesn't have the hardware bug which seems to make this 
>> necessary, nor does it have the same VMHUB design as gfx9.
> Oh, right, in this case it's the req/ack engines which were new for 
> soc15.  We may want the same fix for sdma4 though.

And that is exactly one of the reasons why this workaround doesn't work correctly.

The SDMA is not directly connected to the GFXHUB, so even if the SDMA provided a single command for this, the write/wait would still be executed as two operations.

In other words, we can run into the problem again, and the same applies to CPU-based updates.

The only real workaround would be to write the request, read the register back, and if the write didn't succeed, write it again.
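The write/read-back/retry workaround can be sketched as follows. This is a minimal sketch under stated assumptions: struct lossy_reg, reg_write(), reg_read() and reg_write_verified() are hypothetical stand-ins for MMIO accessors, and the register is modeled as silently dropping a fixed number of writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: the next `drops_left` writes are silently
 * lost, mimicking a write that does not land. Illustrative only. */
struct lossy_reg {
    uint32_t value;
    int drops_left;
};

static void reg_write(struct lossy_reg *r, uint32_t v)
{
    if (r->drops_left > 0) {
        r->drops_left--;   /* write lost, value unchanged */
        return;
    }
    r->value = v;
}

static uint32_t reg_read(const struct lossy_reg *r)
{
    return r->value;
}

/* Write v, read it back, and retry until the read-back matches.
 * Returns the number of attempts used, or -1 after max_tries failures. */
static int reg_write_verified(struct lossy_reg *r, uint32_t v, int max_tries)
{
    assert(max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        reg_write(r, v);
        if (reg_read(r) == v)
            return attempt;
    }
    return -1;
}
```

For example, a register that drops its first two writes needs three attempts. Two caveats bear on the objection that follows: every protected write costs at least one extra read, and a read-back cannot tell a dropped write apart from a register that already held the value being written.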

But seriously, remember that this issue is not limited to the VMHUB registers. Do you want to write and read back every register to make sure the write succeeded?

Regards,
Christian.

