[PATCH] drm/amdgpu: fix a kcq hang issue for SRIOV

Alex Deucher alexdeucher at gmail.com
Tue Mar 27 16:56:04 UTC 2018


On Tue, Mar 27, 2018 at 12:30 PM, Christian König
<christian.koenig at amd.com> wrote:
> Am 27.03.2018 um 17:52 schrieb Alex Deucher:
>>
>> [SNIP]
>>>>
>>>> 2. add the new callback implementation to gfx9 and gfx8 (I think gfx8
>>>> will need this as well since we support sr-iov there too)
>>>
>>>
>>> gfx8 doesn't have the hardware bug which seems to make this necessary,
>>> nor does it have the same VMHUB design as gfx9.
>>
>> Oh, right, in this case it's the req/ack engines which were new for
>> soc15.  We may want the same fix for sdma4 though.
>
>
> And exactly that is one of the reasons why this workaround doesn't work
> correctly.
>
> The SDMA is not directly connected to the GFXHUB, so even if the SDMA
> provided a single command for this, the write/wait would still be executed
> as two operations.

I'm not sure I follow.  I think there are two issues: the hw bug you
are referring to and the SR-IOV requirement that the req and the ack
can't be split by a world switch.  I believe the world switch happens
at packet granularity or coarser, so for the SR-IOV requirement
using a single packet should handle it.

>
> In other words we can again run into the problem and the same thing applies
> for CPU based updates.

Yeah, CPU-based updates could indeed be an issue for the SR-IOV
requirement, but in that case it's easier to read back and retry.
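
For the CPU path, the read-back-and-retry idea could look roughly like the
sketch below. Everything in it is hypothetical and only for illustration:
write_reg_verified(), the accessor typedefs and the fake_dev test device are
not amdgpu functions, and it assumes the register reads back the value that
was written, which may not hold for every request register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessors; stand-ins for real MMIO helpers. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t reg);
typedef void (*reg_write_fn)(void *ctx, uint32_t reg, uint32_t val);

/*
 * Write val to reg, read it back, and rewrite if the value did not stick.
 * Returns true once the read-back matches, false after max_tries attempts.
 */
static bool write_reg_verified(void *ctx, reg_read_fn rd, reg_write_fn wr,
                               uint32_t reg, uint32_t val, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        wr(ctx, reg, val);
        if (rd(ctx, reg) == val)
            return true;
    }
    return false;
}

/* Fake device that silently drops the first 'drop' writes (assumption). */
struct fake_dev {
    uint32_t regs[4];
    unsigned drop;    /* number of writes still to be lost */
    unsigned writes;  /* total write attempts seen */
};

static void fake_write(void *ctx, uint32_t reg, uint32_t val)
{
    struct fake_dev *d = ctx;

    d->writes++;
    if (d->drop) {
        d->drop--;    /* simulate a lost write */
        return;
    }
    d->regs[reg] = val;
}

static uint32_t fake_read(void *ctx, uint32_t reg)
{
    struct fake_dev *d = ctx;

    return d->regs[reg];
}
```

With a fake device that loses two writes, the third attempt sticks and the
helper returns true; if more writes are lost than max_tries allows, it gives
up and returns false so the caller can report the failure.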

Alex


>
> The only real workaround would be to write the request, read the register
> back and if the write didn't succeed write it again.
>
> But seriously remember that this issue is not limited to the VMHUB
> registers. Do you want to write and read back every register to make sure
> the write succeeded?
>
> Regards,
> Christian.

