[PATCH 1/2] drm/amdgpu: Avoid HW GPU reset for RAS.

Kuehling, Felix Felix.Kuehling at amd.com
Thu Aug 29 19:09:29 UTC 2019

On 2019-08-29 1:21 p.m., Grodzovsky, Andrey wrote:
> On 8/29/19 12:18 PM, Kuehling, Felix wrote:
>> On 2019-08-29 10:08 a.m., Grodzovsky, Andrey wrote:
>>> Agree, the placement of amdgpu_amdkfd_pre/post_reset in
>>> amdgpu_device_lock/unlock_adev is a bit weird.
>> amdgpu_device_reset_sriov already calls amdgpu_amdkfd_pre/post_reset
>> itself while it has exclusive access to the GPU.
> So in that case amdgpu_amdkfd_pre/post_reset gets called twice - once
> from amdgpu_device_lock/unlock_adev and second time from
> amdgpu_device_reset_sriov, no? Why is that?

No, it's not called twice, because the calls in the bare-metal case are
guarded by if (!amdgpu_sriov_vf(adev)). If you don't move the
amdgpu_amdkfd_pre/post_reset calls into a bare-metal-specific code path
(such as amdgpu_do_asic_reset), you'll need to keep those conditions.
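
To illustrate the point about the guards, here is a rough user-space sketch
of the call flow, not the actual driver code. struct adev_model, its fields
and all function names below are hypothetical stand-ins for
struct amdgpu_device, amdgpu_sriov_vf(), amdgpu_device_lock/unlock_adev(),
amdgpu_device_reset_sriov() and amdgpu_amdkfd_pre/post_reset(); the only
thing it is meant to show is that the guard keeps each KFD hook at one call
per recovery on both paths.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device (assumption, not driver code). */
struct adev_model {
	bool is_sriov_vf;   /* models the result of amdgpu_sriov_vf(adev) */
	int kfd_pre_calls;  /* how often the pre-reset hook ran */
	int kfd_post_calls; /* how often the post-reset hook ran */
};

/* Models amdgpu_amdkfd_pre_reset(): just counts invocations. */
static void kfd_pre_reset(struct adev_model *adev)
{
	adev->kfd_pre_calls++;
}

/* Models amdgpu_amdkfd_post_reset(): just counts invocations. */
static void kfd_post_reset(struct adev_model *adev)
{
	adev->kfd_post_calls++;
}

/* Models amdgpu_device_lock_adev(): the KFD hook is guarded for bare metal. */
static void lock_adev(struct adev_model *adev)
{
	if (!adev->is_sriov_vf)
		kfd_pre_reset(adev);
}

/* Models amdgpu_device_unlock_adev(): same guard on the way out. */
static void unlock_adev(struct adev_model *adev)
{
	if (!adev->is_sriov_vf)
		kfd_post_reset(adev);
}

/* Models amdgpu_device_reset_sriov(): calls the KFD hooks itself. */
static void reset_sriov(struct adev_model *adev)
{
	kfd_pre_reset(adev);
	/* the VF reset would happen here, with exclusive access to the GPU */
	kfd_post_reset(adev);
}

/* Models the overall recovery path for both SR-IOV and bare metal. */
static void gpu_recover(struct adev_model *adev)
{
	lock_adev(adev);
	if (adev->is_sriov_vf)
		reset_sriov(adev);
	/* the bare-metal reset work would go here */
	unlock_adev(adev);
}
```

Calling gpu_recover() on a model with is_sriov_vf set, and on one without,
leaves both counters at 1 in either case. Dropping the guards in lock_adev()
and unlock_adev() would make the SR-IOV case count 2.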

>> It would make sense to
>> move the same calls into amdgpu_do_asic_reset for the bare-metal case.
> Problem is I am skipping amdgpu_do_asic_reset entirely in this case, as
> there is no HW reset here, so I will just extract it from
> amdgpu_device_lock/unlock_adev



> Andrey
>> Regards,
>>      Felix
>>> Andrey
>>> On 8/29/19 10:06 AM, Koenig, Christian wrote:
>>>>> Felix advised that the way to stop all KFD activity is simply to NOT
>>>>> call amdgpu_amdkfd_post_reset, so that's why I added this. Do you mean you
>>>>> prefer amdgpu_amdkfd_post_reset to be outside of amdgpu_device_unlock_adev?
>>>> Yes, exactly. It doesn't seem to be related to the unlock operation in
>>>> the first place, but rather only signals the KFD that the reset is
>>>> completed.
>>>> Christian.
