[PATCH] drm/amd/amdgpu: move kfd post_reset out of reset_sriov function

Mon Nov 22 15:40:28 UTC 2021

On 2021-11-18 at 11:57 a.m., shaoyunl wrote:
> For SRIOV XGMI configurations, the host driver handles the hive reset,
> so on the guest side reset_sriov is called only once, on one device. This
> leaves kfd post_reset unbalanced with kfd pre_reset, since kfd pre_reset has
> already been moved out of the reset_sriov function. Move kfd post_reset out
> of the reset_sriov function as well, so that the two are balanced.
>
> Signed-off-by: shaoyunl <shaoyun.liu at amd.com>

Please change the headline prefix to "drm/amdgpu: ". The extra "/amd" is
redundant. I'd also add this tag:

Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")

Note that the commit hash is the one from the drm-next branch, which is
what will get merged into master eventually. With those changes, the
patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 10c8008d1da0..9a9d5493c676 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4308,7 +4308,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>  
>  	amdgpu_irq_gpu_reset_resume_helper(adev);
>  	r = amdgpu_ib_ring_tests(adev);
> -	amdgpu_amdkfd_post_reset(adev);
>  
>  error:
>  	if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
> @@ -5081,7 +5080,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  
>  	tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
>  	/* Actual ASIC resets if needed.*/
> -	/* TODO Implement XGMI hive reset logic for SRIOV */
> +	/* Host driver will handle XGMI hive reset for SRIOV */
>  	if (amdgpu_sriov_vf(adev)) {
>  		r = amdgpu_device_reset_sriov(adev, job ? false : true);
>  		if (r)
> @@ -5141,8 +5140,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  
>  skip_sched_resume:
>  	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
> -		/* unlock kfd: SRIOV would do it separately */
> -		if (!need_emergency_restart && !amdgpu_sriov_vf(tmp_adev))
> +		/* unlock kfd */
> +		if (!need_emergency_restart)
>  	                amdgpu_amdkfd_post_reset(tmp_adev);
>  
>  		/* kfd_post_reset will do nothing if kfd device is not initialized,