[v3 2/5] drm/amdgpu: Add ring reset support for VCN v5.0.1

Thu Aug 21 15:38:49 UTC 2025

On 8/20/2025 8:33 AM, Jesse.Zhang wrote:
> Implement the ring reset callback for VCN v5.0.1 to properly handle
> hardware recovery when encountering GPU hangs. The new functionality:
> 
> 1. Adds vcn_v5_0_1_ring_reset() function that:
>    - Prepares for reset using amdgpu_ring_reset_helper_begin()
>    - Performs VCN instance reset via amdgpu_dpm_reset_vcn()
>    - Re-initializes hardware through vcn_v5_0_1_hw_init_inst()
>    - Restarts DPG mode with vcn_v5_0_1_start_dpg_mode()
>    - Completes reset with amdgpu_ring_reset_helper_end()
> 
> 2. Hooks the reset function into the unified ring functions via:
>    - Adding .reset = vcn_v5_0_1_ring_reset to vcn_v5_0_1_unified_ring_vm_funcs
> 
> 3. Maintains existing behavior for SR-IOV VF cases by checking RRMT status
> 
> This provides proper hardware recovery capabilities for VCN 5.0.1 IP block
> during fault conditions, matching functionality available in other VCN versions.
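The reset sequence listed above can be sketched in plain C as follows. This is a minimal, hypothetical model, not kernel code. The amdgpu helpers are replaced by a trace recorder, and `phys_inst` stands in for the value that `GET_INST(VCN, ring->me)` returns. It only shows the call ordering and the early return when the DPM reset fails.

```c
#include <assert.h>
#include <stdio.h>

/* Steps of the reset flow described in the commit message. */
enum reset_step {
	STEP_BEGIN,      /* amdgpu_ring_reset_helper_begin() */
	STEP_DPM_RESET,  /* amdgpu_dpm_reset_vcn() */
	STEP_HW_INIT,    /* vcn_v5_0_1_hw_init_inst() */
	STEP_START_DPG,  /* vcn_v5_0_1_start_dpg_mode() */
	STEP_END         /* amdgpu_ring_reset_helper_end() */
};

#define MAX_STEPS 8

/* Hypothetical trace, used only to observe the call order. */
struct reset_trace {
	enum reset_step steps[MAX_STEPS];
	int count;
};

static void record(struct reset_trace *t, enum reset_step s)
{
	assert(t->count < MAX_STEPS);
	t->steps[t->count++] = s;
}

/* Mock of the DPM-side reset: returns 0 on success or the injected error. */
static int mock_dpm_reset_vcn(struct reset_trace *t, unsigned int inst_mask,
			      int inject_err)
{
	(void)inst_mask;  /* the real call selects instances by bitmask */
	record(t, STEP_DPM_RESET);
	return inject_err;
}

/* phys_inst stands in for GET_INST(VCN, ring->me). */
static int sketch_ring_reset(struct reset_trace *t, unsigned int phys_inst,
			     int inject_err)
{
	int r;

	record(t, STEP_BEGIN);
	r = mock_dpm_reset_vcn(t, 1u << phys_inst, inject_err);
	if (r) {
		fprintf(stderr, "VCN reset failed: %d\n", r);
		return r;  /* stop before touching the hardware again */
	}
	record(t, STEP_HW_INIT);
	record(t, STEP_START_DPG);
	record(t, STEP_END);
	return 0;
}
```

With a nonzero `inject_err`, the trace stops after the DPM reset step, mirroring the early `return r` in the patch.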
> 
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> Signed-off-by: Ruili Ji <ruiliji2 at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 29 +++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
> index 1b5d44fa2b57..779043eac827 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
> @@ -1284,6 +1284,34 @@ static void vcn_v5_0_1_unified_ring_set_wptr(struct amdgpu_ring *ring)
>  	}
>  }
>  
> +static int vcn_v5_0_1_ring_reset(struct amdgpu_ring *ring,
> +				 unsigned int vmid,
> +				 struct amdgpu_fence *timedout_fence)
> +{
> +	int r = 0;
> +	int vcn_inst;
> +	struct amdgpu_device *adev = ring->adev;
> +	struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
> +
> +	amdgpu_ring_reset_helper_begin(ring, timedout_fence);
> +
> +	vcn_inst = GET_INST(VCN, ring->me);
> +	r = amdgpu_dpm_reset_vcn(adev, 1 << vcn_inst);
> +
> +	if (r) {
> +		DRM_DEV_ERROR(adev->dev, "VCN reset fail : %d\n", r);
> +		return r;
> +	}
> +
> +	/* This flag is not set for VF, assumed to be disabled always */
> +	if (RREG32_SOC15(VCN, GET_INST(VCN, 0), regVCN_RRMT_CNTL) & 0x100)
> +		adev->vcn.caps |= AMDGPU_VCN_CAPS(RRMT_ENABLED);

This is not required. The assumption is that the setting is common across all
instances, hence only the first instance's setting is read. So if VCN
instance 2 or 3 is reset, this doesn't matter.
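A minimal sketch of that point, assuming a simplified mock device rather than the real `struct amdgpu_device`. The RRMT bit (0x100, as tested in the patch) is sampled once from instance 0 at init. Resetting instance 2 or 3 later neither re-reads the register nor touches the flag. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VCN_INST    4
#define RRMT_ENABLE_BIT 0x100u  /* bit tested on regVCN_RRMT_CNTL in the patch */

/* Hypothetical, simplified device state. */
struct mock_vcn_dev {
	uint32_t rrmt_cntl[NUM_VCN_INST];  /* per-instance register value */
	bool rrmt_enabled;                 /* device-wide capability flag */
	int rrmt_reads;                    /* how many times the register was read */
	int resets[NUM_VCN_INST];          /* per-instance reset counter */
};

static uint32_t mock_read_rrmt_cntl(struct mock_vcn_dev *dev, int inst)
{
	assert(inst >= 0 && inst < NUM_VCN_INST);
	dev->rrmt_reads++;
	return dev->rrmt_cntl[inst];
}

/* Init time: the setting is assumed common to all instances, so only
 * instance 0 is sampled, once. */
static void mock_init_caps(struct mock_vcn_dev *dev)
{
	if (mock_read_rrmt_cntl(dev, 0) & RRMT_ENABLE_BIT)
		dev->rrmt_enabled = true;
}

/* Per-instance reset: under the same assumption, the device-wide flag is
 * left alone and the register is not read again. */
static void mock_reset_instance(struct mock_vcn_dev *dev, int inst)
{
	assert(inst >= 0 && inst < NUM_VCN_INST);
	dev->resets[inst]++;
}
```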

> +	vcn_v5_0_1_hw_init_inst(adev, ring->me);
> +	vcn_v5_0_1_start_dpg_mode(vinst, adev->vcn.inst[ring->me].indirect_sram);

You could use vinst->indirect_sram here. That said, there seems to be no need
to pass this as a separate parameter at all, since vinst is already passed in.
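A hypothetical sketch of the single-parameter form. The body of `vcn_v5_0_1_start_dpg_mode()` is not part of this patch, so the struct and enum below are simplified stand-ins and not the real driver types. The only point shown is that the indirect-SRAM choice can be derived from `vinst` itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct amdgpu_vcn_inst. */
struct mock_vcn_inst {
	bool indirect_sram;
};

/* Hypothetical labels for the two DPG programming paths. */
enum dpg_path { DPG_DIRECT, DPG_INDIRECT_SRAM };

/* Sketch: the SRAM mode is read from the instance, so callers pass only
 * vinst and cannot hand in a flag belonging to a different instance. */
static enum dpg_path mock_start_dpg_mode(const struct mock_vcn_inst *vinst)
{
	assert(vinst != NULL);
	return vinst->indirect_sram ? DPG_INDIRECT_SRAM : DPG_DIRECT;
}
```

A call site would then shrink to `mock_start_dpg_mode(vinst)`, with no second argument to keep in sync.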

Thanks,
Lijo
> +
> +	return amdgpu_ring_reset_helper_end(ring, timedout_fence);
> +}
> +
>  static const struct amdgpu_ring_funcs vcn_v5_0_1_unified_ring_vm_funcs = {
>  	.type = AMDGPU_RING_TYPE_VCN_ENC,
>  	.align_mask = 0x3f,
> @@ -1312,6 +1340,7 @@ static const struct amdgpu_ring_funcs vcn_v5_0_1_unified_ring_vm_funcs = {
>  	.emit_wreg = vcn_v4_0_3_enc_ring_emit_wreg,
>  	.emit_reg_wait = vcn_v4_0_3_enc_ring_emit_reg_wait,
>  	.emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
> +	.reset = vcn_v5_0_1_ring_reset,
>  };
>  
>  /**