[PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init

Xu, Feifei feifxu at amd.com
Fri Sep 27 03:56:55 UTC 2024


 >>I guess you are referring to the below corner case

>>  1) Place NPS request

>>  2) Unload Driver

>>  3) Reinstall driver with a different TOS (possible but quite unlikely)

>> 4) Driver reload

>> 5) Driver checks TOS version first and goes for a reset

>> 6) reset_flag of GMC is not set, hence it doesn't refresh the NPS range.

 >>I think changing the order in soc15_need_reset_on_init() to check for 
NPS request before TOS version check will solve this.

Yes, I was thinking of  reset_flag and tOS reloading 
(adev->init_lvl->level set to AMDGPU_INIT_LEVEL_MINIMAL_XGMI) changing 
at the same time. And NPS refresh will be ignored. Though might be 
likely in debugging or regression isolation cases which changing driver 
packaged with different TOS.  And yes making NPS refresh checking before 
TOS version checking will solve this.

And if we do not return ahead when checking NPS request before tOS 
version change in soc15_need_reset_on_init(), we can drop 
(adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) check in below 
refresh checking:

+     refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
+               (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);

refresh = (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);

Thanks
Feifei

On 9/26/2024 5:01 PM, Xu, Feifei wrote:
>>> +     refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
>>> +               (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
> Is there a corner case that reloading with a different version tos and refreshing nps change co-exist?
>
> Thanks,
> Feifei
>
> -----Original Message-----
> From: amd-gfx<amd-gfx-bounces at lists.freedesktop.org>  On Behalf Of Lijo Lazar
> Sent: Tuesday, September 24, 2024 1:57 PM
> To:amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking<Hawking.Zhang at amd.com>; Deucher, Alexander<Alexander.Deucher at amd.com>; Koenig, Christian<Christian.Koenig at amd.com>; Bhardwaj, Rajneesh<Rajneesh.Bhardwaj at amd.com>; Errabolu, Ramesh<Ramesh.Errabolu at amd.com>
> Subject: [PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init
>
> Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table.
>
> Signed-off-by: Lijo Lazar<lijo.lazar at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 13 ++++++++++++-  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  5 +++++
>   drivers/gpu/drm/amd/amdgpu/soc15.c      |  2 ++
>   3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 21f1e65c9dc9..011fe3a847d0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -1261,12 +1261,15 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev,
>          struct amdgpu_gmc_memrange *ranges;
>          int range_cnt, ret, i, j;
>          uint32_t nps_type;
> +       bool refresh;
>
>          if (!mem_ranges)
>                  return -EINVAL;
>
> +       refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
> +                 (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
>          ret = amdgpu_discovery_get_nps_info(adev, &nps_type, &ranges,
> -                                           &range_cnt, false);
> +                                           &range_cnt, refresh);
>
>          if (ret)
>                  return ret;
> @@ -1392,3 +1395,11 @@ void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev)
>                          adev->dev,
>                          "NPS mode change request done, reload driver to complete the change\n");  }
> +
> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev) {
> +       if (adev->gmc.gmc_funcs->need_reset_on_init)
> +               return adev->gmc.gmc_funcs->need_reset_on_init(adev);
> +
> +       return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index b13d6adb5efd..d4cd247fe574 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -78,6 +78,8 @@ enum amdgpu_memory_partition {
>           BIT(AMDGPU_NPS3_PARTITION_MODE) | BIT(AMDGPU_NPS4_PARTITION_MODE) | \
>           BIT(AMDGPU_NPS6_PARTITION_MODE) | BIT(AMDGPU_NPS8_PARTITION_MODE))
>
> +#define AMDGPU_GMC_INIT_RESET_NPS  BIT(0)
> +
>   /*
>    * GMC page fault information
>    */
> @@ -169,6 +171,7 @@ struct amdgpu_gmc_funcs {
>          /* Request NPS mode */
>          int (*request_mem_partition_mode)(struct amdgpu_device *adev,
>                                            int nps_mode);
> +       bool (*need_reset_on_init)(struct amdgpu_device *adev);
>   };
>
>   struct amdgpu_xgmi_ras {
> @@ -314,6 +317,7 @@ struct amdgpu_gmc {
>          const struct amdgpu_gmc_funcs   *gmc_funcs;
>          enum amdgpu_memory_partition    requested_nps_mode;
>          uint32_t supported_nps_modes;
> +       uint32_t reset_flags;
>
>          struct amdgpu_xgmi xgmi;
>          struct amdgpu_irq_src   ecc_irq;
> @@ -468,5 +472,6 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev,  int amdgpu_gmc_request_memory_partition(struct amdgpu_device *adev,
>                                          int nps_mode);
>   void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev);
> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev);
>
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 619933f252aa..97ca4931a7ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -833,6 +833,8 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
>
>          if (amdgpu_psp_tos_reload_needed(adev))
>                  return true;
> +       if (amdgpu_gmc_need_reset_on_init(adev))
> +               return true;
>          /* Just return false for soc15 GPUs.  Reset does not seem to
>           * be necessary.
>           */
> --
> 2.25.1
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20240927/9365f584/attachment-0001.htm>


More information about the amd-gfx mailing list