[PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init
Xu, Feifei
feifxu at amd.com
Fri Sep 27 03:56:55 UTC 2024
>>I guess you are referring to the below corner case
>> 1) Place NPS request
>> 2) Unload Driver
>> 3) Reinstall driver with a different TOS (possible but quite unlikely)
>> 4) Driver reload
>> 5) Driver checks TOS version first and goes for a reset
>> 6) reset_flag of GMC is not set, hence it doesn't refresh the NPS range.
>>I think changing the order in soc15_need_reset_on_init() to check for
NPS request before TOS version check will solve this.
Yes, I was thinking of reset_flag and tOS reloading
(adev->init_lvl->level set to AMDGPU_INIT_LEVEL_MINIMAL_XGMI) changing
at the same time. And NPS refresh will be ignored. Though might be
likely in debugging or regression isolation cases which changing driver
packaged with different TOS. And yes making NPS refresh checking before
TOS version checking will solve this.
And if we do not return ahead when checking NPS request before tOS
version change in soc15_need_reset_on_init(), we can drop
(adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) check in below
refresh checking:
+ refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
+ (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
refresh = (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
Thanks
Feifei
On 9/26/2024 5:01 PM, Xu, Feifei wrote:
>>> + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
>>> + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
> Is there a corner case that reloading with a different version tos and refreshing nps change co-exist?
>
> Thanks,
> Feifei
>
> -----Original Message-----
> From: amd-gfx<amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Lijo Lazar
> Sent: Tuesday, September 24, 2024 1:57 PM
> To:amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking<Hawking.Zhang at amd.com>; Deucher, Alexander<Alexander.Deucher at amd.com>; Koenig, Christian<Christian.Koenig at amd.com>; Bhardwaj, Rajneesh<Rajneesh.Bhardwaj at amd.com>; Errabolu, Ramesh<Ramesh.Errabolu at amd.com>
> Subject: [PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init
>
> Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table.
>
> Signed-off-by: Lijo Lazar<lijo.lazar at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 13 ++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 5 +++++
> drivers/gpu/drm/amd/amdgpu/soc15.c | 2 ++
> 3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 21f1e65c9dc9..011fe3a847d0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -1261,12 +1261,15 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev,
> struct amdgpu_gmc_memrange *ranges;
> int range_cnt, ret, i, j;
> uint32_t nps_type;
> + bool refresh;
>
> if (!mem_ranges)
> return -EINVAL;
>
> + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
> + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
> ret = amdgpu_discovery_get_nps_info(adev, &nps_type, &ranges,
> - &range_cnt, false);
> + &range_cnt, refresh);
>
> if (ret)
> return ret;
> @@ -1392,3 +1395,11 @@ void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev)
> adev->dev,
> "NPS mode change request done, reload driver to complete the change\n"); }
> +
> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev) {
> + if (adev->gmc.gmc_funcs->need_reset_on_init)
> + return adev->gmc.gmc_funcs->need_reset_on_init(adev);
> +
> + return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index b13d6adb5efd..d4cd247fe574 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -78,6 +78,8 @@ enum amdgpu_memory_partition {
> BIT(AMDGPU_NPS3_PARTITION_MODE) | BIT(AMDGPU_NPS4_PARTITION_MODE) | \
> BIT(AMDGPU_NPS6_PARTITION_MODE) | BIT(AMDGPU_NPS8_PARTITION_MODE))
>
> +#define AMDGPU_GMC_INIT_RESET_NPS BIT(0)
> +
> /*
> * GMC page fault information
> */
> @@ -169,6 +171,7 @@ struct amdgpu_gmc_funcs {
> /* Request NPS mode */
> int (*request_mem_partition_mode)(struct amdgpu_device *adev,
> int nps_mode);
> + bool (*need_reset_on_init)(struct amdgpu_device *adev);
> };
>
> struct amdgpu_xgmi_ras {
> @@ -314,6 +317,7 @@ struct amdgpu_gmc {
> const struct amdgpu_gmc_funcs *gmc_funcs;
> enum amdgpu_memory_partition requested_nps_mode;
> uint32_t supported_nps_modes;
> + uint32_t reset_flags;
>
> struct amdgpu_xgmi xgmi;
> struct amdgpu_irq_src ecc_irq;
> @@ -468,5 +472,6 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev, int amdgpu_gmc_request_memory_partition(struct amdgpu_device *adev,
> int nps_mode);
> void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev);
> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev);
>
> #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 619933f252aa..97ca4931a7ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -833,6 +833,8 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
>
> if (amdgpu_psp_tos_reload_needed(adev))
> return true;
> + if (amdgpu_gmc_need_reset_on_init(adev))
> + return true;
> /* Just return false for soc15 GPUs. Reset does not seem to
> * be necessary.
> */
> --
> 2.25.1
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20240927/9365f584/attachment-0001.htm>
More information about the amd-gfx
mailing list