[PATCH] drm/amdgpu: Update irq disable flow during unload

Christian König christian.koenig at amd.com
Mon Jan 8 08:21:37 UTC 2024


Am 08.01.24 um 09:13 schrieb Kamal, Asad:
> [AMD Official Use Only - General]
>
> Hi Christian,
>
> Thank you for the comment.
>
> This is not normal reset, it is reset done during unload for smu v_13_0_2.

Yeah, but this doesn't explain the rational for this.

IRQ enable/disable should be balanced in hw_init()/hw_fini(), 
independent of what else you do.

I'm not sure what you are trying to solve but this here is a complete no-go.

Regards,
Christian.

>
> Thanks & Regards
> Asad
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Monday, January 8, 2024 1:33 PM
> To: Kamal, Asad <Asad.Kamal at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Update irq disable flow during unload
>
> Am 05.01.24 um 16:21 schrieb Asad Kamal:
>> In certain special cases, e.g device reset before module unload, irq
>> gets disabled as part of reset sequence and won't get enabled back.
>> Add special check to cover such scenarios
> Well complete NAK to that. Resets shouldn't affect the IRQ state at all!
>
> If this is an issue then something else is broken.
>
> Regards,
> Christian.
>
>> Signed-off-by: Asad Kamal <asad.kamal at amd.com>
>> Suggested-by: Lijo Lazar <lijo.lazar at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++++++++++--
>>    drivers/gpu/drm/amd/amdgpu/soc15.c    | 13 +++++++++++--
>>    2 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 372de9f1ce59..a4e1b9a58679 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -2361,6 +2361,7 @@ static void gmc_v9_0_gart_disable(struct amdgpu_device *adev)
>>    static int gmc_v9_0_hw_fini(void *handle)
>>    {
>>        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +     bool irq_release = true;
>>
>>        gmc_v9_0_gart_disable(adev);
>>
>> @@ -2378,9 +2379,16 @@ static int gmc_v9_0_hw_fini(void *handle)
>>        if (adev->mmhub.funcs->update_power_gating)
>>                adev->mmhub.funcs->update_power_gating(adev, false);
>>
>> -     amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
>> +     if (adev->shutdown)
>> +             irq_release = amdgpu_irq_enabled(adev, &adev->gmc.vm_fault, 0);
>>
>> -     if (adev->gmc.ecc_irq.funcs &&
>> +     if (irq_release)
>> +             amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
>> +
>> +     if (adev->shutdown)
>> +             irq_release = amdgpu_irq_enabled(adev, &adev->gmc.ecc_irq, 0);
>> +
>> +     if (adev->gmc.ecc_irq.funcs && irq_release &&
>>                amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__UMC))
>>                amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> index 15033efec2ba..7ee835049d57 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> @@ -1266,6 +1266,7 @@ static int soc15_common_hw_init(void *handle)
>>    static int soc15_common_hw_fini(void *handle)
>>    {
>>        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +     bool irq_release = true;
>>
>>        /* Disable the doorbell aperture and selfring doorbell aperture
>>         * separately in hw_fini because soc15_enable_doorbell_aperture @@
>> -1280,10 +1281,18 @@ static int soc15_common_hw_fini(void *handle)
>>
>>        if (adev->nbio.ras_if &&
>>            amdgpu_ras_is_supported(adev, adev->nbio.ras_if->block)) {
>> -             if (adev->nbio.ras &&
>> +             if (adev->shutdown)
>> +                     irq_release = amdgpu_irq_enabled(adev,
>> +&adev->nbio.ras_controller_irq, 0);
>> +
>> +             if (adev->nbio.ras && irq_release &&
>>                    adev->nbio.ras->init_ras_controller_interrupt)
>>                        amdgpu_irq_put(adev, &adev->nbio.ras_controller_irq, 0);
>> -             if (adev->nbio.ras &&
>> +
>> +             if (adev->shutdown)
>> +                     irq_release = amdgpu_irq_enabled(adev,
>> +                                     &adev->nbio.ras_err_event_athub_irq, 0);
>> +
>> +             if (adev->nbio.ras && irq_release &&
>>                    adev->nbio.ras->init_ras_err_event_athub_interrupt)
>>                        amdgpu_irq_put(adev, &adev->nbio.ras_err_event_athub_irq, 0);
>>        }



More information about the amd-gfx mailing list