[PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

Christian König ckoenig.leichtzumerken at gmail.com
Mon Dec 3 16:35:55 UTC 2018


> No. As far as I can tell, you're missing these two:
>
> GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR (183)
> GFX_9_0__SRCID__SQ_INTERRUPT_ID (239)
>
> 239 is used for signaling events from shaders and can be very frequent.
> Triggering an error message on those interrupts would be bad.

Mhm, then why didn't those trigger a message before?

As far as I can see we actually didn't changed the handling for that.

Christian.

Am 03.12.18 um 17:31 schrieb Kuehling, Felix:
> On 2018-12-01 9:11 a.m., Christian König wrote:
>>> Won't this break VM fault handling in KFD?
>> No, we still send all VM faults to KFD after processing them. Only
>> filtered retries are not send to the KFD any more.
> OK, I missed that src->funcs->process returning 0 means "not handled",
>> 0 means "handled". Currently I don't see any interrupt processing
> callbacks returning >0. I think that gets added in patch 4.
>
>
>>> As far as I can tell, the only code path that leave IRQs unhandled
>>> and passes them to KFD prints an error message in the kernel log. We
>>> can't have the kernel log flooded with error messages every time
>>> there are IRQs for KFD. We can get extremely high frequency
>>> interrupts for HSA signals.
>> Since the KFD didn't filtered the faults this would have a been a
>> problem before as well.
> I missed that r == 0 means not handled without being an error.
>
>
>> So I'm pretty sure that we already have registered handlers for all
>> interrupts the KFD is interested in as well.
> No. As far as I can tell, you're missing these two:
>
> GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR (183)
> GFX_9_0__SRCID__SQ_INTERRUPT_ID (239)
>
> 239 is used for signaling events from shaders and can be very frequent.
> Triggering an error message on those interrupts would be bad.
>
> Regards,
>    Felix
>
>
>> Regards,
>> Christian.
>>
>> Am 30.11.18 um 17:31 schrieb Kuehling, Felix:
>>> Won't this break VM fault handling in KFD? I don't see a way with the
>>> current code that you can leave some VM faults for KFD to process. If
>>> we could consider VM faults with VMIDs 8-15 as not handled in amdgpu
>>> and leave them for KFD to process, then this could work.
>>>
>>> As far as I can tell, the only code path that leave IRQs unhandled
>>> and passes them to KFD prints an error message in the kernel log. We
>>> can't have the kernel log flooded with error messages every time
>>> there are IRQs for KFD. We can get extremely high frequency
>>> interrupts for HSA signals.
>>>
>>> Regards,
>>>     Felix
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>> Alex Deucher
>>> Sent: Friday, November 30, 2018 10:03 AM
>>> To: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>
>>> Subject: Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after
>>> processing them v2
>>>
>>> On Fri, Nov 30, 2018 at 7:36 AM Christian König
>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>> This allows us to filter out VM faults in the GMC code.
>>>>
>>>> v2: don't filter out all faults
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29
>>>> +++++++++++++++----------
>>>>    1 file changed, 17 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> index 6b6524f04ce0..6db4c58ddc13 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct
>>>> amdgpu_device *adev,
>>>>           if (!amdgpu_ih_prescreen_iv(adev))
>>>>                   return;
>>>>
>>>> -       /* Before dispatching irq to IP blocks, send it to amdkfd */
>>>> -       amdgpu_amdkfd_interrupt(adev, (const void *)
>>>> &ih->ring[ring_index]);
>>>> -
>>>>           entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
>>>>           amdgpu_ih_decode_iv(adev, &entry);
>>>>
>>>> @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device
>>>> *adev,
>>>>           unsigned client_id = entry->client_id;
>>>>           unsigned src_id = entry->src_id;
>>>>           struct amdgpu_irq_src *src;
>>>> +       bool handled = false;
>>>>           int r;
>>>>
>>>>           trace_amdgpu_iv(entry);
>>>>
>>>>           if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
>>>> -               DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
>>>> +               DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
>>>>                   return;
>>>>           }
>>>>
>>>>           if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
>>>> -               DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
>>>> +               DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
>>>>                   return;
>>>>           }
>>>>
>>>>           if (adev->irq.virq[src_id]) {
>>>>                  
>>>> generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id));
>>>> -       } else {
>>>> -               if (!adev->irq.client[client_id].sources) {
>>>> -                       DRM_DEBUG("Unregistered interrupt client_id:
>>>> %d src_id: %d\n",
>>>> -                                 client_id, src_id);
>>>> -                       return;
>>>> -               }
>>>> +               return;
>>>> +       }
>>>>
>>>> +       if (!adev->irq.client[client_id].sources) {
>>>> +               DRM_DEBUG("Unregistered interrupt client_id: %d
>>>> src_id: %d\n",
>>>> +                         client_id, src_id);
>>>> +               return;
>>>> +       } else {
>>>>                   src = adev->irq.client[client_id].sources[src_id];
>>>>                   if (!src) {
>>>>                           DRM_DEBUG("Unhandled interrupt src_id: %d\n",
>>>> src_id); @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct
>>>> amdgpu_device *adev,
>>>>                   }
>>>>
>>>>                   r = src->funcs->process(adev, src, entry);
>>>> -               if (r)
>>>> +               if (r < 0)
>>>>                           DRM_ERROR("error processing interrupt (%d)\n",
>>>> r);
>>>> +               else if (r)
>>>> +                       handled = true;
>>>>           }
>>>> +
>>>> +       /* Send it to amdkfd as well if it isn't already handled */
>>>> +       if (!handled)
>>>> +               amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
>>>>    }
>>>>
>>>>    /**
>>>> -- 
>>>> 2.17.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list