[PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

Mon Dec 3 16:38:43 UTC 2018

Ah! Never mind, now I see it what you mean!

I accidentally added my debug change DRM_DEBUG->DRM_ERROR to this patch 
as well.

Sorry for the noise,
Christian.

Am 03.12.18 um 17:35 schrieb Christian König:
>> No. As far as I can tell, you're missing these two:
>>
>> GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR (183)
>> GFX_9_0__SRCID__SQ_INTERRUPT_ID (239)
>>
>> 239 is used for signaling events from shaders and can be very frequent.
>> Triggering an error message on those interrupts would be bad.
>
> Mhm, then why didn't those trigger a message before?
>
> As far as I can see we actually didn't changed the handling for that.
>
> Christian.
>
> Am 03.12.18 um 17:31 schrieb Kuehling, Felix:
>> On 2018-12-01 9:11 a.m., Christian König wrote:
>>>> Won't this break VM fault handling in KFD?
>>> No, we still send all VM faults to KFD after processing them. Only
>>> filtered retries are not send to the KFD any more.
>> OK, I missed that src->funcs->process returning 0 means "not handled",
>>> 0 means "handled". Currently I don't see any interrupt processing
>> callbacks returning >0. I think that gets added in patch 4.
>>
>>
>>>> As far as I can tell, the only code path that leave IRQs unhandled
>>>> and passes them to KFD prints an error message in the kernel log. We
>>>> can't have the kernel log flooded with error messages every time
>>>> there are IRQs for KFD. We can get extremely high frequency
>>>> interrupts for HSA signals.
>>> Since the KFD didn't filtered the faults this would have a been a
>>> problem before as well.
>> I missed that r == 0 means not handled without being an error.
>>
>>
>>> So I'm pretty sure that we already have registered handlers for all
>>> interrupts the KFD is interested in as well.
>> No. As far as I can tell, you're missing these two:
>>
>> GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR (183)
>> GFX_9_0__SRCID__SQ_INTERRUPT_ID (239)
>>
>> 239 is used for signaling events from shaders and can be very frequent.
>> Triggering an error message on those interrupts would be bad.
>>
>> Regards,
>>    Felix
>>
>>
>>> Regards,
>>> Christian.
>>>
>>> Am 30.11.18 um 17:31 schrieb Kuehling, Felix:
>>>> Won't this break VM fault handling in KFD? I don't see a way with the
>>>> current code that you can leave some VM faults for KFD to process. If
>>>> we could consider VM faults with VMIDs 8-15 as not handled in amdgpu
>>>> and leave them for KFD to process, then this could work.
>>>>
>>>> As far as I can tell, the only code path that leave IRQs unhandled
>>>> and passes them to KFD prints an error message in the kernel log. We
>>>> can't have the kernel log flooded with error messages every time
>>>> there are IRQs for KFD. We can get extremely high frequency
>>>> interrupts for HSA signals.
>>>>
>>>> Regards,
>>>>     Felix
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>>> Alex Deucher
>>>> Sent: Friday, November 30, 2018 10:03 AM
>>>> To: Christian König <ckoenig.leichtzumerken at gmail.com>
>>>> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>
>>>> Subject: Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after
>>>> processing them v2
>>>>
>>>> On Fri, Nov 30, 2018 at 7:36 AM Christian König
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>> This allows us to filter out VM faults in the GMC code.
>>>>>
>>>>> v2: don't filter out all faults
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>>>>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29
>>>>> +++++++++++++++----------
>>>>>    1 file changed, 17 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> index 6b6524f04ce0..6db4c58ddc13 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct
>>>>> amdgpu_device *adev,
>>>>>           if (!amdgpu_ih_prescreen_iv(adev))
>>>>>                   return;
>>>>>
>>>>> -       /* Before dispatching irq to IP blocks, send it to amdkfd */
>>>>> -       amdgpu_amdkfd_interrupt(adev, (const void *)
>>>>> &ih->ring[ring_index]);
>>>>> -
>>>>>           entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
>>>>>           amdgpu_ih_decode_iv(adev, &entry);
>>>>>
>>>>> @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device
>>>>> *adev,
>>>>>           unsigned client_id = entry->client_id;
>>>>>           unsigned src_id = entry->src_id;
>>>>>           struct amdgpu_irq_src *src;
>>>>> +       bool handled = false;
>>>>>           int r;
>>>>>
>>>>>           trace_amdgpu_iv(entry);
>>>>>
>>>>>           if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
>>>>> -               DRM_DEBUG("Invalid client_id in IV: %d\n", 
>>>>> client_id);
>>>>> +               DRM_ERROR("Invalid client_id in IV: %d\n", 
>>>>> client_id);
>>>>>                   return;
>>>>>           }
>>>>>
>>>>>           if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
>>>>> -               DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
>>>>> +               DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
>>>>>                   return;
>>>>>           }
>>>>>
>>>>>           if (adev->irq.virq[src_id]) {
>>>>> generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id));
>>>>> -       } else {
>>>>> -               if (!adev->irq.client[client_id].sources) {
>>>>> -                       DRM_DEBUG("Unregistered interrupt client_id:
>>>>> %d src_id: %d\n",
>>>>> -                                 client_id, src_id);
>>>>> -                       return;
>>>>> -               }
>>>>> +               return;
>>>>> +       }
>>>>>
>>>>> +       if (!adev->irq.client[client_id].sources) {
>>>>> +               DRM_DEBUG("Unregistered interrupt client_id: %d
>>>>> src_id: %d\n",
>>>>> +                         client_id, src_id);
>>>>> +               return;
>>>>> +       } else {
>>>>>                   src = adev->irq.client[client_id].sources[src_id];
>>>>>                   if (!src) {
>>>>>                           DRM_DEBUG("Unhandled interrupt src_id: 
>>>>> %d\n",
>>>>> src_id); @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct
>>>>> amdgpu_device *adev,
>>>>>                   }
>>>>>
>>>>>                   r = src->funcs->process(adev, src, entry);
>>>>> -               if (r)
>>>>> +               if (r < 0)
>>>>>                           DRM_ERROR("error processing interrupt 
>>>>> (%d)\n",
>>>>> r);
>>>>> +               else if (r)
>>>>> +                       handled = true;
>>>>>           }
>>>>> +
>>>>> +       /* Send it to amdkfd as well if it isn't already handled */
>>>>> +       if (!handled)
>>>>> +               amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
>>>>>    }
>>>>>
>>>>>    /**
>>>>> -- 
>>>>> 2.17.1
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>