[PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2
Koenig, Christian
Christian.Koenig at amd.com
Mon Dec 3 16:38:43 UTC 2018
Ah! Never mind, now I see what you mean!
I accidentally added my debug change DRM_DEBUG->DRM_ERROR to this patch
as well.
Sorry for the noise,
Christian.
On 03.12.18 at 17:35, Christian König wrote:
>> No. As far as I can tell, you're missing these two:
>>
>> GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR (183)
>> GFX_9_0__SRCID__SQ_INTERRUPT_ID (239)
>>
>> 239 is used for signaling events from shaders and can be very frequent.
>> Triggering an error message on those interrupts would be bad.
>
> Mhm, then why didn't those trigger a message before?
>
> As far as I can see we actually didn't change the handling for that.
>
> Christian.
>
> On 03.12.18 at 17:31, Kuehling, Felix wrote:
>> On 2018-12-01 9:11 a.m., Christian König wrote:
>>>> Won't this break VM fault handling in KFD?
>>> No, we still send all VM faults to KFD after processing them. Only
>>> filtered retries are not sent to the KFD any more.
>> OK, I missed that src->funcs->process returning 0 means "not handled"
>> and >0 means "handled". Currently I don't see any interrupt processing
>> callbacks returning >0. I think that gets added in patch 4.
>>
>>
>>>> As far as I can tell, the only code path that leaves IRQs unhandled
>>>> and passes them to KFD prints an error message in the kernel log. We
>>>> can't have the kernel log flooded with error messages every time
>>>> there are IRQs for KFD. We can get extremely high frequency
>>>> interrupts for HSA signals.
>>> Since the KFD didn't filter the faults this would have been a
>>> problem before as well.
>> I missed that r == 0 means not handled without being an error.
>>
>>
>>> So I'm pretty sure that we already have registered handlers for all
>>> interrupts the KFD is interested in as well.
>> No. As far as I can tell, you're missing these two:
>>
>> GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR (183)
>> GFX_9_0__SRCID__SQ_INTERRUPT_ID (239)
>>
>> 239 is used for signaling events from shaders and can be very frequent.
>> Triggering an error message on those interrupts would be bad.
>>
>> Regards,
>> Felix
>>
>>
>>> Regards,
>>> Christian.
>>>
>>> On 30.11.18 at 17:31, Kuehling, Felix wrote:
>>>> Won't this break VM fault handling in KFD? I don't see a way with the
>>>> current code that you can leave some VM faults for KFD to process. If
>>>> we could consider VM faults with VMIDs 8-15 as not handled in amdgpu
>>>> and leave them for KFD to process, then this could work.
>>>>
>>>> As far as I can tell, the only code path that leaves IRQs unhandled
>>>> and passes them to KFD prints an error message in the kernel log. We
>>>> can't have the kernel log flooded with error messages every time
>>>> there are IRQs for KFD. We can get extremely high frequency
>>>> interrupts for HSA signals.
>>>>
>>>> Regards,
>>>> Felix
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>>> Alex Deucher
>>>> Sent: Friday, November 30, 2018 10:03 AM
>>>> To: Christian König <ckoenig.leichtzumerken at gmail.com>
>>>> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>
>>>> Subject: Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after
>>>> processing them v2
>>>>
>>>> On Fri, Nov 30, 2018 at 7:36 AM Christian König
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>> This allows us to filter out VM faults in the GMC code.
>>>>>
>>>>> v2: don't filter out all faults
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>>>>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++++++++++++++----------
>>>>> 1 file changed, 17 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> index 6b6524f04ce0..6db4c58ddc13 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>>> @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev,
>>>>> if (!amdgpu_ih_prescreen_iv(adev))
>>>>> return;
>>>>>
>>>>> - /* Before dispatching irq to IP blocks, send it to amdkfd */
>>>>> - amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]);
>>>>> -
>>>>> entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
>>>>> amdgpu_ih_decode_iv(adev, &entry);
>>>>>
>>>>> @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>>>>> unsigned client_id = entry->client_id;
>>>>> unsigned src_id = entry->src_id;
>>>>> struct amdgpu_irq_src *src;
>>>>> + bool handled = false;
>>>>> int r;
>>>>>
>>>>> trace_amdgpu_iv(entry);
>>>>>
>>>>> if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
>>>>> - DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
>>>>> + DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
>>>>> return;
>>>>> }
>>>>>
>>>>> if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
>>>>> - DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
>>>>> + DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
>>>>> return;
>>>>> }
>>>>>
>>>>> if (adev->irq.virq[src_id]) {
>>>>> generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id));
>>>>> - } else {
>>>>> - if (!adev->irq.client[client_id].sources) {
>>>>> - DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
>>>>> - client_id, src_id);
>>>>> - return;
>>>>> - }
>>>>> + return;
>>>>> + }
>>>>>
>>>>> + if (!adev->irq.client[client_id].sources) {
>>>>> + DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
>>>>> + client_id, src_id);
>>>>> + return;
>>>>> + } else {
>>>>> src = adev->irq.client[client_id].sources[src_id];
>>>>> if (!src) {
>>>>> DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id);
>>>>> @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
>>>>> }
>>>>>
>>>>> r = src->funcs->process(adev, src, entry);
>>>>> - if (r)
>>>>> + if (r < 0)
>>>>> DRM_ERROR("error processing interrupt (%d)\n", r);
>>>>> + else if (r)
>>>>> + handled = true;
>>>>> }
>>>>> +
>>>>> + /* Send it to amdkfd as well if it isn't already handled */
>>>>> + if (!handled)
>>>>> + amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
>>>>> }
>>>>>
>>>>> /**
>>>>> --
>>>>> 2.17.1
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
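
The return-value convention discussed in this thread (negative means error, 0 means "not handled", greater than 0 means "handled") can be sketched as a small standalone C fragment. This is only an illustration under assumptions, not the amdgpu code: struct iv_entry, process_fn, dispatch_iv, forward_to_kfd and the two counters are hypothetical names, and the sketch covers only the path where a handler is registered for the source.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a decoded interrupt vector (IV) entry. */
struct iv_entry {
	unsigned int src_id;
};

/*
 * Hypothetical handler type following the convention from the thread:
 *   < 0  error while processing
 *   == 0 not handled (not an error)
 *   > 0  handled
 */
typedef int (*process_fn)(const struct iv_entry *entry);

/* Counters standing in for the KFD call and the error log. */
static int kfd_forward_count;
static int error_count;

/* Mock of passing the entry on to the KFD. */
static void forward_to_kfd(const struct iv_entry *entry)
{
	(void)entry;
	kfd_forward_count++;
}

/*
 * Run the registered handler first, then forward the entry only if the
 * handler did not claim it. Returns true if the entry was forwarded.
 */
static bool dispatch_iv(process_fn process, const struct iv_entry *entry)
{
	bool handled = false;
	int r = process(entry);

	if (r < 0)
		error_count++;		/* a real driver would log here */
	else if (r > 0)
		handled = true;

	if (!handled)
		forward_to_kfd(entry);

	return !handled;
}

/* Three mock handlers, one per outcome. */
static int handler_handled(const struct iv_entry *entry)
{
	(void)entry;
	return 1;
}

static int handler_unhandled(const struct iv_entry *entry)
{
	(void)entry;
	return 0;
}

static int handler_error(const struct iv_entry *entry)
{
	(void)entry;
	return -1;
}
```

In this sketch an entry whose handler returns an error is still forwarded, matching the `if (r < 0) ... else if (r) handled = true;` logic in the patch, where only a positive return value sets `handled`.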