[PATCH] drm/amdgpu: Enable SA software trap.
Felix Kuehling
felix.kuehling at amd.com
Thu Sep 22 19:16:29 UTC 2022
On 2022-09-22 at 13:57, Belanger, David wrote:
> [AMD Official Use Only - General]
>
>
>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling at amd.com>
>> Sent: Thursday, September 22, 2022 1:14 PM
>> To: Belanger, David <David.Belanger at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Cornwall, Jay <Jay.Cornwall at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: Enable SA software trap.
>>
>> On 2022-09-22 at 12:17, David Belanger wrote:
>>> Enables support for software trap for MES >= 4.
>>> Adapted from implementation from Jay Cornwall.
>>>
>>> v2: Add IP version check in conditions.
>>>
>>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>>> Signed-off-by: David Belanger <david.belanger at amd.com>
>>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 6 +-
>>> .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 771 +++++++++---------
>>> .../amd/amdkfd/cwsr_trap_handler_gfx10.asm | 21 +
>>> .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c | 26 +-
>>> 4 files changed, 437 insertions(+), 387 deletions(-)
>> [snip]
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
>>> index a6fcbeeb7428..4e03d19e9333 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
>>> @@ -358,13 +358,35 @@ static void event_interrupt_wq_v11(struct kfd_dev *dev,
>>>                  break;
>>>              case SQ_INTERRUPT_WORD_ENCODING_ERROR:
>>>                  print_sq_intr_info_error(context_id0, context_id1);
>>> +                sq_int_priv = REG_GET_FIELD(context_id0,
>>> +                        SQ_INTERRUPT_WORD_WAVE_CTXID0, PRIV);
>>>                  sq_int_errtype = REG_GET_FIELD(context_id0,
>>>                          SQ_INTERRUPT_WORD_ERROR_CTXID0, TYPE);
>>> -                if (sq_int_errtype != SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST &&
>>> -                    sq_int_errtype != SQ_INTERRUPT_ERROR_TYPE_MEMVIOL) {
>>> +
>>> +                switch (sq_int_errtype) {
>>> +                case SQ_INTERRUPT_ERROR_TYPE_EDC_FUE:
>>> +                case SQ_INTERRUPT_ERROR_TYPE_EDC_FED:
>>>                      event_interrupt_poison_consumption_v11(
>>>                              dev, pasid, source_id);
>>>                      return;
>>> +                case SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST:
>>> +                    /*if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
>>> +                        (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
>>> +                        (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3)))
>>> +                        && sq_int_priv)
>>> +                        kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
>>> +                                KFD_EC_MASK(EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION),
>>> +                                NULL, 0);*/
>>> +                    return;
>>> +                case SQ_INTERRUPT_ERROR_TYPE_MEMVIOL:
>>> +                    /*if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
>>> +                        (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
>>> +                        (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3)))
>>> +                        && sq_int_priv)
>>> +                        kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
>>> +                                KFD_EC_MASK(EC_QUEUE_WAVE_MEMORY_VIOLATION),
>>> +                                NULL, 0);*/
>> Which branch is this for? kfd_set_dbg_ev_from_interrupt shouldn't exist on
>> the upstream branch yet. That code is still under review for upstream.
>>
> My understanding is that it is for the amd-staging-drm-next branch, from which it will make its way upstream.
> The code that calls that function is commented out. That file in the amd-staging-drm-next branch already contains other commented-out calls to the same function.
> Please advise whether I should remove it from the patch for now or keep it commented out.
I'd prefer not to check in commented-out code to the upstream branch.
Please work with Jon to make sure he includes this in his rocm-gdb patch
series, where these changes belong. And you can submit them to the DKMS
branch as a separate patch in the interim.
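
For the separate patch, the condition that appears twice in the commented-out blocks could be factored into a single predicate. The following is only a minimal sketch under stated assumptions, not the code from the patch: the helper names mes_sw_trap_supported() and should_raise_dbg_event() are hypothetical, and the two macros are redefined locally on the assumption that they match the amdgpu headers, so that the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins, assumed to match the amdgpu definitions. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))
#define AMDGPU_MES_VERSION_MASK 0x00000fff

/*
 * Hypothetical helper: true when the MES scheduler version is >= 4 and
 * the GC IP version lies in [11.0.0, 11.0.3], i.e. the range checked in
 * the quoted patch.
 */
static bool mes_sw_trap_supported(uint32_t sched_version,
				  uint32_t gc_ip_version)
{
	return (sched_version & AMDGPU_MES_VERSION_MASK) >= 4 &&
	       gc_ip_version >= IP_VERSION(11, 0, 0) &&
	       gc_ip_version <= IP_VERSION(11, 0, 3);
}

/*
 * Hypothetical helper mirroring the commented-out condition:
 * !(version checks) && sq_int_priv.
 */
static bool should_raise_dbg_event(uint32_t sched_version,
				   uint32_t gc_ip_version,
				   bool sq_int_priv)
{
	return !mes_sw_trap_supported(sched_version, gc_ip_version) &&
	       sq_int_priv;
}
```

With a helper like this, both the ILLEGAL_INST and MEMVIOL cases would test one expression instead of repeating the three version comparisons.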
Thanks,
Felix
>
> Thanks,
> David B.
>
>> Regards,
>> Felix
>>
>>
>>> + return;
>>> }
>>> break;
>>> default: