[PATCH 1/3] drm/amdkfd: update parameter for event_interrupt_poison_consumption

Felix Kuehling felix.kuehling at amd.com
Mon Mar 14 18:25:02 UTC 2022


On 2022-03-14 at 03:03, Tao Zhou wrote:
> The other parameters can be derived from ih_ring_entry, so passing only
> ih_ring_entry is enough.

I'm not sure what the reason for this change is. You remove one
parameter, but end up duplicating the SOC15_..._FROM_IH_ENTRY
translations. It doesn't look like a net improvement to me.

Looking at this function a bit more, this code looks problematic:

         if (atomic_read(&p->poison)) {
                 kfd_unref_process(p);
                 return;
         }

         atomic_set(&p->poison, 1);
         kfd_unref_process(p);

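Doing the read and the set as two separate operations is not atomic: two
handlers can both read 0 before either of them sets the flag, and both
would then go on to handle the poison. You should use atomic_cmpxchg
here to make sure the poison consumption is handled only once: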

	old_poison = atomic_cmpxchg(&p->poison, 0, 1);
	kfd_unref_process(p);
	if (old_poison)
		return;
	/* handle poison consumption */

Alternatively you could use atomic_inc_return and do the poison handling 
only if that returns exactly 1.
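
For illustration only, here is a minimal userspace sketch of both variants.
It uses C11 <stdatomic.h> rather than the kernel's atomic_t API, and
struct fake_process, the handled counter and both function names are
hypothetical stand-ins, not anything from amdkfd. atomic_compare_exchange_strong
plays the role of atomic_cmpxchg, and atomic_fetch_add(...) + 1 plays the
role of atomic_inc_return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the poison flag in struct kfd_process. */
struct fake_process {
	atomic_int poison;
	int handled;	/* how many times the poison handling actually ran */
};

/*
 * Variant 1: compare-and-exchange 0 -> 1.
 * Only the caller that performs the 0 -> 1 transition handles the poison.
 */
static bool consume_poison_cmpxchg(struct fake_process *p)
{
	int expected = 0;

	assert(p);
	if (!atomic_compare_exchange_strong(&p->poison, &expected, 1))
		return false;	/* flag was already set by someone else */

	p->handled++;		/* placeholder for the real poison handling */
	return true;
}

/*
 * Variant 2: increment and look at the new value.
 * atomic_fetch_add returns the old value, so old + 1 is the new value;
 * only the first caller ever sees exactly 1.
 */
static bool consume_poison_inc(struct fake_process *p)
{
	assert(p);
	if (atomic_fetch_add(&p->poison, 1) + 1 != 1)
		return false;	/* not the first consumer */

	p->handled++;		/* placeholder for the real poison handling */
	return true;
}
```

In both variants the decision "am I the first?" is made by a single atomic
read-modify-write, so there is no window between the check and the update.
One difference worth noting: with the increment variant the counter keeps
growing on every later event, whereas the cmpxchg variant leaves it at 1.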

Regards,
   Felix


>
> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index 7eedbcd14828..f7def0bf0730 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -91,11 +91,16 @@ enum SQ_INTERRUPT_ERROR_TYPE {
>   #define KFD_SQ_INT_DATA__ERR_TYPE__SHIFT 20
>   
>   static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> -				uint16_t pasid, uint16_t source_id)
> +				const uint32_t *ih_ring_entry)
>   {
> +	uint16_t source_id, pasid;
>   	int ret = -EINVAL;
> -	struct kfd_process *p = kfd_lookup_process_by_pasid(pasid);
> +	struct kfd_process *p;
>   
> +	source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
> +	pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
> +
> +	p = kfd_lookup_process_by_pasid(pasid);
>   	if (!p)
>   		return;
>   
> @@ -270,7 +275,7 @@ static void event_interrupt_wq_v9(struct kfd_dev *dev,
>   					sq_intr_err);
>   				if (sq_intr_err != SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST &&
>   					sq_intr_err != SQ_INTERRUPT_ERROR_TYPE_MEMVIOL) {
> -					event_interrupt_poison_consumption(dev, pasid, source_id);
> +					event_interrupt_poison_consumption(dev, ih_ring_entry);
>   					return;
>   				}
>   				break;
> @@ -291,7 +296,7 @@ static void event_interrupt_wq_v9(struct kfd_dev *dev,
>   		if (source_id == SOC15_INTSRC_SDMA_TRAP) {
>   			kfd_signal_event_interrupt(pasid, context_id0 & 0xfffffff, 28);
>   		} else if (source_id == SOC15_INTSRC_SDMA_ECC) {
> -			event_interrupt_poison_consumption(dev, pasid, source_id);
> +			event_interrupt_poison_consumption(dev, ih_ring_entry);
>   			return;
>   		}
>   	} else if (client_id == SOC15_IH_CLIENTID_VMC ||

