[PATCH] drm/amdgpu: nuke the ih reentrant lock

Christian König ckoenig.leichtzumerken at gmail.com
Fri Mar 12 14:50:35 UTC 2021


Am 12.03.21 um 15:36 schrieb Daniel Vetter:
> On Fri, Mar 12, 2021 at 03:35:50PM +0100, Daniel Vetter wrote:
>> On Fri, Mar 12, 2021 at 03:27:58PM +0100, Christian König wrote:
>>>
>>> Am 12.03.21 um 15:04 schrieb Daniel Vetter:
>>>> On Fri, Mar 12, 2021 at 02:59:06PM +0100, Christian König wrote:
>>>>> Interrupts on are non-reentrant on linux. This is just an ancient
>>>>> leftover from radeon where irq processing was kicked of from different
>>>>> places.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> Man you tricked me into grepping this on radeon and it looks horrible.
>>>> atomic_t is unordered in linux, so whatever was built there for radeon
>>>> does not wokr like a lock. It's missing all the barriers afiui. Good
>>>> riddance at least for amdgpu.
>>> Hui? atomic_xchg() is perfectly ordered as far as I know.
>> Oh right, if atomic_ returns a value it's fully ordered. Nice tour into
>> memory-barriers.txt. It's the atomic_t operations that don't return
>> anything which are fully unordered, and I mixed that up.
>>
>>> IIRC Alex added this for R600 because we had boards where interrupts where
>>> sporadically swallowed during driver startup and we had to kick of ring
>>> buffer processing manually.
>>>
>>> Anyway we have a fence processing fallback timer in amdgpu instead and that
>>> stuff is probably unused on any modern hardware.
>> Ah yes that stuff. Kinda annoying we have these old dma_fence around where
>> dma_fence_wait doesn't really work that well, but oh well.
> Argh I'm not awake. dma_fence_wait() works well on these, but the generic
> cb stuff (which is used in ever more place) doesn't, because it misses the
> fallback timer the ->wait does.

That's not what I meant. See amdgpu_fence_schedule_fallback().

As soon as somebody requested a dma_fence to be signaled we start a 
500ms timeout for fence processing which is cleared as soon as we see 
the first interrupt.

That is necessary because some motherboards have the ugly behavior of 
swallowing approx ~0.1% if all MSI interrupts after some idle time.

No idea where exactly that comes from, but I have the feeling that this 
was the same bug Alex tried to handle on R6xx over 10 years ago with 
kicking of interrupt processing manually after enabling interrupts for 
the first time.

Regards,
Christian.

> -Daniel
>
>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>> Thanks,
>>> Christian.
>>>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 -
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c     | 5 -----
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h     | 1 -
>>>>>    3 files changed, 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index a15f1b604733..886625fb464b 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -3284,7 +3284,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>>>    	/* mutex initialization are all done here so we
>>>>>    	 * can recall function without having locking issues */
>>>>> -	atomic_set(&adev->irq.ih.lock, 0);
>>>>>    	mutex_init(&adev->firmware.mutex);
>>>>>    	mutex_init(&adev->pm.mutex);
>>>>>    	mutex_init(&adev->gfx.gpu_clock_mutex);
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>>> index 1024065f1f03..faaa6aa2faaf 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>>> @@ -228,10 +228,6 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>>>>>    	wptr = amdgpu_ih_get_wptr(adev, ih);
>>>>>    restart_ih:
>>>>> -	/* is somebody else already processing irqs? */
>>>>> -	if (atomic_xchg(&ih->lock, 1))
>>>>> -		return IRQ_NONE;
>>>>> -
>>>>>    	DRM_DEBUG("%s: rptr %d, wptr %d\n", __func__, ih->rptr, wptr);
>>>>>    	/* Order reading of wptr vs. reading of IH ring data */
>>>>> @@ -244,7 +240,6 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>>>>>    	amdgpu_ih_set_rptr(adev, ih);
>>>>>    	wake_up_all(&ih->wait_process);
>>>>> -	atomic_set(&ih->lock, 0);
>>>>>    	/* make sure wptr hasn't changed while processing */
>>>>>    	wptr = amdgpu_ih_get_wptr(adev, ih);
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> index 87ec6d20dbe0..0649b59830a5 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> @@ -64,7 +64,6 @@ struct amdgpu_ih_ring {
>>>>>    	bool                    enabled;
>>>>>    	unsigned		rptr;
>>>>> -	atomic_t		lock;
>>>>>    	struct amdgpu_ih_regs	ih_regs;
>>>>>    	/* For waiting on IH processing at checkpoint. */
>>>>> -- 
>>>>> 2.25.1
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch



More information about the dri-devel mailing list