[PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

Christian König deathsimple at vodafone.de
Wed Oct 2 16:50:59 CEST 2013


Possible, but my guess is rather that this doesn't work because the IB test runs into a deadlock, so the GPU reset never fully completes.

Can you reproduce the problem?

If you want to make GPU resets more reliable, I would rather suggest removing the ring lock dependency.
Then we should try to give all the fence wait functions a (reliable) timeout and move reset handling one layer up, into the ioctl functions. But for that you need to rip out the old PM code first.

Christian.
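The direction suggested above (a fence wait that cannot block forever, with reset handling moved up into the ioctl) can be sketched as a small userspace model. Everything in it is hypothetical and not radeon code: `struct model_ring`, `model_fence_wait`, `model_gpu_reset` and `model_wait_idle_ioctl` are illustrative names, time is simulated in ticks, and returning -EDEADLK on timeout is an assumed choice. It only shows the control flow: the wait reports an error instead of hanging, and its caller decides to reset and retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace model of one ring's fence state (not radeon code). */
struct model_ring {
    uint64_t emitted_seq;  /* last sequence number submitted to the ring */
    uint64_t signaled_seq; /* last sequence number the "GPU" wrote back */
    int hung;              /* nonzero: the simulated GPU makes no progress */
};

/* Simulated hardware: a live GPU completes one fence per tick. */
static void model_tick(struct model_ring *ring)
{
    if (!ring->hung && ring->signaled_seq < ring->emitted_seq)
        ring->signaled_seq++;
    assert(ring->signaled_seq <= ring->emitted_seq);
}

/* Hypothetical bounded wait: returns 0 once seq is signaled, or -EDEADLK
 * if seq is still unsignaled after timeout_ticks simulated ticks. */
static int model_fence_wait(struct model_ring *ring, uint64_t seq,
                            unsigned timeout_ticks)
{
    for (unsigned t = 0; t < timeout_ticks; t++) {
        if (ring->signaled_seq >= seq)
            return 0;
        model_tick(ring);
    }
    return ring->signaled_seq >= seq ? 0 : -EDEADLK;
}

/* Hypothetical reset: the simulated GPU starts making progress again. */
static void model_gpu_reset(struct model_ring *ring)
{
    ring->hung = 0;
}

/* Hypothetical ioctl-level handler: the wait only reports the timeout;
 * the decision to reset (and retry once) is made here, one layer up. */
static int model_wait_idle_ioctl(struct model_ring *ring, uint64_t seq)
{
    int r = model_fence_wait(ring, seq, 100);
    if (r == -EDEADLK) {
        model_gpu_reset(ring);
        r = model_fence_wait(ring, seq, 100);
    }
    return r;
}
```

In this model a hung ring makes a plain wait fail with -EDEADLK instead of blocking, and the ioctl-level handler then resets and retries. Keeping the reset out of the wait function is what is meant to let it run without the wait path holding the ring lock.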

Marek Olšák <maraeo at gmail.com> wrote:

>I'm afraid signalling the fences with an IB test is not reliable.
>
>Marek
>
>On Wed, Oct 2, 2013 at 3:52 PM, Christian König <deathsimple at vodafone.de> wrote:
>> NAK. After recovering from a lockup, the first thing we do is signal all remaining fences with an IB test.
>>
>> If we don't recover, we do signal all fences manually.
>>
>> Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets.
>>
>> Christian.
>>
>> Marek Olšák <maraeo at gmail.com> wrote:
>>
>>>From: Marek Olšák <marek.olsak at amd.com>
>>>
>>>After a lockup, fences are sometimes not signalled, causing
>>>the GEM_WAIT_IDLE ioctl to never return, which can result
>>>in an X server freeze.
>>>
>>>This fixes only one of many deadlocks which can occur during a lockup.
>>>
>>>Signed-off-by: Marek Olšák <marek.olsak at amd.com>
>>>---
>>> drivers/gpu/drm/radeon/radeon_device.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>>diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>>>index 841d0e0..7b97baa 100644
>>>--- a/drivers/gpu/drm/radeon/radeon_device.c
>>>+++ b/drivers/gpu/drm/radeon/radeon_device.c
>>>@@ -1552,6 +1552,11 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>>>       radeon_save_bios_scratch_regs(rdev);
>>>       /* block TTM */
>>>       resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
>>>+
>>>+      mutex_lock(&rdev->ring_lock);
>>>+      radeon_fence_driver_force_completion(rdev);
>>>+      mutex_unlock(&rdev->ring_lock);
>>>+
>>>       radeon_pm_suspend(rdev);
>>>       radeon_suspend(rdev);
>>>
>>>--
>>>1.8.1.2
>>>
>>>_______________________________________________
>>>dri-devel mailing list
>>>dri-devel at lists.freedesktop.org
>>>http://lists.freedesktop.org/mailman/listinfo/dri-devel
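The two positions in this thread can be contrasted with a minimal model of sequence-number fences, assuming (as a simplification) that each ring tracks the last emitted and last signalled sequence number and that a waiter on `seq` is released once the signalled value reaches it. All names (`struct model_fence_drv`, `model_force_completion`, `model_after_reset`, `MODEL_NUM_RINGS`) are hypothetical; this is not the body of `radeon_fence_driver_force_completion()`, and the locking the patch adds around it (`rdev->ring_lock`) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_RINGS 3 /* hypothetical ring count for the model */

/* Hypothetical per-ring fence state using sequence numbers. */
struct model_fence_drv {
    bool initialized;
    uint64_t emitted_seq;  /* highest sequence number handed out */
    uint64_t signaled_seq; /* highest sequence number known complete */
};

/* A waiter on seq is released once signaled_seq >= seq. */
static bool model_fence_signaled(const struct model_fence_drv *drv, uint64_t seq)
{
    return drv->signaled_seq >= seq;
}

/* "Force completion": mark everything emitted so far as finished.
 * Waiters are released, but the lost work did not necessarily run. */
static void model_force_completion(struct model_fence_drv drv[MODEL_NUM_RINGS])
{
    for (int i = 0; i < MODEL_NUM_RINGS; i++) {
        if (!drv[i].initialized)
            continue;
        assert(drv[i].signaled_seq <= drv[i].emitted_seq);
        drv[i].signaled_seq = drv[i].emitted_seq;
    }
}

/* Post-reset policy sketched from the NAK:
 * - recovered: emit a test job's fence; when the live GPU completes it,
 *   its newer sequence number also releases every older fence on the ring;
 * - not recovered: fall back to forcing completion. */
static void model_after_reset(struct model_fence_drv drv[MODEL_NUM_RINGS],
                              bool recovered)
{
    if (!recovered) {
        model_force_completion(drv);
        return;
    }
    for (int i = 0; i < MODEL_NUM_RINGS; i++) {
        if (!drv[i].initialized)
            continue;
        uint64_t test_seq = ++drv[i].emitted_seq; /* emit the test fence */
        drv[i].signaled_seq = test_seq;           /* GPU completes it */
    }
}
```

The recovered branch relies on fences on a ring completing in order, so one newer completion releases every older waiter. The fallback branch releases waiters unconditionally, which unblocks GEM_WAIT_IDLE but says nothing about whether the lost work actually ran; the NAK above argues for taking it only when recovery fails.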

