[PATCH 1/2] drm/radeon: stop the leaks in cik_ib_test

Christian König deathsimple at vodafone.de
Wed Oct 16 13:31:03 CEST 2013


Strange. I tested X overnight with ~120 glxgears instances that were killed
and restarted every 60-120 seconds, and saw no lockup or freeze at all.

What's the kernel backtrace when this happens? If I understand you
correctly, X is still killable in that situation; is that right?

Please try the following:

echo 1 > /sys/kernel/debug/tracing/events/radeon/radeon_fence_wait_begin/enable
echo 1 > /sys/kernel/debug/tracing/events/radeon/radeon_fence_wait_end/enable

before starting X. When X freezes, run "cat /sys/kernel/debug/tracing/trace".
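A stuck GEM_WAIT should show up in that trace as a radeon_fence_wait_begin event with no matching radeon_fence_wait_end. The following is a minimal sketch of a filter that lists such unmatched waits. It assumes each event line contains the event name followed by a "seqno=<number>" field, and it matches on seqno alone (ignoring device and ring); both are assumptions, so check them against the actual trace output before relying on it. find_unfinished_waits() and MAX_PENDING are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 256 /* hypothetical cap on tracked waits */

/* Extract the number following "seqno=" in one trace line.
 * ASSUMPTION: the tracepoint prints "seqno=<unsigned>".
 * Returns 1 on success, 0 if the line has no such field. */
static int parse_seqno(const char *line, unsigned *seqno)
{
    const char *p = strstr(line, "seqno=");
    return p != NULL && sscanf(p + 6, "%u", seqno) == 1;
}

/* Store in `pending` every seqno that has a radeon_fence_wait_begin
 * event but no later radeon_fence_wait_end event. Returns how many
 * unmatched waits were found (at most MAX_PENDING). */
size_t find_unfinished_waits(const char *lines[], size_t n,
                             unsigned pending[MAX_PENDING])
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned seq;

        if (!parse_seqno(lines[i], &seq))
            continue; /* not a fence event we understand */

        if (strstr(lines[i], "radeon_fence_wait_begin:") != NULL) {
            if (count < MAX_PENDING)
                pending[count++] = seq;
        } else if (strstr(lines[i], "radeon_fence_wait_end:") != NULL) {
            /* Remove one pending wait with the same seqno (swap-remove,
             * so the order of `pending` is not preserved). */
            for (size_t j = 0; j < count; j++) {
                if (pending[j] == seq) {
                    pending[j] = pending[--count];
                    break;
                }
            }
        }
    }
    return count;
}
```

Any seqno left in `pending` after the freeze is a candidate for the fence that was never signalled.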

Thanks,
Christian.
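Marek's report below describes a waiter in GEM_WAIT that is never woken because its fence is never signalled. To make that failure mode concrete, here is a small userspace model of a sequence-number fence: a producer bumps the last signalled seqno and broadcasts, and a waiter sleeps until its seqno is reached. It is only an illustration built on POSIX threads, not the radeon fence code; the toy_* names are hypothetical, and seqno wraparound is ignored. The wait is bounded so that a missing signal becomes an ETIMEDOUT return instead of an indefinite sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Toy fence queue: the highest signalled seqno, guarded by a mutex,
 * plus a condition variable that waiters sleep on. */
struct toy_fence_queue {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    unsigned        last_signalled;
};

void toy_init(struct toy_fence_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->wake, NULL);
    q->last_signalled = 0;
}

/* Producer: mark everything up to `seq` as done and wake all waiters.
 * If this is never called for a seqno (or the broadcast is skipped),
 * waiters on that seqno stay asleep. */
void toy_signal(struct toy_fence_queue *q, unsigned seq)
{
    pthread_mutex_lock(&q->lock);
    if (seq > q->last_signalled)
        q->last_signalled = seq;
    pthread_cond_broadcast(&q->wake);
    pthread_mutex_unlock(&q->lock);
}

/* Waiter: sleep until `seq` is signalled or `timeout_ms` elapses.
 * Returns 0 if signalled, ETIMEDOUT otherwise. The condition is
 * re-checked after every wakeup, so spurious wakeups are harmless. */
int toy_wait(struct toy_fence_queue *q, unsigned seq, long timeout_ms)
{
    struct timespec deadline;
    int r = 0;

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&q->lock);
    while (q->last_signalled < seq && r == 0)
        r = pthread_cond_timedwait(&q->wake, &q->lock, &deadline);
    if (q->last_signalled >= seq)
        r = 0; /* signalled, possibly right at the deadline */
    pthread_mutex_unlock(&q->lock);
    return r;
}
```

In this model a waiter can only hang if its seqno is never passed to toy_signal() or the broadcast is lost, which is the shape of the bug suspected in the GEM_WAIT freeze.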

On 15.10.2013 12:57, Marek Olšák wrote:
> They are not lockups. X just freezes in GEM_WAIT. The only way to
> reproduce it is to apply the patches, use the computer and wait. It
> looks like a fence is not signalled and the process calling GEM_WAIT
> is not woken up.
>
> Marek
>
> On Tue, Oct 15, 2013 at 11:11 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Hmm, hard to say what's going wrong this time, but we probably need to fix it
>> before the final release.
>>
>> Do you have a kernel backtrace from the lockups? Or at least some way to
>> reproduce it?
>>
>> Christian.
>>
>> On 14.10.2013 21:34, Marek Olšák wrote:
>>
>>> Oops, the new problem is not that rare after all. It has now happened
>>> to me 3 times in an hour.
>>>
>>> Marek
>>>
>>> On Mon, Oct 14, 2013 at 9:13 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>> I tested this and had over 1546 lockups followed by a successful GPU
>>>> reset. Then the kernel probably crashed (judging by the fact ssh was
>>>> dead). Still, it's pretty impressive.
>>>>
>>>> There is a new problem though. The X server sometimes gets stuck in
>>>> GEM_WAIT and waits forever, even if there were no lockups before. It
>>>> occurs very rarely though. I didn't see this issue without your
>>>> patches.
>>>>
>>>> Marek
>>>>
>>>> On Mon, Oct 14, 2013 at 11:32 AM, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>> From: Christian König <christian.koenig at amd.com>
>>>>>
>>>>> Stop leaking IB memory and scratch register space when the test fails.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/radeon/cik.c | 3 +++
>>>>>    1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>>>>> index b874ccd..8f393df 100644
>>>>> --- a/drivers/gpu/drm/radeon/cik.c
>>>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>>>> @@ -3182,6 +3182,7 @@ int cik_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>>>>>           r = radeon_ib_get(rdev, ring->idx, &ib, NULL, 256);
>>>>>           if (r) {
>>>>>                   DRM_ERROR("radeon: failed to get ib (%d).\n", r);
>>>>> +               radeon_scratch_free(rdev, scratch);
>>>>>                   return r;
>>>>>           }
>>>>>           ib.ptr[0] = PACKET3(PACKET3_SET_UCONFIG_REG, 1);
>>>>> @@ -3198,6 +3199,8 @@ int cik_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>>>>>           r = radeon_fence_wait(ib.fence, false);
>>>>>           if (r) {
>>>>>                   DRM_ERROR("radeon: fence wait failed (%d).\n", r);
>>>>> +               radeon_scratch_free(rdev, scratch);
>>>>> +               radeon_ib_free(rdev, &ib);
>>>>>                   return r;
>>>>>           }
>>>>>           for (i = 0; i < rdev->usec_timeout; i++) {
>>>>> --
>>>>> 1.8.1.2
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
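The patch above fixes the leaks by adding radeon_scratch_free() and radeon_ib_free() calls inline at each early return in cik_ib_test(). A common alternative in kernel code is goto-based unwinding, where each failure jumps to a label that releases only what was already acquired, in reverse order. The sketch below shows that idiom with hypothetical stand-in allocators that merely count outstanding allocations, so a leak on any path is directly observable; it does not use the real radeon API, and fail_step is an illustrative knob that selects which step fails.

```c
#include <errno.h>

/* Hypothetical stand-ins for the scratch-register and IB allocators.
 * They only count outstanding allocations so leaks can be checked. */
static int scratch_in_use;
static int ib_in_use;

/* Which step should fail: 0 = none, 1 = IB allocation, 2 = fence wait. */
static int fail_step;

static int scratch_get(void) { scratch_in_use++; return 0; }
static void scratch_free(void) { scratch_in_use--; }

static int ib_get(void)
{
    if (fail_step == 1)
        return -ENOMEM;
    ib_in_use++;
    return 0;
}
static void ib_free(void) { ib_in_use--; }

/* Stand-in for a fence wait that fails, e.g. after a GPU lockup. */
static int fence_wait(void) { return fail_step == 2 ? -EDEADLK : 0; }

/* Same acquisition order as the test: scratch register, then IB,
 * then wait. Every exit path releases exactly what it acquired. */
int ib_test_sketch(int fail_at)
{
    int r;

    fail_step = fail_at;
    r = scratch_get();
    if (r)
        return r;

    r = ib_get();
    if (r)
        goto free_scratch;

    r = fence_wait();
    if (r)
        goto free_ib;

    /* ... checking the scratch register value would go here ... */

free_ib:
    ib_free();
free_scratch:
    scratch_free();
    return r;
}
```

The success path falls through both labels, so the resources are released on every path; the inline version in the patch reaches the same result with a few duplicated calls, which is reasonable for a function with only two early returns.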


