radeon error on resume: "cp failed to get scratch reg" - anyone interested?

Simon Kitching skitching at vonos.net
Wed Sep 19 23:54:20 PDT 2012


On 19/09/12 11:12, Michel Dänzer wrote:
> On Mon, 2012-09-17 at 15:06 +0200, Simon Kitching wrote:
>> Hi,
>>
>> I've noticed that on resume from suspend, dmesg reports:
>>
>> [21895.997536] [drm] radeon: 1 quad pipes, 2 z pipes initialized.
>> [21896.012072] [drm] PCIE GART of 512M enabled (table at
>> 0x0000000000040000).
>> [21896.012082] radeon 0000:01:00.0: WB enabled
>> [21896.012085] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr
>> 0x0000000010000000 and cpu addr 0xffccb000
>> [21896.012138] [drm] radeon: ring at 0x0000000010001000
>> [21896.012157] [drm:r100_ring_test] *ERROR* radeon: cp failed to get
>> scratch reg (-22).
>> [21896.012158] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
>> [21896.012160] radeon 0000:01:00.0: failed initializing CP (-22).
>>
>> Graphics still seems to work fine though.
>>
>> Is this info of interest to anyone? I'm happy to do additional testing
>> if that is useful.
> Does the patch below fix the failure to get a scratch register?
>
>
>  From 6780dca112752e1fc4e7d410441b39874565b754 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Michel=20D=C3=A4nzer?= <michel.daenzer at amd.com>
> Date: Wed, 19 Sep 2012 11:09:14 +0200
> Subject: [PATCH] drm/radeon: Fix scratch register leak in IB test.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Restructure the code to jump out via labels instead of directly returning
> early.
> ---
>   drivers/gpu/drm/radeon/r100.c |   12 ++++++------
>   drivers/gpu/drm/radeon/r600.c |   12 ++++++------
>   2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
> index 8acb34f..819c352 100644
> --- a/drivers/gpu/drm/radeon/r100.c
> +++ b/drivers/gpu/drm/radeon/r100.c
> @@ -3818,7 +3818,7 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>   	WREG32(scratch, 0xCAFEDEAD);
>   	r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, &ib, 256);
>   	if (r) {
> -		return r;
> +		goto free_scratch;
>   	}
>   	ib.ptr[0] = PACKET0(scratch, 0);
>   	ib.ptr[1] = 0xDEADBEEF;
> @@ -3831,13 +3831,11 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>   	ib.length_dw = 8;
>   	r = radeon_ib_schedule(rdev, &ib, NULL);
>   	if (r) {
> -		radeon_scratch_free(rdev, scratch);
> -		radeon_ib_free(rdev, &ib);
> -		return r;
> +		goto free_ib;
>   	}
>   	r = radeon_fence_wait(ib.fence, false);
>   	if (r) {
> -		return r;
> +		goto free_ib;
>   	}
>   	for (i = 0; i < rdev->usec_timeout; i++) {
>   		tmp = RREG32(scratch);
> @@ -3853,8 +3851,10 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>   			  scratch, tmp);
>   		r = -EINVAL;
>   	}
> -	radeon_scratch_free(rdev, scratch);
> +free_ib:
>   	radeon_ib_free(rdev, &ib);
> +free_scratch:
> +	radeon_scratch_free(rdev, scratch);
>   	return r;
>   }
>   
> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
> index d79c639..38b546e 100644
> --- a/drivers/gpu/drm/radeon/r600.c
> +++ b/drivers/gpu/drm/radeon/r600.c
> @@ -2638,7 +2638,7 @@ int r600_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>   	r = radeon_ib_get(rdev, ring->idx, &ib, 256);
>   	if (r) {
>   		DRM_ERROR("radeon: failed to get ib (%d).\n", r);
> -		return r;
> +		goto free_scratch;
>   	}
>   	ib.ptr[0] = PACKET3(PACKET3_SET_CONFIG_REG, 1);
>   	ib.ptr[1] = ((scratch - PACKET3_SET_CONFIG_REG_OFFSET) >> 2);
> @@ -2646,15 +2646,13 @@ int r600_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>   	ib.length_dw = 3;
>   	r = radeon_ib_schedule(rdev, &ib, NULL);
>   	if (r) {
> -		radeon_scratch_free(rdev, scratch);
> -		radeon_ib_free(rdev, &ib);
>   		DRM_ERROR("radeon: failed to schedule ib (%d).\n", r);
> -		return r;
> +		goto free_ib;
>   	}
>   	r = radeon_fence_wait(ib.fence, false);
>   	if (r) {
>   		DRM_ERROR("radeon: fence wait failed (%d).\n", r);
> -		return r;
> +		goto free_ib;
>   	}
>   	for (i = 0; i < rdev->usec_timeout; i++) {
>   		tmp = RREG32(scratch);
> @@ -2669,8 +2667,10 @@ int r600_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
>   			  scratch, tmp);
>   		r = -EINVAL;
>   	}
> -	radeon_scratch_free(rdev, scratch);
> +free_ib:
>   	radeon_ib_free(rdev, &ib);
> +free_scratch:
> +	radeon_scratch_free(rdev, scratch);
>   	return r;
>   }
>   
Sadly the patch does not fix the issue. However I agree that this patch 
is a good thing - the original code does clearly leak scratch registers 
in some failure conditions.

Maybe this patch could also add the DRM_ERROR calls for these error 
cases from the r600 version into the r100 version?

Note that on failure of radeon_ib_schedule, the order of resource 
releases *was*
   radeon_scratch_free
   radeon_ib_free
and with this patch is:
   radeon_ib_free
   radeon_scratch_free
This change looks correct to me, ie post-patch the frees are in reverse 
order to original allocation.

If you do submit this patch, please add "reviewed-by 
skitching at vonos.net" or "tested-by" or whatever you think the 
appropriate tag is.

Thanks to this patch pointing me to the appropriate area, I have 
actually found the real cause of the scratch-register leak, and will 
post a candidate patch today.

Regards,Simon


More information about the dri-devel mailing list