[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

Michel Dänzer michel at daenzer.net
Wed Jun 12 03:26:17 PDT 2013


On Die, 2013-06-11 at 16:23 -0700, Andy Lutomirski wrote:
> If the device is idle for over ten seconds, then the next attempt to do
> anything can race with the lockup detector and cause a bogus lockup
> to be detected.
> 
> Oddly, the situation is well-described in the lockup detector's comments
> and a fix is even described.  This patch implements that fix (and corrects
> some typos in the description).
> 
> My system has been stable for about a week running this code.  Without this,
> my screen would go blank every now and then and, when it came back, everything
> would be remarkably slow (the latter is a separate bug).
> 
> Signed-off-by: Andy Lutomirski <luto at amacapital.net>

[...]

> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
> index 1ef5eaa..fb7b3ea 100644
> --- a/drivers/gpu/drm/radeon/radeon_ring.c
> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
> @@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
>   * have CP rptr to a different value of jiffies wrap around which will force
>   * initialization of the lockup tracking informations.
>   *
> - * A possible false positivie is if we get call after while and last_cp_rptr ==
> - * the current CP rptr, even if it's unlikely it might happen. To avoid this
> - * if the elapsed time since last call is bigger than 2 second than we return
> - * false and update the tracking information. Due to this the caller must call
> - * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported
> - * the fencing code should be cautious about that.
> + * A possible false positive is if we get called after a while and
> + * last_cp_rptr == the current CP rptr, even if it's unlikely it might
> + * happen. To avoid this if the elapsed time since the last call is bigger
> + * than 2 second then we return false and update the tracking
> + * information. Due to this the caller must call radeon_ring_test_lockup
> + * more frequently than once every 2s when waiting.

Is it guaranteed that radeon_ring_test_lockup will be called more often
than every 2s when waiting? If not, this change might prevent a real
lockup from being detected?

Either way, I wonder if there might not be a simpler solution to the
problem, e.g. by updating last_activity when submitting commands to a
previously empty ring.


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer


More information about the dri-devel mailing list