[PATCH] drm/radeon: disable any GPU activity after unrecovered lockup

Michel Dänzer michel at daenzer.net
Wed Jun 27 08:42:44 PDT 2012


On Wed, 2012-06-27 at 10:49 -0400, Jerome Glisse wrote:
> On Wed, Jun 27, 2012 at 5:19 AM, Michel Dänzer <michel at daenzer.net> wrote:
> > On Tue, 2012-06-26 at 17:04 -0400, j.glisse at gmail.com wrote:
> >> From: Jerome Glisse <jglisse at redhat.com>
> >>
> >> After unrecovered GPU lockup avoid any GPU activities to avoid
> >> things like kernel segfault and alike to happen in any of the
> >> path that assume hw is working.
> >
> > Has the patch been tested and confirmed to actually fix such a problem?
> 
> Yes, it has been tested; I don't send untested patches to the ML.

I didn't expect (or mean to suggest) otherwise. I think I misread the
related IRC conversation from last night: I thought you basically
whipped up this patch in response to a report of such problems. But on
re-reading now, I guess you wrote this patch a while ago and are just
sending it now in response to the report on IRC.


> >>       r = radeon_asic_reset(rdev);
> >>       if (!r) {
> >>               dev_info(rdev->dev, "GPU reset succeed\n");
> >>               radeon_resume(rdev);
> >> -             radeon_restore_bios_scratch_regs(rdev);
> >> -             drm_helper_resume_force_mode(rdev->ddev);
> >> -             ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
> >>       }
> >>
> >> +     /* no matter what restore video mode */
> >> +     radeon_restore_bios_scratch_regs(rdev);
> >> +     drm_helper_resume_force_mode(rdev->ddev);
> >> +     ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
> >
> > Maybe this should be in a separate patch.
> 
> The idea is to send this patch to stable, thus having one patch that has it all.

That doesn't make sense. Either the changes belong in a single patch
(but then the commit log should describe all of them) or they don't.
They can be sent to stable[0] either way.

[0] Actually, patches with Cc: stable are picked up automagically once
they hit mainline, so there's no point in sending them there directly.
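
For reference, the control-flow change in the reset hunk quoted above can
be sketched in isolation. This is only a minimal standalone model: the
struct and the fake_* functions are hypothetical stand-ins, not the real
radeon code, and the single flag stands in for the three restore calls
that the hunk moves out of the success branch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; NOT struct radeon_device. */
struct fake_reset_dev {
	bool reset_ok;      /* what the simulated ASIC reset will report */
	bool resumed;       /* set when the resume step ran */
	bool mode_restored; /* set when the video mode restore ran */
};

/* Simulated ASIC reset: 0 on success, nonzero on failure. */
static int fake_asic_reset(struct fake_reset_dev *d)
{
	return d->reset_ok ? 0 : -1;
}

/*
 * Same shape as the patched path: resume only if the reset succeeded,
 * but restore the video mode no matter what.
 */
static int fake_gpu_reset(struct fake_reset_dev *d)
{
	int r = fake_asic_reset(d);

	if (!r)
		d->resumed = true;

	/* no matter what, restore video mode */
	d->mode_restored = true;
	return r;
}
```

The point of the sketch is only that the mode restore becomes
unconditional, which is a behaviour change the commit log would need to
mention if it stays in the same patch.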


> >> @@ -399,6 +418,14 @@ static int radeon_bo_move(struct ttm_buffer_object *bo,
> >>               radeon_move_null(bo, new_mem);
> >>               return 0;
> >>       }
> >> +     if (!rdev->accel_working) {
> >> +             /* when accel is not working GPU is in broken state just
> >> +              * do nothing for any ttm operation to avoid making the
> >> +              * situation worst than it's
> >
> > 'worse than it is', same in the following two hunks.

Are you gonna fix these typos?
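
To illustrate the guard pattern from the radeon_bo_move hunk with the
comment wording corrected, here is a rough standalone sketch. The types
and names are hypothetical, and since the quote cuts off right after the
comment, returning 0 without issuing any work is an assumption about what
the hunk does rather than something taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; NOT struct radeon_device. */
struct fake_move_dev {
	bool accel_working; /* false after an unrecovered lockup */
	int moves_issued;   /* counts moves that would reach the GPU */
};

/* Models a buffer move that must not touch a broken GPU. */
static int fake_bo_move(struct fake_move_dev *d)
{
	if (!d->accel_working) {
		/* When accel is not working the GPU is in a broken state; just
		 * do nothing for any ttm operation to avoid making the
		 * situation worse than it is.
		 * (Assumption: the caller is told the move succeeded.)
		 */
		return 0;
	}

	d->moves_issued++; /* stands in for the real GPU copy */
	return 0;
}
```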


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer

