[Intel-gfx] [PATCH 5/6] drm/i915: Implement GPU reset

Ben Gamari bgamari.foss at gmail.com
Sun Sep 13 21:24:03 CEST 2009


On Sun, Sep 13, 2009 at 11:21:35AM -0700, Jesse Barnes wrote:
> On Sun, 13 Sep 2009 18:27:48 +0100
> Daniel J Blueman <daniel.blueman at gmail.com> wrote:
> 
> > A couple of things just caught my eye while looking through the patch,
> > perhaps to consider tweaking?
> 
> Thanks for looking.
> 
> > > +void i965_reset(struct drm_device *dev, u8 flags)
> > > +{
> > > +       drm_i915_private_t *dev_priv = dev->dev_private;
> > > +       unsigned long timeout;
> > > +       u8 gdrst;
> > > +       bool need_display = true; //!(flags & (GDRST_RENDER |
> > > GDRST_MEDIA));
> > 
> > As the kernel's coding style isn't C99, is it worth using /* foo */
> > here to be compliant?
> 
> Or just drop it and add a comment about the other two.  We'd need to
> experiment some more before trying to use them.

Yeah, I think this is the right thing to do.

> 
> > > +#if 1
> > > +               DRM_ERROR("i915 struct_mutex lock is still held by
> > > %s. Giving on up reset.\n", dev->struct_mutex.owner->task->comm);
> > > +               return;
> > > +#else
> > > +               struct task_struct *task =
> > > dev->struct_mutex.owner->task;
> > > +               DRM_ERROR("Killing process %d (%s) for holding i915
> > > device mutex\n",
> > > +                               task->pid, task->comm);
> > > +               force_sig(SIGILL, task);
> > 
> > Should be SIGKILL here?
> 
> We should just decide whether we want the warning here or the kill.
> Ben, I assume you tested the warning mostly?  Killing a waiter will
> probably just kill the unlucky task who happened to be waiting for the
> hung batch; might be friendlier to let it continue, but it also makes
> the reset less reliable.

We do need to decide what to do with the task that gave us the bad
batch. This particular code, however, doesn't handle that case. By the
time we get here we have already woken up the waiters with the wedged
flag set so in theory no one should be holding the lock. In the event
that someone is, we really have three options,

1) Try waking it up again and hope it doesn't take it this time
2) Wait for it to release it (probably will never happen)
3) Kill the process

I opted to kill the process.

> 
> > > +               pci_read_config_byte(dev->pdev, GDRST, &gdrst);
> > > +               //TODO: Set domains
> > 
> > Consider using kernel coding style perhaps?
> 
> Yeah, or just drop since domains are the render, media etc mentioned
> above.

Yep, this definitely slipped through.

> 
> > 
> > > +               pci_write_config_byte(dev->pdev, GDRST, gdrst |
> > > flags | ((flags == GDRST_FULL) ? 0x1 : 0x0));
> > > +               udelay(50);
> > > +               pci_write_config_byte(dev->pdev, GDRST, gdrst &
> > > 0xfe); +
> > > +               /* ...we don't want to loop forever though, 500ms
> > > should be plenty */
> > > +              timeout = jiffies + msecs_to_jiffies(500);
> > > +               do {
> > > +                       udelay(100);
> > > +                       pci_read_config_byte(dev->pdev, GDRST,
> > > &gdrst);
> > > +               } while ((gdrst & 0x1) && time_after(timeout,
> > > jiffies)); +
> > > +               if (gdrst & 0x1) {
> > > +                       DRM_ERROR("Failed to reset chip. You're
> > > screwed.");
> > 
> > Perhaps drop the second sentence, to keep things looking neat.
> 
> We might want to panic in that case?

Meh, this seems a little much. Perhaps just WARN?

Thanks for looking this over.

-  Ben



More information about the Intel-gfx mailing list