[Intel-gfx] [PATCH v2 1/3] drm: Add support for panic message output
Daniel Vetter
daniel at ffwll.ch
Wed Mar 13 08:37:10 UTC 2019
On Wed, Mar 13, 2019 at 08:49:17AM +0100, John Ogness wrote:
> On 2019-03-12, Ahmed S. Darwish <darwish.07 at gmail.com> wrote:
> >>>> +
> >>>> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper,
> >>>> + enum kmsg_dump_reason reason)
> >>>> +{
> >>>> + class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter);
> >>>
> >>> class_for_each_device uses klist, which only uses an irqsave
> >>> spinlock. I think that's good enough. Comment to that effect would
> >>> be good e.g.
> >>>
> >>> /* based on klist, which uses only a spin_lock_irqsave, which we
> >>> * assume still works */
> >>>
> >>> If we aim for perfect this should be a trylock still, maybe using
> >>> our own device list.
> >>>
> >
> > I definitely agree here.
> >
> > The lock may already be locked either by a stopped CPU, or by the
> > very same CPU we execute panic() on (e.g. NMI panic() on the
> > printing CPU).
> >
> > This is why it's very common for example in serial consoles, which
> > are usually careful about re-entrance and panic contexts, to do:
> >
> > xxxxxx_console_write(...) {
> > if (oops_in_progress)
> > locked = spin_trylock_irqsave(&port->lock, flags);
> > else
> > spin_lock_irqsave(&port->lock, flags);
> > }
> >
> > I'm quite positive we should do the same for panic drm drivers.
>
> This construction will continue, even if the trylock fails. It only
> makes sense to do this if the driver has a chance of being
> successful. Ignoring locks is a serious issue. I personally am quite
> unhappy that the serial drivers do this, which was part of my motivation
> for the new printk design I'm working on.
>
> If the driver is not capable of doing something useful on a failed
> trylock, then I recommend just skipping that device. Maybe trying it
> again later after trying all the devices?
Ah yes missed that. If the trylock fails anywhere, we must bail out.
Not sure retrying is useful, my experience from at least drm is that
either you're lucky, and drm wasn't doing anything right when the machine
blew up, and then the trylocks will all go through. Or you're unlucky, and
most likely that means drm itself blew up, and no amount of retrying is
going to help. I wouldn't bother.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the Intel-gfx
mailing list