[Intel-gfx] [PATCH v2 1/3] drm: Add support for panic message output
Noralf Trønnes
noralf at tronnes.org
Wed Mar 13 10:24:30 UTC 2019
Den 12.03.2019 23.13, skrev Ahmed S. Darwish:
> Hi,
>
> [[ CCing John for the trylock parts ]]
>
> On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 11.03.2019 20.23, skrev Daniel Vetter:
>>> On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote:
>>>> This adds support for outputting kernel messages on panic().
>>>> A kernel message dumper is used to dump the log. The dumper iterates
>>>> over each DRM device and it's crtc's to find suitable framebuffers.
>>>>
>>>> All the other dumpers are run before this one except mtdoops.
>>>> Only atomic drivers are supported.
>>>>
>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>
>>> Bunch of comments/ideas for you or Darwish below, whoever picks this up.
>>
>> Actually it would ne nice if Darwish could pick it up since he will do
>> it on i915 which will be useful to a much broader audience.
>> If not I'll respin when I'm done with the drm_fb_helper refactoring.
>>
>
> Yup, I'll be more than happy to do this..
Thanks for doing that.
Noralf.
> while preserving all of
> Noralf's authorship and copyright notices of course.
>
> I guess it can be:
>
> - Handle the comments posted by Daniel and others (I'll post
> some questions too)
>
> - Add the necessary i915 specific bits
>
> - Test, post v3/v4/../vn. Rinse and repeat. Keep it local at
> dri-devel until getting the necessary S-o-Bs.
>
> - Post to wider audience (some feedback from distribution folks
> would also be nice, before posting to lkml)
>
> More comments below..
>
> [...]
>
>>>> +
>>>> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper,
>>>> + enum kmsg_dump_reason reason)
>>>> +{
>>>> + class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter);
>>>
>>> class_for_each_device uses klist, which only uses an irqsave spinlock. I
>>> think that's good enough. Comment to that effect would be good e.g.
>>>
>>> /* based on klist, which uses only a spin_lock_irqsave, which we
>>> * assume still works */
>>>
>>> If we aim for perfect this should be a trylock still, maybe using our own
>>> device list.
>>>
>
> I definitely agree here.
>
> The lock may already be locked either by a stopped CPU, or by the
> very same CPU we execute panic() on (e.g. NMI panic() on the
> printing CPU).
>
> This is why it's very common for example in serial consoles, which
> are usually careful about re-entrance and panic contexts, to do:
>
> xxxxxx_console_write(...) {
> if (oops_in_progress)
> locked = spin_trylock_irqsave(&port->lock, flags);
> else
> spin_lock_irqsave(&port->lock, flags);
> }
>
> I'm quite positive we should do the same for panic drm drivers.
> John?
>
>>>> +}
>>>> +
>>>> +static struct kmsg_dumper drm_panic_kmsg_dumper = {
>>>> + .dump = drm_panic_kmsg_dump,
>>>> + .max_reason = KMSG_DUMP_PANIC,
>>>> +};
>>>> +
>>>> +static ssize_t drm_panic_file_panic_write(struct file *file,
>>>> + const char __user *user_buf,
>>>> + size_t count, loff_t *ppos)
>>>> +{
>>>> + unsigned long long val;
>>>> + char buf[24];
>>>> + size_t size;
>>>> + ssize_t ret;
>>>> +
>>>> + size = min(sizeof(buf) - 1, count);
>>>> + if (copy_from_user(buf, user_buf, size))
>>>> + return -EFAULT;
>>>> +
>>>> + buf[size] = '\0';
>>>> + ret = kstrtoull(buf, 0, &val);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS;
>>>> + wmb();
>>>> +
>>>> + /* Do a real test with: echo c > /proc/sysrq-trigger */
>>>> +
>>>> + if (val == 0) {
>>>> + pr_info("Test panic screen using kmsg_dump(OOPS)\n");
>>>> + kmsg_dump(KMSG_DUMP_OOPS);
>>>> + } else if (val == 1) {
>>>> + char *null_pointer = NULL;
>>>> +
>>>> + pr_info("Test panic screen using NULL pointer dereference\n");
>>>> + *null_pointer = 1;
>>>> + } else {
>>>> + return -EINVAL;
>>>> + }
>>>
>>> This isn't quite what I had in mind, since it still kills the kernel (like
>>> sysrq-trigger).
>>
>> If val == 0, it doesn't kill the kernel, it only dumps the kernel log.
>> And it doesn't taint the kernel either.
>>
>>> Instead what I had in mind is to recreate the worst
>>> possible panic context as much as feasible (disabling interrupts should be
>>> a good start, maybe we can even do an nmi callback), and then call our
>>> panic implementation. That way we can test the panic handler in a
>>> non-destructive way (i.e. aside from last dmesg lines printed to the
>>> screen nothing bad happens to the kernel: No real panic, no oops, no
>>> tainting).
>>
>> The interrupt case I can do, nmi I have no idea.
>>
>
> I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP
> would be a nice non-destructive test-case emulation.
>
> thanks!
>
> --
> darwi
> http://darwish.chasingpointers.com
>
More information about the Intel-gfx
mailing list