[Intel-gfx] [PATCH v2 1/3] drm: Add support for panic message output

Daniel Vetter daniel at ffwll.ch
Wed Mar 13 08:43:58 UTC 2019


On Tue, Mar 12, 2019 at 08:02:56PM +0200, Ville Syrjälä wrote:
> On Tue, Mar 12, 2019 at 06:37:57PM +0100, Noralf Trønnes wrote:
> > 
> > 
> > Den 12.03.2019 18.25, skrev Ville Syrjälä:
> > > On Tue, Mar 12, 2019 at 06:15:24PM +0100, Noralf Trønnes wrote:
> > >>
> > >>
> > >> Den 12.03.2019 17.17, skrev Ville Syrjälä:
> > >>> On Tue, Mar 12, 2019 at 11:47:04AM +0100, Michel Dänzer wrote:
> > >>>> On 2019-03-11 6:42 p.m., Noralf Trønnes wrote:
> > >>>>> This adds support for outputting kernel messages on panic().
> > >>>>> A kernel message dumper is used to dump the log. The dumper iterates
> > >>>>> over each DRM device and its CRTCs to find suitable framebuffers.
> > >>>>>
> > >>>>> All the other dumpers are run before this one except mtdoops.
> > >>>>> Only atomic drivers are supported.
> > >>>>>
> > >>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
> > >>>>> ---
> > >>>>>  [...]
> > >>>>>
> > >>>>> diff --git a/include/drm/drm_framebuffer.h b/include/drm/drm_framebuffer.h
> > >>>>> index f0b34c977ec5..f3274798ecfe 100644
> > >>>>> --- a/include/drm/drm_framebuffer.h
> > >>>>> +++ b/include/drm/drm_framebuffer.h
> > >>>>> @@ -94,6 +94,44 @@ struct drm_framebuffer_funcs {
> > >>>>>  		     struct drm_file *file_priv, unsigned flags,
> > >>>>>  		     unsigned color, struct drm_clip_rect *clips,
> > >>>>>  		     unsigned num_clips);
> > >>>>> +
> > >>>>> +	/**
> > >>>>> +	 * @panic_vmap:
> > >>>>> +	 *
> > >>>>> +	 * Optional callback for panic handling.
> > >>>>> +	 *
> > >>>>> +	 * For vmapping the selected framebuffer in a panic context. Must
> > >>>>> +	 * be super careful about locking (only trylocking allowed).
> > >>>>> +	 *
> > >>>>> +	 * RETURNS:
> > >>>>> +	 *
> > >>>>> +	 * NULL if it didn't work out, otherwise an opaque cookie which is
> > >>>>> +	 * passed to @panic_draw_xy. It can be anything: vmap area, structure
> > >>>>> +	 * with more details, just a few flags, ...
> > >>>>> +	 */
> > >>>>> +	void *(*panic_vmap)(struct drm_framebuffer *fb);
> > >>>>
> > >>>> FWIW, the panic_vmap hook cannot work in general with the amdgpu/radeon
> > >>>> drivers:
> > >>>>
> > >>>> Framebuffers are normally tiled, writing to them with the CPU results in
> > >>>> garbled output.
> > >>>>
> > >>
> > >> In which case the driver needs to support the ->panic_draw_xy callback,
> > >> or maybe it's possible to make a generic helper for tiled buffers.

I've proposed somewhere else that we rename panic_vmap to panic_prepare,
and turn the vmap pointer into an abstract cookie. Then the driver can do
whatever it wants to, e.g. in ->panic_prepare it does a few trylocks to
get at the buffer and makes sure it can set up a temporary pte to write
into it page-by-page. ->panic_draw_xy can then do whatever it needs to do,
using the opaque void *cookie.

And if the trylock fails you just return NULL from ->panic_prepare.

And ->panic_cleanup would be just to clean up the mess for the validation
use-case when running this from debugfs.
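
To make the prepare/draw/cleanup split concrete, here is a rough user-space
sketch of the flow. It is not kernel code and not taken from the patch:
struct mock_fb, struct mock_panic_funcs and mock_panic_dump() are made-up
names for illustration only, and a C11 atomic_flag stands in for whatever
locks a real driver would have to trylock.

```c
/* User-space mock only; none of these types are real DRM structures. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_W 8
#define MOCK_H 4

struct mock_fb {
	atomic_flag lock;                 /* stand-in for a buffer lock */
	uint32_t pixels[MOCK_W * MOCK_H]; /* stand-in for the scanout buffer */
};

/* Hypothetical ops table, named after the hooks discussed above. */
struct mock_panic_funcs {
	void *(*panic_prepare)(struct mock_fb *fb);
	void (*panic_draw_xy)(void *cookie, int x, int y, bool fg);
	void (*panic_cleanup)(void *cookie);
};

static void *mock_prepare(struct mock_fb *fb)
{
	assert(fb != NULL);
	/* Trylock only: test_and_set returns true if already held. */
	if (atomic_flag_test_and_set(&fb->lock))
		return NULL;
	/* The cookie is opaque to the caller; here it is just the fb. */
	return fb;
}

static void mock_draw_xy(void *cookie, int x, int y, bool fg)
{
	struct mock_fb *fb = cookie;

	/* Silently ignore out-of-bounds pixels. */
	if (x < 0 || x >= MOCK_W || y < 0 || y >= MOCK_H)
		return;
	fb->pixels[y * MOCK_W + x] = fg ? 0xffffffffu : 0;
}

static void mock_cleanup(void *cookie)
{
	struct mock_fb *fb = cookie;

	/* Only matters for the validation path, where we keep running. */
	atomic_flag_clear(&fb->lock);
}

const struct mock_panic_funcs mock_funcs = {
	.panic_prepare = mock_prepare,
	.panic_draw_xy = mock_draw_xy,
	.panic_cleanup = mock_cleanup,
};

/* Returns 0 if the top row was drawn, -1 if prepare gave up. */
int mock_panic_dump(const struct mock_panic_funcs *funcs, struct mock_fb *fb)
{
	void *cookie = funcs->panic_prepare(fb);

	if (!cookie)
		return -1;
	for (int x = 0; x < MOCK_W; x++)
		funcs->panic_draw_xy(cookie, x, 0, true);
	if (funcs->panic_cleanup)
		funcs->panic_cleanup(cookie);
	return 0;
}
```

The point of the sketch is that the caller never looks inside the cookie,
so a driver with tiled or unmappable buffers is free to hide whatever
state it needs behind it.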

> > >>
> > >>>> With a discrete GPU having a large amount of VRAM, the framebuffer may
> > >>>> not be directly CPU accessible at all.
> > >>>>
> > >>
> > >> It would have been nice to know how Windows works around this.
> > >>
> > >>>>
> > >>>> There would need to be a mechanism for switching scanout to a linear,
> > >>>> CPU accessible framebuffer.
> > >>>
> > >>> I suppose panic_vmap() could just provide a linear temp buffer
> > >>> to the panic handler, and panic_unmap() could copy the contents
> > >>> over to the real fb.
> > >>>
> > >>> That said, this approach of scribbling over the primary plane's
> > >>> framebuffer has some clear limitations:
> > >>> * something may overwrite the oops message before the user
> > >>>   can even read it
> > >>
> > >> When the dumper drm_panic_kmsg_dump() runs, the other CPUs should have
> > >> been stopped. See panic().
> > > 
> > > GPUs etc. may still be executing away.
> > > 
> > 
> > Would it be safe to stop it in a panic situation? It would ofc be bad to
> > crash the box even harder.
> 
> Some drivers/devices may have working (and hopefully even reliable)
> gpu reset, some may not.

I don't think touching the gpu is a good idea. Even disabling planes and
all that feels risky. And there's really not much working anymore in panic
context, we can't even schedule a worker/timer to redraw the panic output
a bit later.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

