[PATCH v4 2/4] drm/panic: Add a drm panic handler

Tue Oct 10 12:59:33 UTC 2023

On Tue, Oct 10, 2023 at 02:15:47PM +0200, Maxime Ripard wrote:
> On Tue, Oct 10, 2023 at 01:29:52PM +0200, Noralf Trønnes wrote:
> > 
> > 
> > On 10/10/23 11:25, Maxime Ripard wrote:
> > > 
> > > 
> > > On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
> > >>>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
> > >>>> and use that when a panic occurs ?
> > >>>
> > >>> And have it checked already, yes. We would only wait for a panic to
> > >>> happen to pull the trigger on the commit.
> > >>>
> > >>>> I have two concern about this approach:
> > >>>> - How much memory would be allocated for this ? a whole framebuffer can be
> > >>>> big for just this use case.
> > >>
> > >> As I outlined in my email at [1], there are a number of different scenarios.
> > >> The question of atomic state and commits is entirely separate from the DRM
> > >> panic handler. We should not throw them together. Whatever is necessary is
> > >> get a scanout buffer, should happen on the driver side of
> > >> get_scanout_buffer, not on the drm_panic side.
> > >>
> > >> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
> > >>
> > >>>
> > >>> I'dd expect a whole framebuffer for the current
> > >>> configuration/resolution. It would be typically 4MB for a full-HD system
> > >>> which isn't a lot really and I guess we can always add an option to
> > >>> disable the mechanism if needed.
> > >>>
> > >>>> - I find it risky to completely reconfigure the hardware in a panic handler.
> > >>>
> > >>> I would expect to only change the format and base address of the
> > >>> framebuffer. I guess it can fail, but it doesn't seem that different to
> > >>> the async plane update we already have and works well.
> > >>
> > >> The one thing I don't understand is: Why should we use atomic commits in the
> > >> first place? It doesn't make sense for firmware-based drivers.
> > > 
> > > Because this is generic infrastructure that is valuable for any drivers
> > > and not only firmware-based drivers?
> > > 
> > >> In some drivers, even the simple ast, we hold locks during the regular
> > >> commit. Trying to run the panic commit concurrently will likely give a
> > >> deadlock.
> > > 
> > > You're in the middle of a panic. Don't take any locks and you won't deadlock.
> > > 
> > >> In the end it's a per-driver decision, but in most cases, the driver can
> > >> easily switch to a default mode with some ad-hoc code.
> > > 
> > > When was the last time a per-driver decision has been a good thing? I'm
> > > sorry, but the get_scanout_buffer approach buffer won't work for any
> > > driver out there.
> > > 
> > > I'm fine with discussing alternatives if you don't like the ones I
> > > suggested, but they must allow the panic handler infrastructure to work
> > > with any driver we have, not just 4.
> > > 
> > 
> > Why can't we use the model[1] suggested by Daniel using a draw_pixel
> > callback giving drivers full control on how they can put a pixel on the
> > display?
> 
> I share kind of the same general ideas/conclusions: "qthe idea is that
> all the fb selection and lookup is handled in shared code (and with
> proper locking, but only for atomic drivers)."
> 
> 2016 is a bit old though and multiple developments happened since
> (secure playback is a thing now, framebuffer compression too), so I
> still think that their expectation that the framebuffer is accessible to
> / writable by the CPU no longer holds true.

I think largely it should still be ok, because the idea behind the draw_xy
callback was that the driver could take care of anything special, like
- tiling
- clearing compression bits so that just writing the raw pixels works (if
  your compression format allows for that)
- handling any differences in how the pixels need to be written, like
  cache flushing, mmio_write vs normal memory, amd also has peek/poke
  registers to be able to write even into unmappable memory

What would probably be a good idea is to do an s/void */uinptr_t/ over my
interface proposal, or maybe an even more opaque cookie since really the
only thing you can do is get the void * that ->panic_vmap returns and pass
it into ->panic_draw_xy.

I thought (but I didn't dig through details) that the panic fb struct is
essentially meant to serve this purpose in the current series?

> > This will even work for the AMD debug interface.
> > In the linear CPU accessible buffer case, we can provide a helper for
> > that, maybe we can do helpers for other common cases as well.
> 
> Yeah, their idea of a panic_draw would work great for that.
> 
> > Adding to that we would need a panic_setup/enter and panic_teardown/exit
> > callback.
> 
> What for?

So panic teardown would be for testing in CI, to make it non-destructive
and clean up anything panic_vmap (or _enter or whatever you call it) has
done. I thought John Oggness was also looking into how the new panic
handlers/consoles could be made testable in the panic paths, including
lock stealing and getting called from nmi.
-Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch