[PATCH v4 2/4] drm/panic: Add a drm panic handler

Maxime Ripard mripard at kernel.org
Mon Oct 9 11:33:31 UTC 2023


On Mon, Oct 09, 2023 at 11:48:29AM +0200, Jocelyn Falempe wrote:
> On 09/10/2023 10:28, Maxime Ripard wrote:
> > Hi,
> > 
> > On Mon, Oct 09, 2023 at 09:47:49AM +0200, Jocelyn Falempe wrote:
> > > On 06/10/2023 18:54, Noralf Trønnes wrote:
> > > > 
> > > > 
> > > > On 10/6/23 16:35, Maxime Ripard wrote:
> > > > > Hi Jocelyn,
> > > > > 
> > > > > On Thu, Oct 05, 2023 at 11:16:15AM +0200, Jocelyn Falempe wrote:
> > > > > > On 05/10/2023 10:18, Maxime Ripard wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Tue, Oct 03, 2023 at 04:22:45PM +0200, Jocelyn Falempe wrote:
> > > > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > > > index 89e2706cac56..e538c87116d3 100644
> > > > > > > > --- a/include/drm/drm_drv.h
> > > > > > > > +++ b/include/drm/drm_drv.h
> > > > > > > > @@ -43,6 +43,7 @@ struct dma_buf_attachment;
> > > > > > > >     struct drm_display_mode;
> > > > > > > >     struct drm_mode_create_dumb;
> > > > > > > >     struct drm_printer;
> > > > > > > > +struct drm_scanout_buffer;
> > > > > > > >     struct sg_table;
> > > > > > > >     /**
> > > > > > > > @@ -408,6 +409,19 @@ struct drm_driver {
> > > > > > > >     	 */
> > > > > > > >     	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
> > > > > > > > +	/**
> > > > > > > > +	 * @get_scanout_buffer:
> > > > > > > > +	 *
> > > > > > > > +	 * Get the current scanout buffer, to display a panic message with drm_panic.
> > > > > > > > +	 * It is called from a panic callback, and must follow its restrictions.
> > > > > > > > +	 *
> > > > > > > > +	 * Returns:
> > > > > > > > +	 *
> > > > > > > > +	 * Zero on success, negative errno on failure.
> > > > > > > > +	 */
> > > > > > > > +	int (*get_scanout_buffer)(struct drm_device *dev,
> > > > > > > > +				  struct drm_scanout_buffer *sb);
> > > > > > > > +
> > > > > > > 
> > > > > > > What is the format of that buffer? What is supposed to happen if the
> > > > > > > planes / CRTC are setup in a way that is incompatible with the buffer
> > > > > > > format?
> > > > > > 
> > > > > > Currently, it only supports linear format, either in system memory, or
> > > > > > iomem.
> > > > > > But really what is needed is the screen size, and a way to write pixels to
> > > > > > it.
> > > > > > For more complex GPU, I don't know if it's easier to reprogram the GPU to
> > > > > > linear format, or to add a simple "tiled" support to drm_panic.
> > > > > > What would you propose as a panic interface to handle those complex format ?
> > > > > 
> > > > > It's not just about tiling, but also about YUV formats. If the display
> > > > > engine is currently playing a video at the moment, it's probably going
> > > > > to output some variation of multi-planar YUV and you won't have an RGB
> > > > > buffer available.
> > > > > 
> > > > 
> > > > I had support for some YUV formats in my 2019 attempt on a panic
> > > > handler[1] and I made a recording of a test run as well[2] (see 4:30 for
> > > > YUV). There was a discussion about challenges and i915 can disable
> > > > tiling by flipping a bit in a register[3] and AMD has a debug
> > > > interface[4] they can use to write pixels.
> > > 
> > > I only added support for the format used by simpledrm, because I don't want
> > > to add support for all possible format if no driver are using it.
> > 
> > Sure.
> > 
> > > It should be possible to add YUV format too.
> > > 
> > > I also prefer to convert only the foreground/background color, and then
> > > write directly into the buffers, instead of converting line by line.
> > > It works for all format where pixel size is a multiple of byte.
> > 
> > My point was that there might not be a buffer to write to.
> > DMA_ATTR_NO_KERNEL_MAPPING exists, dma-buf might be unaccessible, unsafe
> > or be completely out of control of the kernel space, or even not be
> > accessible by the system at all (when doing secure playback for example,
> > where the "trusted" component will do the decoding and will only give
> > back a dma-buf handle to a secure memory buffer).
> > 
> > I appreciate that we probably don't want to address these scenarios
> > right now, but we should have a path forward to support them eventually.
> > Copying the panic handler content to the buffer is optimistic and won't
> > work in all the scenarios described above, pretty much requiring to
> > start from scratch that effort.
> > 
> > > > https://lore.kernel.org/dri-devel/20190311174218.51899-1-noralf@tronnes.org/
> > > > [2] https://youtu.be/lZ80vL4dgpE
> > > > [3]
> > > > https://lore.kernel.org/dri-devel/20190314095004.GP2665@phenom.ffwll.local/
> > > > [4]
> > > > https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
> > > > 
> > > > > Same story if you're using a dma-buf buffer. You might not even be able
> > > > > to access that buffer at all from the CPU or the kernel.
> > > > > 
> > > > > I really think we should have some emergency state ready to commit on
> > > > > the side, and possibly a panic_commit function to prevent things like
> > > > > sleeping or waiting that regular atomic_commit can use.
> > > > > 
> > > > > That way, you know have all the resources available to you any time.
> > > 
> > > I think reusing the atomic commit functions might be hard, because there are
> > > locks/allocation/threads hidden in drivers callback.
> > 
> > Allocations are bugs as far as I'm concerned. Allocations in
> > atomic_commit path are pretty big hint that you're doing something wrong
> > so I wouldn't worry too much about them. For locking, yeah... Which is
> > why I was suggesting an emergency atomic_commit of some sorts (for
> > planes only?). Switching back to whatever we were doing to an RGB plane
> > should be simple enough for most drivers and probably can be done safely
> > enough on most drivers without any locks.
> > 
> > And you don't need to support all kinds of tiling, YUV or RGB variants.
> 
> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
> and use that when a panic occurs ?

And have it checked already, yes. We would only wait for a panic to
happen to pull the trigger on the commit.

> I have two concern about this approach:
> - How much memory would be allocated for this ? a whole framebuffer can be
> big for just this use case.

I'dd expect a whole framebuffer for the current
configuration/resolution. It would be typically 4MB for a full-HD system
which isn't a lot really and I guess we can always add an option to
disable the mechanism if needed.

> - I find it risky to completely reconfigure the hardware in a panic handler.

I would expect to only change the format and base address of the
framebuffer. I guess it can fail, but it doesn't seem that different to
the async plane update we already have and works well.

> Also how many drivers would need this ?
> 
> Currently I was mostly considering x86 platform, so:
> 
> simpledrm/ast/mgag200 which works well with the get_scanout_buffer().
> 
> i915/amdgpu/nouveau, which are quite complex, and will need to do their own
> thing anyway.

I guess we're not entirely aligned there then. I would expect that
mechanism to work with any atomic KMS driver. You are right that i915,
amdgpu and nouveau are special enough that some extra internal plumbing
is going to be required, but I'd expect it to be easy to support with
any other driver for a memory-mapped device.

I guess what I'm trying to say is, even though it's totally fine that
you only support those drivers at first, supporting in vc4 for example
shouldn't require to rewrite the whole thing.

> > > I'm more in favor of an emergency function, that each driver has to
> > > implement, and use what the hardware can do to display a simple frame
> > > quickly. get_scanout_buffer() is a good start for simple driver, but
> > > will need refactoring for the more complex case, like adding a
> > > callback to write pixels one by one, if there is no memory mapped
> > > buffer available.
> > 
> > Sorry, I'm not quite sure what you mean there, where would you write the
> > pixel to?
> 
> It was mentioned by Noralf, about the amdgpu driver:
> 
> https://lore.kernel.org/dri-devel/d233c376-ed07-2127-6084-8292d313dac7@amd.com/
> 
> They have a slow "debug interface" that you can write to, and can be used
> for the panic handler. It's not memory mapped, so you have to write pixels
> one by one.
> 
> So for the struct drm_scanout_buffer, I can add a function pointer to a
> write_pixel(u32 x, u32 y, u32 color)
> So if the iosys map is null, it will use that instead.

It would be nice to support indeed, but it's a fairly rare feature afaik
so I'd rather focus on getting something that can work for everyone first

Maxime
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20231009/e35c1fcf/attachment.sig>


More information about the dri-devel mailing list