[PATCH v4 2/4] drm/panic: Add a drm panic handler

Tue Oct 10 13:24:51 UTC 2023

On 10/10/2023 14:59, Daniel Vetter wrote:
> On Tue, Oct 10, 2023 at 02:15:47PM +0200, Maxime Ripard wrote:
>> On Tue, Oct 10, 2023 at 01:29:52PM +0200, Noralf Trønnes wrote:
>>>
>>>
>>> On 10/10/23 11:25, Maxime Ripard wrote:
>>>>
>>>>
>>>> On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
>>>>>>> So if I understand correctly, drm_panic would pre-allocate a plane/commit,
>>>>>>> and use that when a panic occurs ?
>>>>>>
>>>>>> And have it checked already, yes. We would only wait for a panic to
>>>>>> happen to pull the trigger on the commit.
>>>>>>
>>>>>>> I have two concern about this approach:
>>>>>>> - How much memory would be allocated for this ? a whole framebuffer can be
>>>>>>> big for just this use case.
>>>>>
>>>>> As I outlined in my email at [1], there are a number of different scenarios.
>>>>> The question of atomic state and commits is entirely separate from the DRM
>>>>> panic handler. We should not throw them together. Whatever is necessary is
>>>>> get a scanout buffer, should happen on the driver side of
>>>>> get_scanout_buffer, not on the drm_panic side.
>>>>>
>>>>> [1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@suse.de/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1
>>>>>
>>>>>>
>>>>>> I'dd expect a whole framebuffer for the current
>>>>>> configuration/resolution. It would be typically 4MB for a full-HD system
>>>>>> which isn't a lot really and I guess we can always add an option to
>>>>>> disable the mechanism if needed.
>>>>>>
>>>>>>> - I find it risky to completely reconfigure the hardware in a panic handler.
>>>>>>
>>>>>> I would expect to only change the format and base address of the
>>>>>> framebuffer. I guess it can fail, but it doesn't seem that different to
>>>>>> the async plane update we already have and works well.
>>>>>
>>>>> The one thing I don't understand is: Why should we use atomic commits in the
>>>>> first place? It doesn't make sense for firmware-based drivers.
>>>>
>>>> Because this is generic infrastructure that is valuable for any drivers
>>>> and not only firmware-based drivers?
>>>>
>>>>> In some drivers, even the simple ast, we hold locks during the regular
>>>>> commit. Trying to run the panic commit concurrently will likely give a
>>>>> deadlock.
>>>>
>>>> You're in the middle of a panic. Don't take any locks and you won't deadlock.
>>>>
>>>>> In the end it's a per-driver decision, but in most cases, the driver can
>>>>> easily switch to a default mode with some ad-hoc code.
>>>>
>>>> When was the last time a per-driver decision has been a good thing? I'm
>>>> sorry, but the get_scanout_buffer approach buffer won't work for any
>>>> driver out there.
>>>>
>>>> I'm fine with discussing alternatives if you don't like the ones I
>>>> suggested, but they must allow the panic handler infrastructure to work
>>>> with any driver we have, not just 4.
>>>>
>>>
>>> Why can't we use the model[1] suggested by Daniel using a draw_pixel
>>> callback giving drivers full control on how they can put a pixel on the
>>> display?
>>
>> I share kind of the same general ideas/conclusions: "qthe idea is that
>> all the fb selection and lookup is handled in shared code (and with
>> proper locking, but only for atomic drivers)."
>>
>> 2016 is a bit old though and multiple developments happened since
>> (secure playback is a thing now, framebuffer compression too), so I
>> still think that their expectation that the framebuffer is accessible to
>> / writable by the CPU no longer holds true.
> 
> I think largely it should still be ok, because the idea behind the draw_xy
> callback was that the driver could take care of anything special, like
> - tiling
> - clearing compression bits so that just writing the raw pixels works (if
>    your compression format allows for that)
> - handling any differences in how the pixels need to be written, like
>    cache flushing, mmio_write vs normal memory, amd also has peek/poke
>    registers to be able to write even into unmappable memory
> 
> What would probably be a good idea is to do an s/void */uinptr_t/ over my
> interface proposal, or maybe an even more opaque cookie since really the
> only thing you can do is get the void * that ->panic_vmap returns and pass
> it into ->panic_draw_xy.
> 
> I thought (but I didn't dig through details) that the panic fb struct is
> essentially meant to serve this purpose in the current series?
> 

Yes, in this series there is a struct drm_scanout_buffer, that the 
driver has to provide when a panic occurs:

struct drm_scanout_buffer {
	const struct drm_format_info *format;
	struct iosys_map map;
	unsigned int pitch;
	unsigned int width;
	unsigned int height;
};

That works well for CPU accessible, linear format.
It should be possible to support YUV, or simple tiling with that, but 
for the more complex case, I proposed to add a draw_pixel() callback.

>>> This will even work for the AMD debug interface.
>>> In the linear CPU accessible buffer case, we can provide a helper for
>>> that, maybe we can do helpers for other common cases as well.
>>
>> Yeah, their idea of a panic_draw would work great for that.
>>
>>> Adding to that we would need a panic_setup/enter and panic_teardown/exit
>>> callback.
>>
>> What for?
> 
> So panic teardown would be for testing in CI, to make it non-destructive
> and clean up anything panic_vmap (or _enter or whatever you call it) has
> done. I thought John Oggness was also looking into how the new panic
> handlers/consoles could be made testable in the panic paths, including
> lock stealing and getting called from nmi.

Thanks, I was also wondering why a panic teardown would be necessary, 
since after the panic handler has run, the system will probably halt. 
(or maybe run kdump and reboot).

> -Sima