[Intel-gfx] [PATCH v2 1/3] drm: Add support for panic message output

Wed Mar 13 17:30:37 UTC 2019

Am 13.03.19 um 17:16 schrieb Kazlauskas, Nicholas:
> On 3/13/19 11:54 AM, Christian König wrote:
>> Am 13.03.19 um 16:38 schrieb Michel Dänzer:
>>> On 2019-03-13 2:37 p.m., Christian König wrote:
>>>> Am 13.03.19 um 14:31 schrieb Ville Syrjälä:
>>>>> On Wed, Mar 13, 2019 at 10:35:08AM +0100, Michel Dänzer wrote:
>>>>>> On 2019-03-12 6:15 p.m., Noralf Trønnes wrote:
>>>>>>> Den 12.03.2019 17.17, skrev Ville Syrjälä:
>>>>>>>> On Tue, Mar 12, 2019 at 11:47:04AM +0100, Michel Dänzer wrote:
>>>>>>>>> On 2019-03-11 6:42 p.m., Noralf Trønnes wrote:
>>>>>>>>>> This adds support for outputting kernel messages on panic().
>>>>>>>>>> A kernel message dumper is used to dump the log. The dumper
>>>>>>>>>> iterates
>>>>>>>>>> over each DRM device and it's crtc's to find suitable
>>>>>>>>>> framebuffers.
>>>>>>>>>>
>>>>>>>>>> All the other dumpers are run before this one except mtdoops.
>>>>>>>>>> Only atomic drivers are supported.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>>>>>>>> ---
>>>>>>>>>>     [...]
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/drm/drm_framebuffer.h
>>>>>>>>>> b/include/drm/drm_framebuffer.h
>>>>>>>>>> index f0b34c977ec5..f3274798ecfe 100644
>>>>>>>>>> --- a/include/drm/drm_framebuffer.h
>>>>>>>>>> +++ b/include/drm/drm_framebuffer.h
>>>>>>>>>> @@ -94,6 +94,44 @@ struct drm_framebuffer_funcs {
>>>>>>>>>>                  struct drm_file *file_priv, unsigned flags,
>>>>>>>>>>                  unsigned color, struct drm_clip_rect *clips,
>>>>>>>>>>                  unsigned num_clips);
>>>>>>>>>> +
>>>>>>>>>> +    /**
>>>>>>>>>> +     * @panic_vmap:
>>>>>>>>>> +     *
>>>>>>>>>> +     * Optional callback for panic handling.
>>>>>>>>>> +     *
>>>>>>>>>> +     * For vmapping the selected framebuffer in a panic context.
>>>>>>>>>> Must
>>>>>>>>>> +     * be super careful about locking (only trylocking allowed).
>>>>>>>>>> +     *
>>>>>>>>>> +     * RETURNS:
>>>>>>>>>> +     *
>>>>>>>>>> +     * NULL if it didn't work out, otherwise an opaque cookie
>>>>>>>>>> which is
>>>>>>>>>> +     * passed to @panic_draw_xy. It can be anything: vmap area,
>>>>>>>>>> structure
>>>>>>>>>> +     * with more details, just a few flags, ...
>>>>>>>>>> +     */
>>>>>>>>>> +    void *(*panic_vmap)(struct drm_framebuffer *fb);
>>>>>>>>> FWIW, the panic_vmap hook cannot work in general with the
>>>>>>>>> amdgpu/radeon
>>>>>>>>> drivers:
>>>>>>>>>
>>>>>>>>> Framebuffers are normally tiled, writing to them with the CPU
>>>>>>>>> results in
>>>>>>>>> garbled output.
>>>>>>>>>
>>>>>>> In which case the driver needs to support the ->panic_draw_xy
>>>>>>> callback,
>>>>>>> or maybe it's possible to make a generic helper for tiled buffers.
>>>>>> I'm afraid that won't help, at least not without porting big chunks of
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/tree/master/src/amd/addrlib
>>>>>> into the kernel, none of which will be used for anything else.
>>>>>>
>>>>>>
>>>>>>>>> There would need to be a mechanism for switching scanout to a
>>>>>>>>> linear,
>>>>>>>>> CPU accessible framebuffer.
>>>>>>>> I suppose panic_vmap() could just provide a linear temp buffer
>>>>>>>> to the panic handler, and panic_unmap() could copy the contents
>>>>>>>> over to the real fb.
>>>>>> Copy how? Using a GPU engine?
>>>>> CPU maybe? Though I suppose that won't work if the buffer isn't CPU
>>>>> accesible :/
>>>> Well we do have a debug path for accessing invisible memory with the
>>>> CPU.
>>>>
>>>> E.g. three registers: DATA and auto increment OFFSET_LO/HI. So you can
>>>> just read/write DATA over and over again if you want to access some
>>>> memory.
>>> Right. I assume that'll be very slow, but I guess it could do when the
>>> memory isn't directly CPU accessible.
>> Just made a quick test and reading 33423360 bytes (4096x2040x4) using
>> that interfaces takes about 13 seconds.
>>
>> IIRC we don't use the auto increment optimization yet, so that can
>> probably be improved by a factor of 3 or more.
>>
>>>> But turning of tilling etc is still extremely tricky when the system is
>>>> already unstable.
>>> Maybe we could add a little hook to the display code, which just
>>> disables tiling for scanout and maybe disables non-primary planes, but
>>> doesn't touch anything else. Harry / Nicholas, does that seem feasible?
>>>
>>>
>>> I'm coming around from "this is never going to work" to "it might
>>> actually work" with our hardware...
>> Yeah, agree. It's a bit tricky, but doable.
> A "disable_tiling" hook or something along those lines could work for
> display. It's a little bit non trivial when you want to start dealing
> with locking and any active DRM commits, but we have a global lock
> around all our hardware programming anyway that makes that easier to
> deal with.
>
> I think we can just re-commit and update the existing hardware state
> with only the tiling info for every plane reset to off. For most buffers
> I don't think we'd have to really consider changing anything else here
> as long as you respect the current FB size and pitch.

That sounds like it is also already to invasive.

Keep in mind that all other CPU cores and processes/threads are dead and 
you can be in any context you want (e.g. NMI).

In other words we can't take any locks whatsoever and we need to keep 
the hardware interaction as minimal as possible.

Christian.

>
> Nicholas Kazlauskas
>
>> Takeaway for Noralf is that this whole vmap on panic won't even remotely
>> work. We need to get the data byte by byte without a page mapping if
>> that is ever going to fly.
>>
>> Christian.
>>
>>>