[Intel-gfx] [PATCH] RFC: debugobjects: capture stack traces at _init() time

Daniel Vetter daniel.vetter at ffwll.ch
Fri Mar 23 16:26:17 UTC 2018


On Tue, Mar 20, 2018 at 8:44 PM, Thomas Gleixner <tglx at linutronix.de> wrote:
> On Tue, 20 Mar 2018, Daniel Vetter wrote:
>
>> Sometimes it's really easy to know which object has gone boom and
>> where the offending code is, and sometimes it's really hard. One case
>> we're trying to hunt down is when module unload catches a live debug
>> object, with a module with lots of them.
>>
>> Capture a full stack trace from debug_object_init() and dump it when
>> there's a problem.
>>
>> FIXME: Should we have a separate Kconfig knob for the backtraces,
>> they're quite expensive? Atm I'm just selecting it for the general
>> debug object stuff.
>
> Just make it boot time enabled. We really want to be able to switch it off.
>
>> +#define ODEBUG_STACKDEPTH 32
>
> Do we really need it that deep? I doubt that.

Entirely arbitrary, I just needed this to start hunting a rare crash
we're seeing in CI and stitched something together. I agree we
probably don't need that much.

>> +static noinline depot_stack_handle_t save_stack(struct debug_obj *obj)
>> +{
>> +     unsigned long entries[ODEBUG_STACKDEPTH];
>> +     struct stack_trace trace = {
>> +             .entries = entries,
>> +             .max_entries = ODEBUG_STACKDEPTH,
>> +             .skip = 2
>> +     };
>> +
>> +     save_stack_trace(&trace);
>> +     if (trace.nr_entries != 0 &&
>> +         trace.entries[trace.nr_entries-1] == ULONG_MAX)
>> +             trace.nr_entries--;
>> +
>> +     /* May be called under spinlock, so avoid sleeping */
>> +     return depot_save_stack(&trace, GFP_NOWAIT);
>
> I really don't like the idea of unconditional allocations in that code
> path. We went great length to make it perform halfways sane with the
> pool. Though we should actually have per cpu pools to avoid the lock
> contention, but thats a different problem.
>
> I'd rather make the storage a fixed size allocation which is appended
> to the debug object. Something like this:
>
> struct debug_obj_trace {
>         struct stack_trace      trace;
>         unsigned long           entries[ODEBUG_STACKDEPTH];
> };
>
> struct debug_obj {
>         unsigned int            astate;
>         void                    *object;
>         struct debug_obj_descr  *descr;
>         struct debug_obj_trace  trace[0];
> };
>
> void __init debug_objects_mem_init(void)
> {
>         unsigned long objsize = sizeof (struct debug_obj);
>
>         if (!debug_objects_enabled)
>                 return;
>
>         if (debug_objects_trace_stack)
>                 objsize += sizeof(struct debug_obj_trace);
>
>         obj_cache = kmem_cache_create("debug_objects_cache", objsize, 0,
>                                       SLAB_DEBUG_OBJECTS, NULL);
>
> __debug_object_init(void *addr, struct debug_obj_descr *descr, int onstack)
> {
>         ....
>         case ODEBUG_STATE_NONE:
>         case ODEBUG_STATE_INIT:
>         case ODEBUG_STATE_INACTIVE:
>                 if (debug_object_trace_stack) {
>                         obj->trace[0].trace.entries = obj->trace[0].entries;
>                         save_stack_trace(&trace[0].trace);
>                 }
>
> That means we need to size the static objects which are used during early
> boot with stack under the assumption that stack tracing is enabled, but
> that's simple enough.
>
> Hmm?

Yeah looks reasonable, and gives us an easy Kconfig/boot-time option
to enable/disable it. I'm a bit swamped, but I'll try to respin.
Thanks very much for the look and suggesting a cleaner integration
approach - the allocations and recursion into the stack depot really
did not result in clean code.

Thanks, Daniel

> Thanks,
>
>         tglx



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list