[Intel-gfx] [PATCH v2] drm/i915: Don't dump umpteen thousand requests

Tue Apr 24 09:33:46 UTC 2018

Quoting Tvrtko Ursulin (2018-04-24 10:27:23)
> 
> On 24/04/2018 09:16, Chris Wilson wrote:
> > If we have more than a few, possibly several thousand request in the
> > queue, don't show the central portion, just the first few and the last
> > being executed and/or queued. The first few should be enough to help
> > identify a problem in execution, and most often comparing the first/last
> > in the queue is enough to identify problems in the scheduling.
> > 
> > We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
> > debug scenarios, but for the moment if we can avoiding spending more
> > than a few seconds dumping the GPU state that will avoid a nasty
> > livelock (where hangcheck spends so long dumping the state, it fires
> > again and starts to dump the state again in parallel, ad infinitum).
> > 
> > v2: Remember to print last not the stale rq iter after the loop.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
> >   1 file changed, 38 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 66cddd059666..2398ea71e747 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >                      struct drm_printer *m,
> >                      const char *header, ...)
> >   {
> > +     const int MAX_REQUESTS_TO_SHOW = 8;
> >       struct intel_breadcrumbs * const b = &engine->breadcrumbs;
> >       const struct intel_engine_execlists * const execlists = &engine->execlists;
> >       struct i915_gpu_error * const error = &engine->i915->gpu_error;
> > -     struct i915_request *rq;
> > +     struct i915_request *rq, *last;
> >       struct rb_node *rb;
> > +     int count;
> >   
> >       if (header) {
> >               va_list ap;
> > @@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >       }
> >   
> >       spin_lock_irq(&engine->timeline->lock);
> > -     list_for_each_entry(rq, &engine->timeline->requests, link)
> > -             print_request(m, rq, "        E ");
> > +
> > +     last = NULL;
> > +     count = 0;
> > +     list_for_each_entry(rq, &engine->timeline->requests, link) {
> > +             if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> > +                     print_request(m, rq, "        E ");
> > +             else
> > +                     last = rq;
> 
> else {
>         last = list_last_entry(...) ?
>         break;
> }
> 
> > +     }
> > +     if (last) {
> > +             if (count > MAX_REQUESTS_TO_SHOW) {
> > +                     drm_printf(m,
> > +                                "        ...skipping %d executing requests...\n",
> > +                                count - MAX_REQUESTS_TO_SHOW);
> > +             }
> > +             print_request(m, last, "        E ");
> > +     }
> 
> Or even stuff this printf in the first loop above, under the else 
> branch. Maybe shorter would be like this, module off by ones if I made some:
> 
> list_for_each_entry(rq, &engine->timeline->requests, link) {
>         struct i915_request *pr = rq;
> 
>         if (++count > MAX_REQUESTS_TO_SHOW) {
>                 pr = list_last_entry(...);
>                 drm_printf(m,
>                            "        ...skipping %d executing requests...\n",
>                            count - MAX_REQUESTS_TO_SHOW);
>         }
>         
>         print_request(m, pr, "        E ");
>         
>         if (count > MAX_REQUESTS_TO_SHOW)
>                 break;
> }
> 
> > +
> > +     last = NULL;
> > +     count = 0;
> >       drm_printf(m, "        Queue priority: %d\n", execlists->queue_priority);
> >       for (rb = execlists->first; rb; rb = rb_next(rb)) {
> >               struct i915_priolist *p =
> >                       rb_entry(rb, typeof(*p), node);
> >   
> > -             list_for_each_entry(rq, &p->requests, sched.link)
> > -                     print_request(m, rq, "        Q ");
> > +             list_for_each_entry(rq, &p->requests, sched.link) {
> > +                     if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> > +                             print_request(m, rq, "        Q ");
> > +                     else
> > +                             last = rq;
> > +             }
> >       }
> > +     if (last) {
> > +             if (count > MAX_REQUESTS_TO_SHOW) {
> > +                     drm_printf(m,
> > +                                "        ...skipping %d queued requests...\n",
> > +                                count - MAX_REQUESTS_TO_SHOW);
> > +             }
> > +             print_request(m, last, "        Q ");
> > +     }
> 
> Then I am thinking how to avoid the duplication and extract the smart 
> printer. Macro would be easy at least, if a bit ugly.

Yeah, and for the moment, I'd like to keep the duplication obvious and
keep the two passes very similar.

> Another idea comes to mind, but probably for the future, to merge same 
> prio, context and strictly consecutive seqnos to a single line of output 
> like:
> 
>   [prefix]seqno-seqno [%llx:seqno-seqno] (%u consecutive requests)
> 
> Give or take, but it will be more involved to do that.

Deciding when rq are equivalent will get messy, and going to drift.

> 
> > +
> >       spin_unlock_irq(&engine->timeline->lock);
> >   
> >       spin_lock_irq(&b->rb_lock);
> > 
> 
> Looks OK in general, just please see if you like my idea to contain the 
> logic within a single loop and maybe even move it to a macro.

I don't, because it missed one important thing, the count of skipped
requests. :-p
-Chris