[Intel-gfx] [PATCH v2] drm/i915: Don't dump umpteen thousand requests

Tue Apr 24 09:40:57 UTC 2018

On 24/04/2018 10:33, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-24 10:27:23)
>>
>> On 24/04/2018 09:16, Chris Wilson wrote:
>>> If we have more than a few, possibly several thousand request in the
>>> queue, don't show the central portion, just the first few and the last
>>> being executed and/or queued. The first few should be enough to help
>>> identify a problem in execution, and most often comparing the first/last
>>> in the queue is enough to identify problems in the scheduling.
>>>
>>> We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
>>> debug scenarios, but for the moment if we can avoiding spending more
>>> than a few seconds dumping the GPU state that will avoid a nasty
>>> livelock (where hangcheck spends so long dumping the state, it fires
>>> again and starts to dump the state again in parallel, ad infinitum).
>>>
>>> v2: Remember to print last not the stale rq iter after the loop.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
>>>    1 file changed, 38 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> index 66cddd059666..2398ea71e747 100644
>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> @@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>>>                       struct drm_printer *m,
>>>                       const char *header, ...)
>>>    {
>>> +     const int MAX_REQUESTS_TO_SHOW = 8;
>>>        struct intel_breadcrumbs * const b = &engine->breadcrumbs;
>>>        const struct intel_engine_execlists * const execlists = &engine->execlists;
>>>        struct i915_gpu_error * const error = &engine->i915->gpu_error;
>>> -     struct i915_request *rq;
>>> +     struct i915_request *rq, *last;
>>>        struct rb_node *rb;
>>> +     int count;
>>>    
>>>        if (header) {
>>>                va_list ap;
>>> @@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>>>        }
>>>    
>>>        spin_lock_irq(&engine->timeline->lock);
>>> -     list_for_each_entry(rq, &engine->timeline->requests, link)
>>> -             print_request(m, rq, "        E ");
>>> +
>>> +     last = NULL;
>>> +     count = 0;
>>> +     list_for_each_entry(rq, &engine->timeline->requests, link) {
>>> +             if (count++ < MAX_REQUESTS_TO_SHOW - 1)
>>> +                     print_request(m, rq, "        E ");
>>> +             else
>>> +                     last = rq;
>>
>> else {
>>          last = list_last_entry(...) ?
>>          break;
>> }
>>
>>> +     }
>>> +     if (last) {
>>> +             if (count > MAX_REQUESTS_TO_SHOW) {
>>> +                     drm_printf(m,
>>> +                                "        ...skipping %d executing requests...\n",
>>> +                                count - MAX_REQUESTS_TO_SHOW);
>>> +             }
>>> +             print_request(m, last, "        E ");
>>> +     }
>>
>> Or even stuff this printf in the first loop above, under the else
>> branch. Maybe shorter would be like this, module off by ones if I made some:
>>
>> list_for_each_entry(rq, &engine->timeline->requests, link) {
>>          struct i915_request *pr = rq;
>>
>>          if (++count > MAX_REQUESTS_TO_SHOW) {
>>                  pr = list_last_entry(...);
>>                  drm_printf(m,
>>                             "        ...skipping %d executing requests...\n",
>>                             count - MAX_REQUESTS_TO_SHOW);
>>          }
>>          
>>          print_request(m, pr, "        E ");
>>          
>>          if (count > MAX_REQUESTS_TO_SHOW)
>>                  break;
>> }
>>
>>> +
>>> +     last = NULL;
>>> +     count = 0;
>>>        drm_printf(m, "        Queue priority: %d\n", execlists->queue_priority);
>>>        for (rb = execlists->first; rb; rb = rb_next(rb)) {
>>>                struct i915_priolist *p =
>>>                        rb_entry(rb, typeof(*p), node);
>>>    
>>> -             list_for_each_entry(rq, &p->requests, sched.link)
>>> -                     print_request(m, rq, "        Q ");
>>> +             list_for_each_entry(rq, &p->requests, sched.link) {
>>> +                     if (count++ < MAX_REQUESTS_TO_SHOW - 1)
>>> +                             print_request(m, rq, "        Q ");
>>> +                     else
>>> +                             last = rq;
>>> +             }
>>>        }
>>> +     if (last) {
>>> +             if (count > MAX_REQUESTS_TO_SHOW) {
>>> +                     drm_printf(m,
>>> +                                "        ...skipping %d queued requests...\n",
>>> +                                count - MAX_REQUESTS_TO_SHOW);
>>> +             }
>>> +             print_request(m, last, "        Q ");
>>> +     }
>>
>> Then I am thinking how to avoid the duplication and extract the smart
>> printer. Macro would be easy at least, if a bit ugly.
> 
> Yeah, and for the moment, I'd like to keep the duplication obvious and
> keep the two passes very similar.
>   
>> Another idea comes to mind, but probably for the future, to merge same
>> prio, context and strictly consecutive seqnos to a single line of output
>> like:
>>
>>    [prefix]seqno-seqno [%llx:seqno-seqno] (%u consecutive requests)
>>
>> Give or take, but it will be more involved to do that.
> 
> Deciding when rq are equivalent will get messy, and going to drift.
> 
>>
>>> +
>>>        spin_unlock_irq(&engine->timeline->lock);
>>>    
>>>        spin_lock_irq(&b->rb_lock);
>>>
>>
>> Looks OK in general, just please see if you like my idea to contain the
>> logic within a single loop and maybe even move it to a macro.
> 
> I don't, because it missed one important thing, the count of skipped
> requests. :-p

Doh.. Well, I don't like it, but have no better/easier ideas at the moment.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko