[Intel-gfx] [PATCH v2 1/3] drm/i915: Record the ringbuffer associated with the request

Tue Dec 15 08:53:13 PST 2015

On 14/12/15 11:28, Chris Wilson wrote:
> On Mon, Dec 14, 2015 at 11:14:31AM +0000, Dave Gordon wrote:
>> On 11/12/15 22:59, Chris Wilson wrote:
>>> The request tells us where to read the ringbuf from, so use that
>>> information to simplify the error capture. If no request was active at
>>> the time of the hang, the ring is idle and there is no information
>>> inside the ring pertaining to the hang.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gpu_error.c | 29 ++++++++++-------------------
>>>   1 file changed, 10 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index 3e137fc701cf..6eefe9c36931 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -995,7 +995,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>>>
>>>   	for (i = 0; i < I915_NUM_RINGS; i++) {
>>>   		struct intel_engine_cs *ring = &dev_priv->ring[i];
>>> -		struct intel_ringbuffer *rbuf;
>>> +		struct intel_ringbuffer *rbuf = NULL;
>>>
>>>   		error->ring[i].pid = -1;
>>>
>>> @@ -1039,26 +1039,17 @@ static void i915_gem_record_rings(struct drm_device *dev,
>>>   				}
>>>   				rcu_read_unlock();
>>>   			}
>>> +
>>> +			rbuf = request->ringbuf;
>>>   		}
>>>
>>> -		if (i915.enable_execlists) {
>>> -			/* TODO: This is only a small fix to keep basic error
>>> -			 * capture working, but we need to add more information
>>> -			 * for it to be useful (e.g. dump the context being
>>> -			 * executed).
>>> -			 */
>>> -			if (request)
>>> -				rbuf = request->ctx->engine[ring->id].ringbuf;
>>> -			else
>>> -				rbuf = ring->default_context->engine[ring->id].ringbuf;
>>> -		} else
>>> -			rbuf = ring->buffer;
>>> -
>>> -		error->ring[i].cpu_ring_head = rbuf->head;
>>> -		error->ring[i].cpu_ring_tail = rbuf->tail;
>>> -
>>> -		error->ring[i].ringbuffer =
>>> -			i915_error_ggtt_object_create(dev_priv, rbuf->obj);
>>> +		if (rbuf) {
>>> +			error->ring[i].cpu_ring_head = rbuf->head;
>>> +			error->ring[i].cpu_ring_tail = rbuf->tail;
>>> +			error->ring[i].ringbuffer =
>>> +				i915_error_ggtt_object_create(dev_priv,
>>> +							      rbuf->obj);
>>> +		}
>>>
>>>   		error->ring[i].hws_page =
>>>   			i915_error_ggtt_object_create(dev_priv, ring->status_page.obj);
>>
>> I think the code you deleted is intended to capture the *default*
>> ringbuffer if there is no request active -- sometimes we will switch
>> an engine to the default context (and therefore ringbuffer) when
>> there's no work to be done.
>
> So answer the question, why? I don't have a use for it. This code in
> particular is nothing more than a hack for execlists and in no way
> reflects my intentions for the postmortem debugging tool.
>
>> Another option that's sometimes useful is to capture the ringbuffer
>> pointed to by the START register. This helps in finding situations
>> where the driver and the GPU disagree about what should be in
>> progress.
>
> That is a possibitly, except is no more interesting than inspecting the
> START vs expected (and requires the stop_machine rework to walk the
> lists without crashing).
>
>> I've got a few patches that update some of the error capture that's
>> always been missing in execlist mode (like, actually capturing the
>> active context), and add some more decoding of what we do capture.
>
> No decoding. That is easier done in userspace. And I sent patches to do
> more capturing many, many months ago, along with fixing up most of the
> invalid ppgtt state.
> -Chris

Anyway, the removal of the unnecessary execlist/non-execlist is 
worthwhile, so

Reviewed-by: Dave Gordon <david.s.gordon at intel.com>

and then maybe I'll rework the default and/or START capture on top of 
this later.

.Dave.