[Intel-gfx] [RFC 03/11] drm/i915: Add reset stats entry point for per-engine reset.

Tomas Elf tomas.elf at intel.com
Tue Jun 16 08:54:14 PDT 2015


On 16/06/2015 14:49, Daniel Vetter wrote:
> On Mon, Jun 08, 2015 at 06:03:21PM +0100, Tomas Elf wrote:
>> In preparation for per-engine reset, add a way of setting context reset stats.
>>
>> OPEN QUESTIONS:
>> 1. How do we deal with get_reset_stats and the GL robustness interface when
>> introducing per-engine resets?
>>
>> 	a. Do we set contexts that cause per-engine resets as guilty? If so, how
>> 	does this affect context banning?
>>
>> 	b. Do we extend the publicly available reset stats to also contain
>> 	per-engine reset statistics? If so, would this break the ABI?
>>
>> Signed-off-by: Tomas Elf <tomas.elf at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h |    2 ++
>>   drivers/gpu/drm/i915/i915_gem.c |   14 ++++++++++++++
>>   2 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 47be4a5..ab5dfdc 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2781,6 +2781,8 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
>>   }
>>
>>   void i915_gem_reset(struct drm_device *dev);
>> +void i915_gem_reset_engine(struct drm_i915_private *dev_priv,
>> +			   struct intel_engine_cs *engine);
>>   bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>>   int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>>   int __must_check i915_gem_init(struct drm_device *dev);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 8ce363a..4c88e5c 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2676,6 +2676,20 @@ void i915_gem_reset(struct drm_device *dev)
>>   	i915_gem_restore_fences(dev);
>>   }
>>
>> +void i915_gem_reset_engine(struct drm_i915_private *dev_priv,
>> +			   struct intel_engine_cs *engine)
>> +{
>> +	u32 completed_seqno;
>> +	struct drm_i915_gem_request *req;
>> +
>> +	completed_seqno = engine->get_seqno(engine, false);
>> +
>> +	/* Find pending batch buffers and mark them as such */
>> +	list_for_each_entry(req, &engine->request_list, list)
>> +		if (req->seqno > completed_seqno)
>> +			i915_set_reset_status(dev_priv, req->ctx, false);
>> +}
>
> Please don't add dead code since it makes it impossible to review the
> patch without peeking ahead. And that makes the split-up useless - the
> point of splitting patches is to make review easier by presenting
> logically self-contained small changes, not to evenly spread out changes
> across a lot of mails.
> -Daniel

I did actually split this out from the main TDR patch (drm/i915: Adding 
TDR / per-engine reset support for gen8) by mistake. But since this is 
an RFC series, which I understood to be material for a design discussion 
rather than an actual code submission, I didn't spend too much time on 
splitting the series into sensible chunks. If that were a problem I 
would expect people to take issue with the fact that e.g. the main TDR 
patch is a huge, monolithic chunk of code spanning more than 2000 lines. 
Obviously, that will be subdivided sensibly at a later stage and the 
code in this patch mail will be properly associated with its calling 
code, roughly along the lines of the sketch below.
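
To give an idea of the intended wiring, here is a rough sketch. The 
names i915_handle_engine_hang() and intel_engine_reset() are 
placeholders for what the main TDR patch introduces, not final names:

	static int i915_handle_engine_hang(struct drm_i915_private *dev_priv,
					   struct intel_engine_cs *engine)
	{
		int ret;

		/*
		 * Mark the contexts of all requests still pending on the
		 * hung engine in the reset stats before the engine state
		 * is discarded.
		 */
		i915_gem_reset_engine(dev_priv, engine);

		/* Reset only the hung engine instead of the whole GPU. */
		ret = intel_engine_reset(engine);
		if (ret)
			return ret; /* caller falls back to full GPU reset */

		return 0;
	}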

Is it OK if we leave the patch subdivision discussion until after the 
initial RFC stage, or how do these things typically work at this point 
in the process?
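
Regarding open question (b) above: to make it concrete, the most 
conservative option would be to only ever append fields to the existing 
uapi struct, so that current userspace is unaffected. Purely a 
hypothetical sketch (the engine_reset_count field does not exist 
anywhere yet):

	struct drm_i915_reset_stats {
		__u32 ctx_id;
		__u32 flags;

		/* All resets since boot/module reload, for all contexts */
		__u32 reset_count;

		/* Number of batches lost when active in GPU, for this context */
		__u32 batch_active;

		/* Number of batches lost pending for execution, for this context */
		__u32 batch_pending;

		__u32 pad;

		/* HYPOTHETICAL: per-engine resets seen by this context */
		__u32 engine_reset_count;
	};

Whether appending fields like this actually preserves the ABI depends 
on how the ioctl struct size is checked on the way in, which is exactly 
the kind of thing I would like input on.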

Thanks,
Tomas
