[Intel-gfx] [PATCH 14/15] drm/i915/guc: Keep the execbuf client allocated across reset

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Nov 28 15:44:41 UTC 2016


On 28/11/2016 14:11, Chris Wilson wrote:
> On Mon, Nov 28, 2016 at 01:49:03PM +0000, Tvrtko Ursulin wrote:
>>
>> On 25/11/2016 09:30, Chris Wilson wrote:
>>> In order to avoid some complexity in trying to reconstruct the
>>> workqueues across reset, remember them instead. The issue arises when
>>> we have to handle a reset between request allocation and submission:
>>> the request has reserved space in the wq, but is not yet in any list,
>>> so we fail to restore the reserved space. By keeping the execbuf
>>> client intact across the reset, we also keep the reservations.
>>
>> I have lost track a bit - why do we need to reserve the space at
>> request creation time? Is it not becoming a bit cumbersome?
>
> It is very, very hard to handle a failure. We have to be careful not to
> alter global state prior to executing the request, or at least prior to
> submitting it, which at the moment we are not. Since we can't unwind
> the global state changes, that imposes a point-of-no-return on request
> construction, after which the request must be submitted. (It is
> possible, though, to detect when a request doesn't make any global
> state changes and drop the request on add.) As the reservation may
> fail, we have to do it early.
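
Right, I see - the reservation has to happen at request-alloc time,
while a failure can still be unwound. Quoting the call site from memory,
so treat this as a sketch rather than the exact code:

	if (i915.enable_guc_submission) {
		/* Check that the GuC wq has space for the request before
		 * going any further, as the later i915_add_request() call
		 * must not fail.
		 */
		ret = i915_guc_wq_reserve(request);
		if (ret)
			return ret;
	}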

Couldn't we just skip all of the ring buffer writing at execbuf time,
add the request to the appropriate timeline, and do all of that later,
when the scheduler decides it is time to actually submit it?
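
I.e. hypothetically, with completely made-up helpers just to illustrate
the idea:

	/* execbuf: no ring writes, just put the request on its timeline */
	list_add_tail(&rq->link, &rq->timeline->requests);

	/* later, when the scheduler picks it: reserve the wq space and
	 * emit the ring contents immediately before submission
	 */
	err = i915_guc_wq_reserve(rq);
	if (err)
		return err; /* leave it queued and retry later */

	emit_request_commands(rq); /* hypothetical helper */
	i915_guc_submit(rq);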

>>> @@ -883,8 +877,13 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
>>>
>>> 	guc_proc_desc_init(guc, client);
>>> 	guc_ctx_desc_init(guc, client);
>>> -	if (guc_init_doorbell(guc, client, db_id))
>>> -		goto err;
>>> +
>>> +	/* For runtime client allocation we need to enable the doorbell. Not
>>> +	 * required yet for the static execbuf_client as this special kernel
>>> +	 * client is enabled from i915_guc_submission_enable().
>>> +	 *
>>> +	 * guc_update_doorbell_id(guc, client, db_id);
>>> +	 */
>>
>> What future is the "not yet" part referring to? What are the other clients?
>
> No idea. The code is designed around the premise that users could
> allocate guc contexts and do direct submission on their private
> channels. That may be more appropriate in a bufferless world, but not
> yet.
>
>>> 	DRM_DEBUG_DRIVER("new priority %u client %p for engine(s) 0x%x: ctx_index %u\n",
>>> 		priority, client, client->engines, client->ctx_index);
>>> @@ -1484,6 +1483,9 @@ int i915_guc_submission_init(struct drm_i915_private *dev_priv)
>>> 	struct intel_guc *guc = &dev_priv->guc;
>>> 	struct i915_vma *vma;
>>>
>>> +	if (!HAS_GUC_SCHED(dev_priv))
>>> +		return 0;
>>
>> Why did you have to add this hunk? I think this function does not
>> get called unless there is a GuC.
>
> I too thought that it would not be called without a guc.

But is it actually called, then, or what is going on?

>>> int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
>>> {
>>> 	struct intel_guc *guc = &dev_priv->guc;
>>> -	struct drm_i915_gem_request *request;
>>> -	struct i915_guc_client *client;
>>> +	struct i915_guc_client *client = guc->execbuf_client;
>>> 	struct intel_engine_cs *engine;
>>> 	enum intel_engine_id id;
>>>
>>> -	/* client for execbuf submission */
>>> -	client = guc_client_alloc(dev_priv,
>>> -				  INTEL_INFO(dev_priv)->ring_mask,
>>> -				  GUC_CTX_PRIORITY_KMD_NORMAL,
>>> -				  dev_priv->kernel_context);
>>> -	if (!client) {
>>> -		DRM_ERROR("Failed to create normal GuC client!\n");
>>> -		return -ENOMEM;
>>> -	}
>>> +	if (!client)
>>> +		return -ENODEV;
>>>
>>> -	guc->execbuf_client = client;
>>> +	guc_reset_wq(client);
>>> 	host2guc_sample_forcewake(guc, client);
>>> 	guc_init_doorbell_hw(guc);
>>>
>>> 	/* Take over from manual control of ELSP (execlists) */
>>> 	for_each_engine(engine, dev_priv, id) {
>>> +		struct drm_i915_gem_request *rq;
>>> +
>>> 		engine->submit_request = i915_guc_submit;
>>> 		engine->schedule = NULL;
>>>
>>> 		/* Replay the current set of previously submitted requests */
>>> -		list_for_each_entry(request,
>>> -				    &engine->timeline->requests, link) {
>>> +		list_for_each_entry(rq, &engine->timeline->requests, link) {
>>> 			client->wq_rsvd += sizeof(struct guc_wq_item);
>>> -			if (i915_sw_fence_done(&request->submit))
>>> -				i915_guc_submit(request);
>>
>> So the i915_sw_fence_done check is not needed because only
>> submit-ready requests can be on the engine timeline?
>
> More so, only requests that have not only passed the submit fence but
> also have the execute fence signaled can be on the engine/execution
> timeline. (Distinction is more interesting when the sw scheduler delays
> the execute.)

Hm yes, that is correct.
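
I.e. by the time a request is on engine->timeline->requests, both
fences have fired, so something like the below should hold (a sketch,
assuming the execute fence from the earlier submit/execute split):

	list_for_each_entry(rq, &engine->timeline->requests, link) {
		GEM_BUG_ON(!i915_sw_fence_done(&rq->submit));
		GEM_BUG_ON(!i915_sw_fence_done(&rq->execute));
	}

which makes the i915_sw_fence_done() check in the replay loop redundant.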

The patch itself looked fine to me. It was just the complexity which 
made me wonder whether we couldn't have taken a different route. But 
that was a long time ago, so it is not relevant to this patch anyway.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko
