[Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure

Thu Jun 18 09:16:15 PDT 2015

On 18/06/2015 16:39, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 04:24:53PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote:
>>> On 18/06/2015 13:21, Chris Wilson wrote:
>>>> On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison at Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>
>>>>> The plan is to pass requests around as the basic submission tracking structure
>>>>> rather than rings and contexts. This patch updates the i915_gem_object_sync()
>>>>> code path.
>>>>>
>>>>> v2: Much more complex patch to share a single request between the sync and the
>>>>> page flip. The _sync() function now supports lazy allocation of the request
>>>>> structure. That is, if one is passed in then that will be used. If one is not,
>>>>> then a request will be allocated and passed back out. Note that the _sync() code
>>>>> does not necessarily require a request. Thus one will only be created until
>>>>> certain situations. The reason the lazy allocation must be done within the
>>>>> _sync() code itself is because the decision to need one or not is not really
>>>>> something that code above can second guess (except in the case where one is
>>>>> definitely not required because no ring is passed in).
>>>>>
>>>>> The call chains above _sync() now support passing a request through which most
>>>>> callers passing in NULL and assuming that no request will be required (because
>>>>> they also pass in NULL for the ring and therefore can't be generating any ring
>>>>> code).
>>>>>
>>>>> The exeception is intel_crtc_page_flip() which now supports having a request
>>>>> returned from _sync(). If one is, then that request is shared by the page flip
>>>>> (if the page flip is of a type to need a request). If _sync() does not generate
>>>>> a request but the page flip does need one, then the page flip path will create
>>>>> its own request.
>>>>>
>>>>> v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
>>>>> Elf review request). Rebased onto newer tree that significantly changed the
>>>>> synchronisation code.
>>>>>
>>>>> v4: Updated comments from review feedback (Tomas Elf)
>>>>>
>>>>> For: VIZ-5115
>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>> Reviewed-by: Tomas Elf <tomas.elf at intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
>>>>>   drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
>>>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
>>>>>   drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
>>>>>   drivers/gpu/drm/i915/intel_drv.h           |    3 +-
>>>>>   drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
>>>>>   drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
>>>>>   drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
>>>>>   8 files changed, 57 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index 64a10fa..f69e9cb 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>>>>>   int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>>>>>   int i915_gem_object_sync(struct drm_i915_gem_object *obj,
>>>>> -			 struct intel_engine_cs *to);
>>>>> +			 struct intel_engine_cs *to,
>>>>> +			 struct drm_i915_gem_request **to_req);
>>>> Nope. Did you forget to reorder the code to ensure that the request is
>>>> allocated along with the context switch at the start of execbuf?
>>>> -Chris
>>>>
>>> Not sure what you are objecting to? If you mean the lazily allocated request
>>> then that is for page flip code not execbuff code. If we get here from an
>>> execbuff call then the request will definitely have been allocated and will
>>> be passed in. Whereas the page flip code may or may not require a request
>>> (depending on whether MMIO or ring flips are in use. Likewise the sync code
>>> may or may not require a request (depending on whether there is anything to
>>> sync to or not). There is no point allocating and submitting an empty
>>> request in the MMIO/idle case. Hence the sync code needs to be able to use
>>> an existing request or create one if none already exists.
>> I guess Chris' comment was that if you have a non-NULL to, then you better
>> have a non-NULL to_req. And since we link up reqeusts to the engine
>> they'll run on the former shouldn't be required any more. So either that's
>> true and we can remove the to or we don't understand something yet (and
>> perhaps that should be done as a follow-up).
> I am sure I sent a patch that outlined in great detail how that we need
> only the request parameter in i915_gem_object_sync(), for handling both
> execbuffer, pipelined pin_and_fence and synchronous pin_and_fence.
> -Chris
>

As the driver stands, the page flip code wants to synchronise with the 
framebuffer object but potentially without touching the ring and 
therefore without creating a request. If the synchronisation is a no-op 
(because there are no outstanding operations on the given object) then 
there is no need for a request anywhere in the call chain. Thus there is 
a need to pass in the ring together with an optional request and to be 
able to pass out a request that has been created internally.

 >  if you have a non-NULL to, then you better have a non-NULL to_req

I assume you mean 'a non-NULL *to_req'?

No, that is the whole point. If you have a non-null '*to_req' then 'to' 
must be non-null (and specifically must be the ring that '*to_req' is 
referencing). However, it is valid to have a non-null 'to' and a null 
'*to_req'.  In the case of MMIO flips, the page flip itself does not 
require a request as it does not go through the ring. However, it still 
passes in 'i915_gem_request_get_ring(obj->last_write_req)' as the ring 
to synchronise to. Thus it is potentially passing in a valid to pointer 
but without wanting to pre-allocate a request object. If the 
synchronisation requires writing a semaphore to the ring then a request 
will be created internally and passed back out for the page flip code to 
submit (and to re-use in the case of non-MMIO flips). But if the 
synchronisation is a no-op then no request ever gets created or 
submitting and nothing touches the ring at all.

John.