[Intel-gfx] [PATCH v8 1/6] drm/i915: Add per context timelines to fence object

Wed May 18 13:20:18 UTC 2016

Op 18-05-16 om 14:49 schreef John Harrison:
> On 18/05/2016 13:22, Maarten Lankhorst wrote:
>> Op 13-05-16 om 11:16 schreef John Harrison:
>>> On 13/05/2016 08:39, Chris Wilson wrote:
>>>> On Thu, May 12, 2016 at 10:06:31PM +0100, John.C.Harrison at Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>
>>>>> The fence object used inside the request structure requires a sequence
>>>>> number. Although this is not used by the i915 driver itself, it could
>>>>> potentially be used by non-i915 code if the fence is passed outside of
>>>>> the driver. This is the intention as it allows external kernel drivers
>>>>> and user applications to wait on batch buffer completion
>>>>> asynchronously via the dma-buff fence API.
>>>>>
>>>>> To ensure that such external users are not confused by strange things
>>>>> happening with the seqno, this patch adds in a per context timeline
>>>>> that can provide a guaranteed in-order seqno value for the fence. This
>>>>> is safe because the scheduler will not re-order batch buffers within a
>>>>> context - they are considered to be mutually dependent.
>>>>>
>>>>> v2: New patch in series.
>>>>>
>>>>> v3: Renamed/retyped timeline structure fields after review comments by
>>>>> Tvrtko Ursulin.
>>>>>
>>>>> Added context information to the timeline's name string for better
>>>>> identification in debugfs output.
>>>>>
>>>>> v5: Line wrapping and other white space fixes to keep style checker
>>>>> happy.
>>>>>
>>>>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>>>>
>>>>> v8: Moved to earlier in patch series so no longer needs to remove the
>>>>> quick hack timeline that was being added before.
>>>>>
>>>>> For: VIZ-5190
>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>>>>    drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>>>>    4 files changed, 76 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index d5496ab..520480b 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
>>>>>        bool banned;
>>>>>    };
>>>>>    +struct i915_fence_timeline {
>>>>> +    char        name[32];
>>>>> +    unsigned    fence_context;
>>>>> +    unsigned    next;
>>>>> +
>>>>> +    struct intel_context *ctx;
>>>>> +    struct intel_engine_cs *engine;
>>>>> +};
>>>>> +
>>>>> +int i915_create_fence_timeline(struct drm_device *dev,
>>>>> +                   struct intel_context *ctx,
>>>>> +                   struct intel_engine_cs *ring);
>>>>> +
>>>>>    /* This must match up with the value previously used for execbuf2.rsvd1. */
>>>>>    #define DEFAULT_CONTEXT_HANDLE 0
>>>>>    @@ -869,6 +882,7 @@ struct intel_context {
>>>>>            u64 lrc_desc;
>>>>>            uint32_t *lrc_reg_state;
>>>>>            bool initialised;
>>>>> +        struct i915_fence_timeline fence_timeline;
>>>> This is still fundamentally wrong. This i915_fence_timeline is both the
>>>> fence context and timeline. Our timelines are singular per vm, with a
>>>> fence context under each timeline per engine.
>>> As I said last time you mentioned 'per vm', please explain. Where does a virtual machine come in? Or is that not what you mean? What is the purpose of having multiple fence contexts within a single timeline? It will stop external uses attempting to combine fences but it won't stop them from actually being out of order. The timeline needs to be per hardware context not simply per engine because the whole point is that requests should not be re-ordered once they have been allocated a point on a timeline. However, the purpose of the scheduler (which is what this series is preparation for) is to re-order requests between contexts. Thus the timeline must be per context to prevent requests ending up with out of order completions.
>> Virtual memory, not machine. Guessing i915_address_space :-)
> How do you have a timeline per 'virtual memory'? Guessing is pointless when trying to get patches written and accepted. A design is not complete when it is based on guesses about what the important stakeholders mean.
>
>> Fence contexts may not have to be bound to a single timeline,
> Fence contexts absolutely have to be bound to a single timeline. The context is used to say whether any two fences can be combined by numerical comparison of their timeline points. If they share a context but have points generated from different timelines then any such comparison is broken. The fence context is the timeline!
>
>>   lets say you have 2 engines, running commands for Com1 and Com2.
>>
>> ENG1: Com1.1
>> ENG2: Com2.1 Com1.2 Com1.3
>>
>> then there's no harm if you end up with multiple fence contexts since they're independently executing.
> Not sure what you are meaning here. If Com1.1 and Com1.2 are sharing fence context '1' and have timeline values '1' and '2' then the system is broken. The fence framework could merge the two by simply discarding Com1.1 because '2' is greater than '1' (timeline value) and '1' is equal to '1' (context). Thus a user would end up waiting on only Com1.2, however, being executed concurrently on different engines means that Com1.2 might actually complete before Com1.1. And hence the user is broken as it now thinks everything is finished when Engine1 is still running its work. 
Indeed. Not completely sure what the original comment meant here.