[Intel-gfx] [PATCH 5/5] drm/i915: s/seqno/request/ tracking inside objects

Tue Sep 2 12:06:29 CEST 2014

Hello,

Is this patch going to be split up into more manageable pieces? I tried 
to apply it to a tree fetched yesterday and got a very large number of 
conflicts. I don't know whether that is because more execlist patches 
have been merged or if it is other random changes that have broken it or 
if I am just missing earlier patches in the set.

The patch has been sent with subjects of '[PATCH]', '[PATCH 5/5]' and 
'[PATCH 3/3]'. However, all three emails seem to be the same humongous 
single part patch and I can't find any 0/3, 4/5, etc. emails. Am I 
missing some prep work patches without which the final monster patch is 
never going to apply?

Thanks,
John.

On 27/08/2014 11:39, Chris Wilson wrote:
> On Wed, Aug 27, 2014 at 11:55:34AM +0200, Daniel Vetter wrote:
>> On Tue, Aug 12, 2014 at 08:05:51PM +0100, Chris Wilson wrote:
>>> At the heart of this change is that the seqno is a too low level of an
>>> abstraction to handle the growing complexities of command tracking, both
>>> with the introduction of multiple command queues with execbuffer and the
>>> potential for reordering with a scheduler. On top of the seqno we have
>>> the request. Conceptually this is just a fence, but it also has
>>> substantial bookkeeping of its own in order to track the context and
>>> batch in flight, for example. It is the central structure upon which we
>>> can extend with dependency tracking et al.
>>>
>>> As regards the objects, they were using the seqno as a simple fence,
>>> upon which is check or even wait upon for command completion. This patch
>>> exchanges that seqno/ring pair with the request itself. For the
>>> majority, lifetime of the request is ordered by how we retire objects
>>> then requests. However, both the unlocked waits and probing elsewhere do
>>> not tie into the normal request lifetimes and so we need to introduce a
>>> kref. Extending the objects to use the request as the fence naturally
>>> extends to segregrating read/write fence tracking. This has significance
>>> for it reduces the number of semaphores we need to emit, reducing the
>>> likelihood of #54226, and improving performance overall.
>>>
>>> v2: Rebase and split out the othogonal tweaks.
>>>
>>> A silly happened with this patch. It seemed to nullify our earlier
>>> seqno-vs-interrupt w/a. I could not spot why, but gen6+ started to fail
>>> with missed interrupts (a good test of our robustness handling). So I
>>> ripped out the existing ACTHD read and replaced it with a RING_HEAD to
>>> manually check whether the request is complete. That also had the nice
>>> consequence of forcing __wait_request() to being the central arbiter of
>>> request completion.
>>>
>>> The keener eyed reviewr will also spot that the reset_counter is moved
>>> into the request simplifing __wait_request() callsites and reducing the
>>> number of atomic reads by virtue of moving the check for a pending GPU
>>> reset to the endpoints of GPU access.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Jesse Barnes <jbarnes at virtuousgeek.org>
>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>> Cc: Oscar Mateo <oscar.mateo at intel.com>
>>> Cc: Brad Volkin <bradley.d.volkin at intel.com>
>>> Cc: "Kukanova, Svetlana" <svetlana.kukanova at intel.com>
>> So I've tried to split this up and totally failed. Non-complete list of
>> things I didn't manage to untangle:
>>
>> - The mmio flip refactoring.
> Yeah, that's a fairly instrumental part of the patch. It's not
> complicated but it does benefit a lot from using requests to both make
> the decision cleaner and the tracking correct.
>
>> - The overlay request tracking refactoring.
> Again, the api changes allow the code to be compacted.
>
>> - The switch to multiple parallel readers with the resulting cascading
>>    changes all over.
> I thought that was fairly isolated to the gem object. It's glossed over
> in errors/debugfs for simplicity, which deserves to be fixed given a
> compact representation of all the requests, and so would kill the icky
> code to find the "last" request.
>
>> - The missed irq w/a prep changes. It's easy to split out the change to
>>    re-add the rc6 reference and to ditch the ACT_HEAD read, but the commit
>>    message talks about instead reading the RING_HEAD, and I just didn't
>>    spot the changes relevant to that in this big diff. Was probably looking
>>    in the wrong place.
> I did mention that I tried that earlier on on the ml, but missed saying
> that it the forcewake reference didn't unbreak the old w/a in the
> changelog.
>
>> - The move_to_active/retire refactoring. There's a pile of code movement
>>    in there, but I couldn't spot really what's just refactoring and what is
>>    real changed needed for the s/seqno/request/ change.
> move-to() everything since that now operates on the request. retire(), not
> a lot changes there, just the extra requests being tracked and the strict
> lifetime ordering of the reference the object holds onto the requests.
>
>> - De-duping some of the logical_ring_ functions. Spotted because it
>>    conflicted (but was easy to hack around), still this shouldn't really be
>>    part of this.
>> Things I've spotted which could be split out but amount to a decent
>> rewrite of the patch:
>> - Getting at the ring of the last write to an object. Although I guess
>>    without the multi-reader stuff and the pageflip refactoring that would
>>    pretty much disappear.
> Why? Who uses the ring of the last_write request? We compare engines in
> pageflip but that's about it.
>
>> - Probably similar helpers for seqno if we don't switch to parallel writes
>>    in the same patch.
>>
>> Splitting out the renames was easy, but that reduced the diff by less than
>> 5% in size. So didn't help in reviewing the patch at all.
> The actual rename patch is larger than this one (v2).
> -Chris
>