[Intel-gfx] [PATCH 02/55] drm/i915: Reserve ring buffer space for i915_add_request() commands

Wed Jun 24 10:05:08 PDT 2015

On 24/06/2015 13:45, Daniel Vetter wrote:
> On Wed, Jun 24, 2015 at 01:18:48PM +0100, John Harrison wrote:
>> On 23/06/2015 21:00, Daniel Vetter wrote:
>>> On Tue, Jun 23, 2015 at 04:43:24PM +0100, John Harrison wrote:
>>>> On 23/06/2015 14:24, Daniel Vetter wrote:
>>>>> On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
>>>>>> On 22/06/2015 21:12, Daniel Vetter wrote:
>>>>>>> On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison at Intel.com wrote:
>>>>>>>> From: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>>
>>>>>>>> It is a bad idea for i915_add_request() to fail. The work will already have been
>>>>>>>> send to the ring and will be processed, but there will not be any tracking or
>>>>>>>> management of that work.
>>>>>>>>
>>>>>>>> The only way the add request call can fail is if it can't write its epilogue
>>>>>>>> commands to the ring (cache flushing, seqno updates, interrupt signalling). The
>>>>>>>> reasons for that are mostly down to running out of ring buffer space and the
>>>>>>>> problems associated with trying to get some more. This patch prevents that
>>>>>>>> situation from happening in the first place.
>>>>>>>>
>>>>>>>> When a request is created, it marks sufficient space as reserved for the
>>>>>>>> epilogue commands. Thus guaranteeing that by the time the epilogue is written,
>>>>>>>> there will be plenty of space for it. Note that a ring_begin() call is required
>>>>>>>> to actually reserve the space (and do any potential waiting). However, that is
>>>>>>>> not currently done at request creation time. This is because the ring_begin()
>>>>>>>> code can allocate a request. Hence calling begin() from the request allocation
>>>>>>>> code would lead to infinite recursion! Later patches in this series remove the
>>>>>>>> need for begin() to do the allocate. At that point, it becomes safe for the
>>>>>>>> allocate to call begin() and really reserve the space.
>>>>>>>>
>>>>>>>> Until then, there is a potential for insufficient space to be available at the
>>>>>>>> point of calling i915_add_request(). However, that would only be in the case
>>>>>>>> where the request was created and immediately submitted without ever calling
>>>>>>>> ring_begin() and adding any work to that request. Which should never happen. And
>>>>>>>> even if it does, and if that request happens to fall down the tiny window of
>>>>>>>> opportunity for failing due to being out of ring space then does it really
>>>>>>>> matter because the request wasn't doing anything in the first place?
>>>>>>>>
>>>>>>>> v2: Updated the 'reserved space too small' warning to include the offending
>>>>>>>> sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
>>>>>>>> re-initialisation of tracking state after a buffer wrap to keep the sanity
>>>>>>>> checks accurate.
>>>>>>>>
>>>>>>>> v3: Incremented the reserved size to accommodate Ironlake (after finally
>>>>>>>> managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
>>>>>>>>
>>>>>>>> v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
>>>>>>>>
>>>>>>>> v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
>>>>>>>> Daniel Vetter).
>>>>>>> This didn't actually implement what I suggested (wrapping is the worst
>>>>>>> case, hence skipping the check for that is breaking the sanity check) and
>>>>>>> so changed the patch from "correct, but a bit fragile" to broken. I've
>>>>>>> merged the previous version instead.
>>>>>>> -Daniel
>>>>>> I'm confused. I thought your main issue was the early wrapping not the
>>>>>> sanity check. The check is to ensure that the reservation is large enough to
>>>>>> cover all the commands written during request submission. That should not be
>>>>>> affected by whether a wrap occurs or not. Wrapping does not magically add an
>>>>>> extra bunch of dwords to the emit_request() call. Whereas making the check
>>>>>> work with the wrap condition requires adding in extra tracking state of
>>>>>> exactly where the wrap occurred. That is extra code that only exists to
>>>>>> catch something in the very rare case which should already have been caught
>>>>>> in the very common case. I.e. if your reserved size is too small then you
>>>>>> will hit the warning on every batch buffer submission.
>>>>> The problem is that if you allow a wrap in the reserve size then the
>>>>> ringspace requirements are bigger than if you don't wrap. And since the
>>>>> add request is split up into many intel_ring_begin that's possible. Hence
>>>>> if you allow wrapping in the reserved space, then the most important case
>>>>> for the debug check is to make sure that it catches any kind of
>>>>> reservation overflow while wrapping. The not-wrapped case is probably the
>>>>> boring one.
>>>>>
>>>>> And indeed eventually we should overflow since according to your comment
>>>>> the worst case add request on ilk is 136 dwords. And the largest
>>>>> intel_ring_begin in there is 32 dwords, which means at most we'll throw
>>>>> away 31 dwords when wrapping. Which means the 160 dwords of reservation
>>>>> are not enough since we'd need 167 dwords of space for the worst case. But
>>>>> since the space_end debug check was a no-op for the wrapped case you won't
>>>>> catch this one.
>>>> The minimum reservation size in this case is still only 136. The prepare
>>>> code checks for the 32 words actually requested and wraps if necessary. It
>>>> then checks for 136+32 words of space. If that would cause a wrap it will
>>>> then add on the amount of space actually left in the ring and wait for that
>>>> bigger total. That guarantees that it has waited for the 136 at the start of
>>>> the ring. The caller is then free to fill in the 32 words and there is still
>>>> guaranteed to be a minimum of 136 words available (with or without wrapping)
>>>> before any further wait for space is necessary. Thus the add_request() code
>>>> is safe from fear of failure irrespective of where any wrap might occur.
>>>>
>>>>
>>>>> Wrt keeping track of wrapping while the reservation is in use, the
>>>>> following should do that without any need of additional tracking:
>>>>>
>>>>>
>>>>> 	int used_size = ringbuf->tail - ringbuf->reserved_tail;
>>>>>
>>>>> 	if (used_size < 0)
>>>>> 		used_size += ringbuf->size;
>>>>>
>>>>> 	WARN(used_size < ringbuf->reserved_size,
>>>>> 	     "request reserved size too small: %d vs %d!\n",
>>>>> 	     used_size, ringbuf->reserved_size);
>>>>>
>>>>> I was mistaken that you can reuse __intel_ring_space (since that has
>>>>> slightly different requirements), but this gives you a nicely localized
>>>>> check for reservation overflow which works even when you wrap. Ofc it
>>>>> won't work if an add_request is bigger than the entire ring, but that's
>>>>> impossible anyway since we can at most reserve ringbuf->size -
>>>>> I915_RING_FREE_SPACE.
>>>> The problem with the above calculation is that it includes the wasted space
>>>> at the end of the ring. Thus it will complain the reserved size was too
>>>> small when in fact it was just fine.
>>> Ok I again misunderstood your patch a bit since it didn't quite do what I
>>> expect, and I stand corrected that v5 works too. But I still seem to fail
>>> to get my main concern across. I'll see whether I can whip up a patch as a
>>> short demonstration, maybe that helps to unconfuse this dicussion.
>>>
>>> For now I think we're covered with either v4 or v5 so sticking with either
>>> is ok with me.
>>> -Daniel
>> I think v5 is much better. It reduces the ring space wastage which I thought
>> was your main concern.
> Ok with me too - I simply didn't pick it up when merging yesterday because
> I couldn't immediately convince myself it's correct, but really wanted to
> pull in your series. Unfortunately it's now burried below piles of
> patches, so can you please do a delta patch?
Delta patch posted:  '[PATCH] drm/i915: Reserve space improvements'.


>
>> The problem with a more simplistic approach that just doubles the minimum
>> reserve size to ensure that it will fit before or after a wrap is that you
>> are doubling the reserve size. That too is rather wasteful of ring space. It
>> also means that you only find out when the reserve size is too small when
>> you hit the maximum usage coincident with a worst case wrap point. Whereas
>> the v5 method means that you notice a too small reserve whether wrapping or
>> not.
> We don't need to double the reservation since the add_request tail is
> split up into many individual intel_ring_begin. And we'd only need to wrap
> for the largest of those, which is substantially less than the entire
> reservation. Furthermore with the reservation these commands can't ever
> fail, so for those we know are only used in the add_request tail we could
> go to a wrap-only intel_ring_begin which never waits and have one at a
> dword cmd boundary. That means we'd need to overestimate the needed
> ringbuffer space by just a few dwords (namely the size of the longest CS
> cmd we emit under reservation). Which is around 6 dwords or so iirc. And
> to avoid changing ilk we could just special case that in reserve_space().
>
> In practice I don't think there would be any difference with your v5 since
> especially with the scheduler we shouldn't ever overfill rings really. But
> the clear upside is that the reserve_space_end check would be independent
> of any implementation details of how reservation vs. wrapping is done
> exactly. And hence robust against any future fumbles in this area. Looking
> at our history of the relevant code we can expect a lot of those.
> -Daniel