[Intel-gfx] [PATCH 55/56] drm/i915: Remove 'faked' request from LRC submission
Daniel Vetter
daniel at ffwll.ch
Wed Mar 11 09:14:13 PDT 2015
On Wed, Mar 11, 2015 at 02:53:39PM +0000, John Harrison wrote:
> On 05/03/2015 14:49, Daniel Vetter wrote:
> >On Thu, Mar 05, 2015 at 01:57:31PM +0000, John.C.Harrison at Intel.com wrote:
> >>From: John Harrison <John.C.Harrison at Intel.com>
> >>
> >>The LRC submission code requires a request for tracking purposes. It does not
> >>actually require that request to 'complete' it simply uses it for keeping hold
> >>of reference counts on contexts and such like.
> >>
> >>In the case where the ring buffer is completely full, the LRC code looks for a
> >>pending request that would free up sufficient space upon completion and waits
> >>for it. If no such request can be found it resorts to simply polling the free
> >>space count until it is big enough. This situation should only occur when the
> >>entire buffer is filled with the request currently being generated. I.e., the
> >>user is trying to submit a single piece of work that is large than the ring
> >>buffer itself (which should be impossible because very large batch buffers don't
> >>consume any more ring buffer space). Before starting to poll, a submit call is
> >>made to make sure that the currently queued up work in the buffer will actually
> >>be submtted and thus the poll will eventually succeed.
> >>
> >>The problem here is that the 'official' request cannot be used as that could
> >>lead to multiple LRC submissions being tagged to a single request structure.
> >>Instead, the code fakes up a private request structure and uses that.
> >>
> >>This patch moves the faked request allocation higher up in the call stack to the
> >>wait code itself (rather than being at the very lowest submission level). Thus
> >>it is now obvious where the faked request is coming from and why it is
> >>necessary. The patch also replaces it with a call to the official request
> >>allocation code rather than attempting to duplicate that code. This becomes
> >>especially important in the future when the request allocation changes to
> >>accommodate a conversion to struct fence.
> >>
> >>For: VIZ-5115
> >>Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> >This is only possible if you pile up tons of olr. Since your patch series
> >fixes this design issue by removing olr I think we can just put a WARN_ON
> >in here if this ever happens and bail out with -ELOSTMYMARBLES or
> >something. And then rip out all this complexity.
> >
> >Or do I miss something important?
> >-Daniel
>
> Yeah, you missed the extremely important bug in the free space calculation
> that meant this impossible code path was being hit on a regular basis. The
> LRC wait_request code differed from the legacy wait_request code in the the
> latter was updated with request->postfix changes and the former was not.
> Thus the LRC one would happily find a request that frees up enough space,
> wait on it, retire it and then find there was still not enough space!
>
> New patches to fix the space calculation bug and to completely remove the
> polling path will be forth coming...
Hm, Jesse did dig out a regression where gem_ringfill blows up on
execlist. That igt is specifically to exercise this cornercases. I'm not
sure whith bugzilla Jesse meant, there's two:
https://bugs.freedesktop.org/show_bug.cgi?id=89001
https://bugs.freedesktop.org/show_bug.cgi?id=88865
Can you please check whether your fixes help for those issues two?
Also adding Jesse since he's chasing these atm.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list