[Intel-gfx] [PATCH] drm/i915: Keep ring->active_list and ring->requests_list consistent

Fri Mar 20 08:33:08 PDT 2015

On Fri, Mar 20, 2015 at 03:04:39PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 04:00:50PM +0100, Daniel Vetter wrote:
> > On Fri, Mar 20, 2015 at 02:45:04PM +0000, Chris Wilson wrote:
> > > On Fri, Mar 20, 2015 at 03:32:52PM +0100, Daniel Vetter wrote:
> > > > But if we do that short-circuiting in ring_idle then all the requests
> > > > _should_ be completed. Which means retire_request_ring should move all
> > > > buffers to the inactive list, even when we do that before retiring
> > > > requests.
> > > 
> > > We test for the requests to be retired after we test for the buffers to
> > > be retired. It is very easy then for us to have active buffers as the
> > > seqno advanced after the buffer retirement and before the requests. That
> > > is (one of) the reasons why we previously sampled seqno only once when
> > > retiring buffers + requests.
> > 
> > Yeah I get that part of the race. But before we retire anything in these
> > callsites we call gpu_idle. And that waits for everything to complete,
> > except when there are no outstanding requests (i.e. ->request_list is
> > empty). So either
> > - ->request_list is empty in ring_idle, which means all requests should
> >   have completed.  Even if there are some lingering active buffers still
> >   around we should clean them up.
> > - ->request_list is not empty, in which case we do a full wait for the
> >   most recent request. Again all requests should have completed and we
> >   should be able to clean out both request and active lists.
> > 
> > I do see how we can get out of the retire_request functions with requests
> > empty but still active buffers around. But I don't understand how that's
> > possible with a gpu_idle in front. And thus far all traces are from places
> > where we do call gpu_idle first.
> > 
> > Or am I missing something?
> 
> The retire comes before the gpu_idle (we retire often as a
> part of busy, execbuffer, timers etc). The traces show exactly that.

Yeah, the sequence I see is:
1. retire requests leaves active objects behind with all requests retired.
2. evict_vm
|-> 2a. gpu_idle
|-> 2b. retire_requests
|-> 2c. WARN_ON(i915_gem_evict_vm);

I agree with you that before the call to evict_vm the lists are
inconsistent. What I don't understand is how that inconsistency can get past
the 2a/2b double-punch.
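
To make the sequence concrete, here is a toy model in plain C. It is not
the driver code: every toy_* name is made up for illustration, seqno
wraparound is ignored, and it assumes each active buffer carries a seqno no
newer than some request that was on the request list. Under those
assumptions step 1 does leave the lists inconsistent, but the 2a/2b pass
cleans them up, which is why I'd expect the real trace to need something
this sketch does not capture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX 8

/* Hypothetical stand-in for a list of items tagged with a seqno. */
struct toy_list {
	uint32_t seqno[TOY_MAX];
	int count;
};

/* Hypothetical stand-in for a ring; not the i915 structure. */
struct toy_ring {
	uint32_t hw_seqno;		/* last seqno the "GPU" completed */
	struct toy_list active;		/* models ring->active_list */
	struct toy_list requests;	/* models ring->request_list */
};

static void toy_list_add(struct toy_list *l, uint32_t seqno)
{
	assert(l->count < TOY_MAX);
	l->seqno[l->count++] = seqno;
}

/* Drop every entry whose seqno has completed (wraparound ignored). */
static void toy_list_retire(struct toy_list *l, uint32_t completed)
{
	int kept = 0;

	for (int i = 0; i < l->count; i++)
		if (l->seqno[i] > completed)
			l->seqno[kept++] = l->seqno[i];
	l->count = kept;
}

/* Retire buffers, then requests, sampling the seqno twice.
 * 'advance' models the GPU making progress between the two samples. */
static void toy_retire(struct toy_ring *ring, uint32_t advance)
{
	toy_list_retire(&ring->active, ring->hw_seqno);
	ring->hw_seqno += advance;
	toy_list_retire(&ring->requests, ring->hw_seqno);
}

/* Idle: skip the wait if no requests are outstanding, otherwise
 * "wait" until the newest request has completed. */
static void toy_gpu_idle(struct toy_ring *ring)
{
	uint32_t newest;

	if (ring->requests.count == 0)
		return;
	newest = ring->requests.seqno[ring->requests.count - 1];
	if (ring->hw_seqno < newest)
		ring->hw_seqno = newest;
}

static bool toy_lists_consistent(const struct toy_ring *ring)
{
	return !(ring->requests.count == 0 && ring->active.count > 0);
}

/* Steps 1, 2a, 2b from above; true if both lists end up empty. */
static bool toy_scenario(void)
{
	struct toy_ring ring = { .hw_seqno = 0 };

	toy_list_add(&ring.active, 1);
	toy_list_add(&ring.requests, 1);

	toy_retire(&ring, 1);		/* 1: request retired, buffer left */
	assert(!toy_lists_consistent(&ring));

	toy_gpu_idle(&ring);		/* 2a: short-circuits, list empty */
	toy_retire(&ring, 0);		/* 2b: buffer's seqno now passed */

	return ring.active.count == 0 && ring.requests.count == 0;
}
```

Calling toy_scenario() from a main() returns true in this model, i.e. the
WARN in 2c would not fire here.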

Or do I have the wrong sequence in mind?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch