[Intel-gfx] [PATCH] drm/i915: fixup seqno allocation logic for lazy_request

Wed Jan 25 16:46:51 CET 2012

On Wed, 25 Jan 2012 16:32:49 +0100, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> Currently we reserve seqnos only when we emit the request to the ring
> (by bumping dev_priv->next_seqno), but start using it much earlier for
> ring->outstanding_lazy_request. When 2 threads compete for the gpu and
> run on two different rings (e.g. ddx on blitter vs. compositor)
> hilarity ensues, especially when we get constantly interrupted while
> reserving buffers.
> 
> Breakage seems to have been introduced in
> 
> commit 6f392d548658a17600da7faaf8a5df25ee5f01f6
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date:   Sat Aug 7 11:01:22 2010 +0100
> 
>     drm/i915: Use a common seqno for all rings.
> 
> This patch fixes up the seqno reservation logic by moving it into
> i915_gem_next_request_seqno. The ring->add_request functions now
> superfluously still return the new seqno through a pointer; that will
> be refactored in the next patch.
> 
> Note that with this change we now unconditionally allocate a seqno,
> even when ->add_request might fail because the rings are full and the
> gpu died. But this does not open up a new can of worms because we can
> already leave behind an outstanding_request_seqno if e.g. the caller
> gets interrupted with a signal while stalling for the gpu in the
> eviction paths. And with the bugfix we only ever have one seqno
> allocated per ring (and only that ring), so there are no ordering
> issues with multiple outstanding seqnos on the same ring.
> 
> v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
> clear that we only have one seqno counter for all rings. Suggested by
> Chris Wilson.
> 
> v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
> instead of ring->outstanding_lazy_request to make the follow-up
> refactoring more clearly correct. Also improve the commit message
> with issues discussed on irc.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
> Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
> Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch>

I'm not completely sold on the extra i915_gem_get_next_request_seqno()
in i915_add_request(). Daniel describes it as paranoia, I think of it as
muddling the real bugfix with a bit of extra confusion. Nevertheless, it
does fix the issue where the seqno emitted by the ring is at odds with
the seqno assigned to the buffers associated with that request, and that
is clearly a good thing. I haven't quite managed to join the dots and
create a scenario whereby the ring never advances its seqno to one more
advanced than assigned to a buffer (outside of pathological wraparound)
and so I don't see how by itself we would have confused the GPU as the
ring state itself would always be internally consistent.
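
For readers following along, the difference between the old and new reservation
logic can be sketched with a toy model. This is an illustration only, not the
driver code: the toy_* types and functions are hypothetical, and the sketch
assumes (per the commit message) a single counter shared by all rings, with 0
meaning "no seqno reserved".

```c
#include <stdint.h>

/* Toy model only: hypothetical types, not the real i915 structures. */
struct toy_device {
	uint32_t next_seqno;		/* one counter shared by all rings */
};

struct toy_ring {
	struct toy_device *dev;
	uint32_t outstanding_lazy_request;	/* 0 means "none reserved" */
};

/* Hand out the current seqno and bump the shared counter, skipping 0. */
static uint32_t toy_get_seqno(struct toy_device *dev)
{
	uint32_t seqno = dev->next_seqno;

	if (++dev->next_seqno == 0)
		dev->next_seqno = 1;
	return seqno;
}

/*
 * Old behaviour as described in the commit message: the ring starts using
 * the next seqno without reserving it, so a second ring can pick the same
 * value before either request is emitted.
 */
static uint32_t toy_next_request_seqno_peek(struct toy_ring *ring)
{
	if (ring->outstanding_lazy_request == 0)
		ring->outstanding_lazy_request = ring->dev->next_seqno;
	return ring->outstanding_lazy_request;
}

/*
 * Patched behaviour: the seqno is reserved at first use, so each ring
 * holds at most one outstanding seqno and no two rings share one.
 */
static uint32_t toy_next_request_seqno_reserve(struct toy_ring *ring)
{
	if (ring->outstanding_lazy_request == 0)
		ring->outstanding_lazy_request = toy_get_seqno(ring->dev);
	return ring->outstanding_lazy_request;
}
```

In this model, two rings calling the peek variant back to back end up with the
same seqno, while the reserve variant gives each ring its own and returns the
same value on repeated calls from one ring.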

Anyway,
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

The actual refactoring patches are also ok, though I'd like for Daniel
to scope out who owns the seqno vs the request, especially in the light
of no-more-domains...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre