[Intel-gfx] [PATCH] [RFC] drm/i915: read-read semaphore optimization

Eric Anholt eric at anholt.net
Tue Jan 17 18:55:34 CET 2012


On Tue, 17 Jan 2012 11:15:31 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Mon, Jan 16, 2012 at 02:20:55PM -0800, Ben Widawsky wrote:
> > On 01/16/2012 01:50 PM, Daniel Vetter wrote:
> > > On Tue, Dec 13, 2011 at 10:36:15AM -0800, Ben Widawsky wrote:
> > >> On 12/13/2011 09:22 AM, Eric Anholt wrote:
> > >>> On Mon, 12 Dec 2011 19:52:08 -0800, Ben Widawsky<ben at bwidawsk.net>  wrote:
> > >>>> Since we don't differentiate on the different GPU read domains, it
> > >>>> should be safe to allow back to back reads to occur without issuing a
> > >>>> wait (or flush in the non-semaphore case).
> > >>>>
> > >>>> This has the unfortunate side effect that we need to keep track of all
> > >>>> the outstanding buffer reads so that we can synchronize on a write, to
> > >>>> another ring (since we don't know which read finishes first). In other
> > >>>> words, the code is quite simple for two rings, but gets more
> > >>>> tricky for >2 rings.
> > >>>>
> > >>>> Here is a picture of the solution to the above problem
> > >>>>
> > >>>> Ring 0            Ring 1             Ring 2
> > >>>> batch 0           batch 1            batch 2
> > >>>>   read buffer A     read buffer A      wait batch 0
> > >>>>                                        wait batch 1
> > >>>>                                        write buffer A
> > >>>>
> > >>>> This code is really untested. I'm hoping for some feedback if this is
> > >>>> worth cleaning up, and testing more thoroughly.
> > >>>
> > >>> You say it's an optimization -- do you have performance numbers?
> > >>
> > >> 33% improvement on a hacked version of gem_ring_sync_loop with
> > >> this patch.
> > >>
> > >> It's not really a valid test as it's not coherent, but this is
> > >> approximately the best-case improvement.
> > >>
> > >> Oddly, semaphores don't make much difference in this test, which
> > >> was surprising.
> > > 
> > > Our domain tracking is already complicated in unfunny ways. And (at least
> > > without a use-case showing gains with hard numbers in either perf or power
> > > usage) I think this patch is the kind of "this looks cool" stuff that
> > > added a lot to the current problem.
> > > 
> > > So before adding more complexity on top I'd like to remove some of the
> > > superfluous stuff we already have. I.e. all the flushing_list stuff and
> > > maybe other things ...
> > 
> > Can you be more clear on what exactly you want done before taking a
> > patch like this? Maybe I can work on it during some down time.
> 
> I was thinking about Eric's no-more-domains stuff specifically, which has
> tons of natural split-points - and we want to exploit these for merging.
> Imo step 1 would be to just rework the batch dispatch in
> intel_ringbuffer.c so that we unconditionally invalidate before and flush
> afterwards. The ring->flush would become a no-op. No changes to the core
> flushing_list tracking for step 1.
> 
> If we can get this in for 3.4 we could (in the -next merge cycle) walk all
> the callchains from their ends and rip out everything which is a no-op,
> starting from ring->flush. I think that's the safest way to attack this.
> Eric?

I was stuck with a mysterious regression in my previous attempt at the
patch series.  I've recently realized that I can do the major code
deletion with fewer + lines in the diff, which I hope will avoid that.
I'm working on that again now.