[Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory usage

Daniel Vetter daniel at ffwll.ch
Fri Nov 28 19:24:18 CET 2014


On Fri, Nov 28, 2014 at 05:34:31PM +0000, Chris Wilson wrote:
> On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> > 
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > Sent: Friday, November 28, 2014 4:47 PM
> > > To: Gore, Tim
> > > Cc: Lespiau, Damien; intel-gfx at lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> > > usage
> > > 
> > > On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > > Sent: Friday, November 28, 2014 4:20 PM
> > > > > To: Lespiau, Damien
> > > > > Cc: Gore, Tim; intel-gfx at lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce
> > > > > memory usage
> > > > >
> > > > > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > > > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > > > > N_buffers_load is still used. I am still submitting 1000 buffers
> > > > > > > to the ring, it's just that I use the same buffers over and over
> > > > > > > (hence the "i % NUM_BUSY_BUFFERS"). So I only allocate 32 buffers,
> > > > > > > and each gets copied 1000/32 times, so the ring is kept busy for
> > > > > > > as long as before.
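The reuse pattern described above amounts to roughly the following. This is a
sketch rather than the actual test code; copy() stands in for the test's blit
helper and bo[] for its buffer array:

    #define NUM_BUSY_BUFFERS 32

    /* Allocate a small pool of buffers once... */
    for (i = 0; i < NUM_BUSY_BUFFERS; i++)
        bo[i] = drm_intel_bo_alloc(bufmgr, "busy", size, 4096);

    /* ...then cycle through the pool to submit N_buffers_load copies,
     * keeping the ring busy just as long as with N distinct buffers. */
    for (i = 0; i < N_buffers_load; i++)
        copy(bo[i % NUM_BUSY_BUFFERS], bo[(i + 1) % NUM_BUSY_BUFFERS]);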
> > > > > >
> > > > > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > > > >
> > > > > The ring is kept as busy, but the queue depth is drastically reduced
> > > > > (from N_buffers to 32). Since both numbers are arbitrary, I am not
> > > > > averse to the change, but I would feel happier if it were
> > > > > demonstrated that the new test is still capable of detecting bugs
> > > > > deliberately introduced into the ring synchronisation code.
> > > > > -Chris
> > > > >
> > > >
> > > > Excuse a rather novice question, but which queue depth is reduced?
> > > 
> > > We track on the object the last read/write request. If you reuse objects the
> > > effective depth in the read/write queue is reduced, and this queue is
> > > implicitly used when synchronising between rings.
> > > -Chris
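Put differently, a simplified picture (not the actual i915 structures): each
object only remembers its most recent requests, so a pool of 32 reused objects
exposes at most 32 outstanding requests to the inter-ring synchronisation code,
however many copies were queued on top of them.

    struct request;

    struct gem_object {
        struct request *last_read;   /* last request that read this bo */
        struct request *last_write;  /* last request that wrote this bo */
    };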
> > > 
> > 
> > OK, thanks. So the test has changed, but in a subtle way. The main thrust of
> > the test is still there, but perhaps we could do better by checking how much
> > memory we have and then using 1000 buffers of a size that we can accommodate?
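That sizing could look roughly like this; a sketch only, and
intel_get_avail_ram_mb() is the igt helper I believe exists for querying
available RAM, so treat the names as assumptions:

    /* Size each of the 1000 buffers so the whole working set stays
     * within a quarter of the currently available RAM. */
    uint64_t avail = (uint64_t)intel_get_avail_ram_mb() << 20;
    unsigned int num_buffers = 1000;
    size_t buf_size = (avail / (4 * num_buffers)) & ~(size_t)4095;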
> 
> Yes. You could allocate 1000 single pages and perform lots of little copies
> in each page to generate the workload with large queue depths. Good idea.
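Roughly like this, again just a sketch; copy_range() is a made-up stand-in for
a small blit within a page:

    #define NUM_PAGES 1000
    #define COPIES_PER_PAGE 32

    /* 1000 single-page objects keep the working set at ~4MiB... */
    for (i = 0; i < NUM_PAGES; i++)
        bo[i] = drm_intel_bo_alloc(bufmgr, "page", 4096, 4096);

    /* ...while many small copies leave every object with outstanding
     * requests, preserving the large queue depth for the sync code. */
    for (i = 0; i < NUM_PAGES; i++)
        for (j = 0; j < COPIES_PER_PAGE; j++)
            copy_range(bo[i], bo[(i + 1) % NUM_PAGES], j * 128, 128);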

I think overall we have way too many copypastas of "throw load with
dependencies onto $engine with $ctx". Volunteers to extract something
autotuning and use it everywhere are highly welcome ;-)
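The interface I have in mind is something along these lines (entirely made-up
names, just to sketch the idea):

    /* Hypothetical library helper: queue enough dependent dummy work on
     * the given engine/context to keep it busy for about min_duration
     * seconds, auto-tuned to the platform instead of hardcoded counts. */
    void igt_generate_load(int fd, uint32_t ctx, unsigned int engine,
                           double min_duration);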

I think Ville is working on something for kms_flip (since on vlv the
current workload takes too long and results in timeouts for flips and hence
test failures). But I'm not sure whether he'll do the full librarization.

Aside on dependencies: As long as we keep things in the same (implicit)
context we won't reorder with the scheduler. Or at least that's been my
working assumption while writing tests. Only for cross-ctx/fd/engine stuff
do the actual dependencies matter.
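Concretely, the distinction I mean is roughly this; exec_batch() is a stand-in
for the usual execbuf boilerplate, not a real helper:

    /* Same fd, same (implicit) context: batches execute in submission
     * order, so a scheduler can't reorder them even without shared bos. */
    exec_batch(fd, ctx0, I915_EXEC_RENDER, batch_a);
    exec_batch(fd, ctx0, I915_EXEC_RENDER, batch_b); /* runs after batch_a */

    /* Different context (or fd, or engine): only real buffer dependencies
     * constrain ordering, so without a shared bo a scheduler is free to
     * run batch_b first. */
    exec_batch(fd, ctx0, I915_EXEC_RENDER, batch_a);
    exec_batch(fd, ctx1, I915_EXEC_BLT, batch_b);    /* may run first */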
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


