[Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory usage

Fri Nov 28 19:49:40 CET 2014

On Fri, Nov 28, 2014 at 07:24:18PM +0100, Daniel Vetter wrote:
> On Fri, Nov 28, 2014 at 05:34:31PM +0000, Chris Wilson wrote:
> > On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > Sent: Friday, November 28, 2014 4:47 PM
> > > > To: Gore, Tim
> > > > Cc: Lespiau, Damien; intel-gfx at lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> > > > usage
> > > > 
> > > > On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > > > Sent: Friday, November 28, 2014 4:20 PM
> > > > > > To: Lespiau, Damien
> > > > > > Cc: Gore, Tim; intel-gfx at lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce
> > > > > > memory usage
> > > > > >
> > > > > > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > > > > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > > > > > > N_buffers_load is still used. I am still submitting 1000 buffers
> > > > > > > > > to the ring, it's just that I use the same buffers over and over
> > > > > > > > > (hence the "i % NUM_BUSY_BUFFERS").
> > > > > > > > So I only allocate 32 buffers, and each gets copied 1000/32
> > > > > > > > times, so the ring is kept busy for as long as previously.
> > > > > > >
> > > > > > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > > > > >
> > > > > > The ring is kept as busy, but the queue depth is drastically reduced
> > > > > > (from N_buffers to 32). Since both numbers are arbitrary, I am not
> > > > > > > averse to the change, but I would feel happier if it was
> > > > > > demonstrated that the new test is still capable of detecting bugs
> > > > > > deliberately introduced into the ring synchronisation code.
> > > > > > -Chris
> > > > > >
> > > > >
> > > > > Excuse a rather novice question, but which queue depth is reduced?
> > > > 
> > > > We track on the object the last read/write request. If you reuse objects the
> > > > effective depth in the read/write queue is reduced, and this queue is
> > > > implicitly used when synchronising between rings.
> > > > -Chris
> > > > 
> > > 
> > > OK thanks. So the test has changed but in a subtle way. It would seem that the main
> > > thrust of the test is still there but perhaps we could do better by checking how much
> > > memory we have and then using 1000 buffers of a size that we can accommodate?
> > 
> > Yes. You could allocate 1000 single pages and perform lots of little copies
> > in each page to generate the workload with large queue depths. Good idea.
> 
> I think overall we have way too many copypastas of "throw load with
> dependencies onto $engine with $ctx". Volunteers to extract something
> autotuning and use it everywhere highly welcome ;-)
> 
> I think Ville is working on something for kms_flip (since on vlv the
> current workload takes too long and results in timeouts for flips and so
> test failures). But not sure whether he'll do the full librarization.

My current hacked up dummy load stuff still lives in kms_flip. It just
tries to tune for a 1 second delay now. And it still uses two bos to do
the copies, but those are now fixed 2kx2k and only the final copy is
aimed at the fb. Seems to be working reasonably well now. I'll try to clean
it up a bit and post it to the list, but it's getting a bit late here
so that'll have to wait until early next week.
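
To give a rough idea of the tuning, here is a minimal sketch. It is not the
kms_flip code: memcpy() between two malloc'd 2kx2k buffers stands in for the
GPU copies between the two bos, and every name in it (next_count,
calibrate_copies, MAX_COPIES, BO_DIM) is made up for illustration. The idea
is just to time a batch of copies and rescale the batch size until it takes
about the target duration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BO_DIM     2048          /* fixed 2kx2k buffers, 32 bits per pixel */
#define MAX_COPIES (1u << 20)    /* arbitrary safety cap (assumption) */

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Rescale the copy count so that the batch would take about 'target'. */
static unsigned next_count(unsigned n, double elapsed, double target)
{
	double scaled;

	/* Too short to measure reliably: just double and try again. */
	if (elapsed < target / 100.0)
		return n >= MAX_COPIES / 2 ? MAX_COPIES : n * 2;

	scaled = n * target / elapsed + 0.5;   /* round to nearest */
	if (scaled < 1.0)
		return 1;
	if (scaled > MAX_COPIES)
		return MAX_COPIES;
	return (unsigned)scaled;
}

/* Stand-in workload: ping-pong CPU copies between the two buffers. */
static double time_copies(uint32_t *a, uint32_t *b, unsigned n)
{
	size_t size = (size_t)BO_DIM * BO_DIM * sizeof(*a);
	double start = now_sec();
	unsigned i;

	for (i = 0; i < n; i++) {
		if (i & 1)
			memcpy(a, b, size);
		else
			memcpy(b, a, size);
	}
	return now_sec() - start;
}

/* Returns a copy count tuned for 'target_sec', or 0 if allocation fails. */
unsigned calibrate_copies(double target_sec)
{
	size_t size = (size_t)BO_DIM * BO_DIM * sizeof(uint32_t);
	uint32_t *a = calloc(1, size);
	uint32_t *b = calloc(1, size);
	unsigned n = 1;
	int pass;

	if (!a || !b) {
		free(a);
		free(b);
		return 0;
	}

	/* A few passes are enough; stop early once the count settles. */
	for (pass = 0; pass < 8; pass++) {
		unsigned next = next_count(n, time_copies(a, b, n), target_sec);

		if (next == n)
			break;
		n = next;
	}

	free(a);
	free(b);
	return n;
}
```

Calling calibrate_copies(1.0) would then give the number of copies to queue
for roughly a one second delay on the machine at hand. A real version would
submit the copies to the GPU and wait for the last one to finish instead of
timing memcpy().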

-- 
Ville Syrjälä
Intel OTC