[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

Wed Feb 1 09:34:22 PST 2012

On Wednesday 1 February 2012, Michel Dänzer <michel at daenzer.net> wrote:
> On Wed, 2012-02-01 at 15:01 +0000, Simon Farnsworth wrote:
> > +	if (sleep_bo) {
> > +		unsigned reloc_index;
> > +		/* Create a dummy BO so that fence_finish without a timeout can sleep waiting for completion */
> > +		*sleep_bo = ctx->ws->buffer_create(ctx->ws, 1, 1,
> > +						   PIPE_BIND_CUSTOM,
> > +						   RADEON_DOMAIN_GTT);
> > +		/* Add the fence as a dummy relocation. */
> > +		reloc_index = ctx->ws->cs_add_reloc(ctx->cs,
> > +						    ctx->ws->buffer_get_cs_handle(*sleep_bo),
> > +						    RADEON_USAGE_READWRITE, RADEON_DOMAIN_GTT);
> > +		if (reloc_index >= ctx->creloc)
> > +			ctx->creloc = reloc_index+1;
> > +	}
> 
> Is there a point in making sleep_bo optional?
>
I can't think of a reason to make it optional; I'll remove that in v2.
> 
> > diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
> > index c38fbc5..71e31b1 100644
> > --- a/src/gallium/drivers/r600/r600_pipe.c
> > +++ b/src/gallium/drivers/r600/r600_pipe.c
> > @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen,
> >  	}
> >  
> >  	while (rscreen->fences.data[rfence->index] == 0) {
> > +		/* Special-case infinite timeout */
> > +		if (timeout == PIPE_TIMEOUT_INFINITE &&
> > +		    rfence->sleep_bo) {
> > +			rscreen->ws->buffer_wait(rfence->sleep_bo, RADEON_USAGE_READWRITE);
> > +			pb_reference(&rfence->sleep_bo, NULL);
> > +			continue;
> > +		}
> 
> I think rfence->sleep_bo should only be unreferenced in
> r600_fence_reference() when the fence is recycled, otherwise it'll be
> leaked if r600_fence_finish() is never called for some reason.
>
I'll fix this in v2.
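
As a rough sketch of the ownership rule being asked for (the wait path only
borrows the buffer, and dropping the last fence reference is the one place
that releases it), here is a self-contained C fragment. Every name in it
(struct buf, struct fence, buf_unref, fence_create, fence_finish,
fence_unref, bufs_freed) is hypothetical and stands in for the real
pb_buffer / r600_fence code. It also frees the fence instead of recycling
it into a pool, which is a simplification.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real pb_buffer/r600_fence types. */
struct buf {
    int refcount;
};

struct fence {
    int refcount;
    struct buf *sleep_bo;   /* owned by the fence for its whole lifetime */
};

static int bufs_freed;      /* counts destroyed buffers, for demonstration only */

/* Drop one buffer reference and clear the caller's pointer. */
static void buf_unref(struct buf **b)
{
    if (*b && --(*b)->refcount == 0) {
        free(*b);
        bufs_freed++;
    }
    *b = NULL;
}

/* Create a fence that holds the only reference to a fresh buffer. */
static struct fence *fence_create(void)
{
    struct fence *f = calloc(1, sizeof(*f));
    assert(f);
    f->refcount = 1;
    f->sleep_bo = calloc(1, sizeof(*f->sleep_bo));
    assert(f->sleep_bo);
    f->sleep_bo->refcount = 1;
    return f;
}

/* Waiting only borrows sleep_bo; it never drops the fence's reference. */
static void fence_finish(struct fence *f)
{
    assert(f->sleep_bo);
    /* A real driver would block on the buffer here. */
}

/* Dropping the last fence reference is the single place sleep_bo is released. */
static void fence_unref(struct fence **f)
{
    if (*f && --(*f)->refcount == 0) {
        buf_unref(&(*f)->sleep_bo);
        free(*f);
    }
    *f = NULL;
}
```

With this shape the buffer is released exactly once, whether fence_finish()
runs zero, one, or many times before the fence goes away.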

> If r600_fence_finish() only ever called os_time_sleep(), never
> sched_yield() (like r300_fence_finish()), would that avoid your problem
> even with a finite timeout?
>
I experimented with that. Depending on the specific workload, I need the
sleep duration to vary; otherwise the polling loop shows up as bad latency
behaviour (resulting in occasional dropped frames). For the workloads I
tried, I needed the sleep to vary between 1 usec (for low-complexity
workloads) and 100 usec (for high-complexity workloads). Recompiling Mesa
for each workload is obviously not an option.

I did try an adaptive spin - essentially removing the "if (spins++ % 256)
continue", and adding:

if (spins < 40)
    os_time_sleep(1);
else if (spins < 100)
    os_time_sleep(10);
else
    os_time_sleep(100);

But I felt this was ugly, when the core problem is that I want to sleep
until completion, the hardware has support for sleeping until completion,
and the only reason I can't is deficiencies in the driver stack.
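
For reference, the back-off schedule above can be written out as a
standalone C fragment. This is only an illustration under stated
assumptions, not the patch: sleep_usec(), backoff_usec() and
wait_for_flag() are hypothetical names, POSIX nanosleep() stands in for
os_time_sleep(), and a plain integer flag stands in for the fence value.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for os_time_sleep(): sleep for usecs microseconds. */
static void sleep_usec(long usecs)
{
    struct timespec ts;
    ts.tv_sec = usecs / 1000000;
    ts.tv_nsec = (usecs % 1000000) * 1000;
    nanosleep(&ts, NULL);
}

/* Back-off schedule: 1 usec for the first 40 polls, 10 usec up to 100, then 100 usec. */
static long backoff_usec(unsigned spins)
{
    if (spins < 40)
        return 1;
    if (spins < 100)
        return 10;
    return 100;
}

/* Poll *flag until it becomes non-zero, sleeping longer the longer we wait.
 * Returns the number of polls that found the flag still clear. */
static unsigned wait_for_flag(volatile int *flag)
{
    unsigned spins = 0;

    assert(flag != NULL);
    while (*flag == 0) {
        sleep_usec(backoff_usec(spins));
        spins++;
    }
    return spins;
}
```

The thresholds are still fixed at compile time, so this only trades one
tuning knob for three; it does not remove the need to guess how long the
GPU will take.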

Fundamentally, I suspect that the reason I'm seeing pain from this and other
people aren't is that I'm comparing an AMD E-350 to an Intel Atom D510, and
I've tuned my software stack on the D510 to within an inch of its life.

My expectation is that the better GPU in the E-350 will make my 2D
graphics-intensive workload (OpenGL compositing of 2D movies) perform about
as well as it did on the D510. Sleep-based waiting for fence completion
gets in the way, because the D510 has slightly more CPU power than the
E-350 and I'm not (yet) fully exploiting the E-350's GPU.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/