RFC: Radeon multi ring support branch

Jerome Glisse j.glisse at gmail.com
Fri Nov 18 07:41:01 PST 2011

On Fri, Nov 18, 2011 at 09:21:50AM -0500, Jerome Glisse wrote:
> 2011/11/18 Christian König <deathsimple at vodafone.de>:
> > On 17.11.2011 17:58, Jerome Glisse wrote:
> >>
> >> 2011/11/17 Christian König<deathsimple at vodafone.de>:
> >>>
> >>> On 16.11.2011 01:24, Jerome Glisse wrote:
> >>>>
> >>>> Well as we don't specify on which value semaphore should wait on, i am
> >>>> prety sure the first ring to increment the semaphore will unblock all
> >>>> waiter. So if you have ring1 that want to wait on ring2 and ring3 as
> >>>> soon as
> >>>> ring2 or ring3 is done ring1 will go one while either ring2 or ring3
> >>>> might
> >>>> not be done. I will test that tomorrow but from doc i have it seems so.
> >>>> Thus
> >>>> it will be broken with more than one ring, that would mean you have to
> >>>> allocate one semaphore for each ring couple you want to synchronize.
> >>>> Note
> >>>> that the usual case will likely be sync btw 2 ring.
> >>>
> >>> Good point, but I played with it a bit more today and it is just behaving
> >>> as
> >>> I thought it would be. A single signal command will just unblock a single
> >>> waiter, even if there are multiple waiters currently for this semaphore,
> >>> the
> >>> only thing you can't tell is which waiter will come first.
> >>
> >> I am not talking about multiple waiter but about single waiter on
> >> multilple
> >> signaler. Current implementation allocate one and only one semaphore
> >> not matter how much ring you want to synchronize with. Which means
> >> the single waiter will be unblock once only a single ring signal, which is
> >> broken if you want to wait for several rings.
> >
> > Wait a moment, having a single semaphore doesn't means that there is only a
> > single waiter, just take a look at the code again:
> >
> >        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> >                /* no need to sync to our own or unused rings */
> >                if (i == p->ring || !p->sync_to_ring[i])
> >                        continue;
> >
> >                if (!p->ib->fence->semaphore) {
> >                        r = radeon_semaphore_create(p->rdev,
> > &p->ib->fence->semaphore);
> >                        if (r)
> >                                return r;
> >                }
> >
> >                radeon_semaphore_emit_signal(p->rdev, i,
> > p->ib->fence->semaphore);
> >                radeon_semaphore_emit_wait(p->rdev, p->ring,
> > p->ib->fence->semaphore); <--------
> >        }
> >
> > It is allocating a single semaphore, thats correct, but at the same time it
> > emits multiple wait commands. So if we want to synchronize with just one
> > ring, we only wait once on the semaphore, but if we want to synchronize with
> > N rings we also wait N times on the same semaphore. Since we don't really
> > care about the order in which the signal operations arrive that is pretty
> > much ok, because when execution continues after the last wait command it has
> > been made sure that every signal operation has been fired. The extended
> > semaphore test in my repository is also checking for this condition, and it
> > really seems to work fine.
> So how does the wait logic works, that's what i am asking.
> If semaphore block wait for the addr to be non zero than it doesn't work
> If semaphore block wait block until it get a signal than it should work thought
> there is a possible race issue in hw if the signal ring wakeup the semaphore
> block at the same time do the semaphore block consider this as a single
> signal or as 2 signal.
> radeon_semaphore_create(&p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
> ringC will stall until either ringA or ringB signal if it only wait on non zero
> Code you put in you repo yesterday tested following case:
> radeon_semaphore_emit_wait(ringA, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringB, &p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringC, &p->ib->fence->semaphore)
> Which is fine, i don't argue on such use case. I argue on :
> radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
> radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
> radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
> What i think would be safer is :
> radeon_semaphore_emit_signal(ringA, semaphore_ringa_ringc)
> radeon_semaphore_emit_signal(ringB, semaphore_ringb_ringc)
> radeon_semaphore_emit_wait(ringC,semaphore_ringa_ringc)
> radeon_semaphore_emit_wait(ringC,semaphore_ringb_ringc)
> Which wouldn't be affected by any possible race.
> >>> I should also note that the current algorithm will just emit multiple
> >>> wait
> >>> operations to a single ring, and spread the signal operations to all
> >>> other
> >>> rings we are interested in. That isn't very efficient, but should indeed
> >>> work quite fine.

Ok so i have played with that, and semaphore block never write anything
to memory nor does it read it (at least from what i see). It seems that
the semaphore block queue wait based on addr and unlock wait on signal
one at a time again on addr. Well it does read the value but doesn't
write anything to it afaict. If value is non zero it assume semaphore
have been signaled. My guess is that it's to allow cpu to force a
semaphore to pass.

This lead me to believe that we might be better of using 1 page for all
semaphore and reuse them over and over. Worse case would be when we have
PAGE_SIZE/8 (512 for 4k page) semaphore in flight and we need a new one.
Then we would have to wait for previous semaphore to signal. I think it's
fine given that if we hit that case that means that gpu is super duper
busy with something and that scheduling any new thing can wait (i assume
there won't be gpu with 512 rings or more ;)).

Anyway such simplification can be done as an after patch. I will go over
the serie again as there is a free issue somewhere.


More information about the dri-devel mailing list