RFC: Radeon multi ring support branch

Jerome Glisse j.glisse at gmail.com
Fri Nov 18 06:21:50 PST 2011


2011/11/18 Christian König <deathsimple at vodafone.de>:
> On 17.11.2011 17:58, Jerome Glisse wrote:
>>
>> 2011/11/17 Christian König<deathsimple at vodafone.de>:
>>>
>>> On 16.11.2011 01:24, Jerome Glisse wrote:
>>>>
>>>> Well, as we don't specify which value the semaphore should wait on, I am
>>>> pretty sure the first ring to increment the semaphore will unblock all
>>>> waiters. So if ring1 wants to wait on ring2 and ring3, then as soon as
>>>> ring2 or ring3 is done, ring1 will go on while the other of the two
>>>> might not be done yet. I will test that tomorrow, but from the docs I
>>>> have it seems so. Thus it would be broken with more than one ring, which
>>>> would mean you have to allocate one semaphore for each ring couple you
>>>> want to synchronize. Note that the usual case will likely be syncing
>>>> between two rings.
>>>
>>> Good point, but I played with it a bit more today and it is behaving just
>>> as I thought it would. A single signal command will only unblock a single
>>> waiter, even if there are currently multiple waiters on this semaphore;
>>> the only thing you can't tell is which waiter will go first.
>>
>> I am not talking about multiple waiters but about a single waiter on
>> multiple signalers. The current implementation allocates one and only one
>> semaphore no matter how many rings you want to synchronize with. That
>> means the single waiter will be unblocked as soon as only a single ring
>> signals, which is broken if you want to wait for several rings.
>
> Wait a moment, having a single semaphore doesn't mean that there is only a
> single waiter, just take a look at the code again:
>
>        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>                /* no need to sync to our own or unused rings */
>                if (i == p->ring || !p->sync_to_ring[i])
>                        continue;
>
>                if (!p->ib->fence->semaphore) {
>                        r = radeon_semaphore_create(p->rdev, &p->ib->fence->semaphore);
>                        if (r)
>                                return r;
>                }
>
>                radeon_semaphore_emit_signal(p->rdev, i, p->ib->fence->semaphore);
>                radeon_semaphore_emit_wait(p->rdev, p->ring, p->ib->fence->semaphore); <--------
>        }
>
> It is allocating a single semaphore, that's correct, but at the same time
> it emits multiple wait commands. So if we want to synchronize with just
> one ring, we only wait once on the semaphore, but if we want to
> synchronize with N rings, we also wait N times on the same semaphore.
> Since we don't really care about the order in which the signal operations
> arrive, that is pretty much ok: when execution continues after the last
> wait command, it is guaranteed that every signal operation has been fired.
> The extended semaphore test in my repository also checks for this
> condition, and it really seems to work fine.

So how does the wait logic work? That's what I am asking.
If the semaphore block waits for the address to become non-zero, then it
doesn't work. If the semaphore block blocks until it gets a signal, then it
should work, though there is a possible race in hardware: if two signaling
rings wake up the semaphore block at the same time, does the semaphore
block count that as a single signal or as two signals?

radeon_semaphore_create(&p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)

ringC will stall until either ringA or ringB signals, if it only waits on non-zero.

The code you put in your repo yesterday tested the following case:
radeon_semaphore_emit_wait(ringA, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringB, &p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringC, &p->ib->fence->semaphore)

which is fine; I don't argue with that use case. I argue about:
radeon_semaphore_emit_signal(ringA, &p->ib->fence->semaphore)
radeon_semaphore_emit_signal(ringB, &p->ib->fence->semaphore)
radeon_semaphore_emit_wait(ringC, &p->ib->fence->semaphore)

What I think would be safer is:
radeon_semaphore_emit_signal(ringA, semaphore_ringa_ringc)
radeon_semaphore_emit_signal(ringB, semaphore_ringb_ringc)
radeon_semaphore_emit_wait(ringC, semaphore_ringa_ringc)
radeon_semaphore_emit_wait(ringC, semaphore_ringb_ringc)

Which wouldn't be affected by any possible race.

>>> I should also note that the current algorithm will just emit multiple
>>> wait
>>> operations to a single ring, and spread the signal operations to all
>>> other
>>> rings we are interested in. That isn't very efficient, but should indeed
>>> work quite fine.
>>
>> Well, the CS patch you posted doesn't do that. It creates one semaphore,
>> emits one wait with that semaphore, and emits a signal to all rings. That
>> was my point. But I agree that creating a semaphore for each ring couple
>> you want to wait for will work.
>>
>>>> After retesting, the first patch "drm/radeon: fix debugfs handling" is
>>>> NAKed, a complete no-go.
>>>>
>>>> The issue is that radeon_debugfs_cleanup is called after rdev is freed.
>>>> This is why I used a static array. I forgot about that; I should have
>>>> put a comment. I guess you built your kernel without debugfs, or you
>>>> didn't test reloading the module.
>>>
>>> Mhm, I have tested it, seen the crash, and didn't think it was a
>>> problem. Don't ask me why; I can't understand it myself right now.
>>>
>>> Anyway, I moved the unregistering of the files into a separate function,
>>> which is now called from radeon_device_fini instead of
>>> radeon_debugfs_cleanup. That seems to work fine, at least if I haven't
>>> missed something else.
>>
>> Please don't touch the debugfs stuff, it's fine the way it is. No need to
>> change anything there.
>
> At least for me there is a need to change something, because using a
> static variable causes the debugfs files to only appear on the first card
> in the system. And in my configuration the first card is always the
> onboard APU, so I wondered for half a day why a locked-up IB doesn't
> contain any data, until I finally realized that I was looking at the
> wrong GPU in the system.
>
>>> I also merged your indention fixes and the fix for the never allocated
>>> semaphores and pushed the result into my public repository
>>> (http://cgit.freedesktop.org/~deathsimple/linux), so please take another
>>> look at it.
>>>
>> I will take a look
>>
> Thanks, but please be aware that I'm going to integrate Alex's patches on
> top of it today and also give it another test round, just to make sure
> that I'm not doing anything stupid with the debugfs code. So if you
> haven't done so already, please wait a bit more.
>
> Thanks,
> Christian.
>
