RFC: Radeon multi ring support branch

Christian König deathsimple at vodafone.de
Tue Nov 15 15:19:20 PST 2011


On 15.11.2011 20:32, Jerome Glisse wrote:
> On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
>> Hello everybody,
>>
>> to support multiple compute rings, async DMA engines and UVD we need
>> to teach the radeon kernel module how to sync buffers between
>> different rings and make some changes to the command submission
>> ioctls.
>>
>> Since we can't release any documentation about async DMA or UVD
>> (yet), my current branch concentrates on getting the additional
>> compute rings on cayman running. Unfortunately those rings have
>> hardware bugs that can't be worked around, so they are actually not
>> very useful in a production environment, but they should do quite
>> well for testing purposes.
>>
>> The branch can be found here:
>> http://cgit.freedesktop.org/~deathsimple/linux/log/
>>
>> Since some of the patches are quite intrusive, constantly rebasing
>> them could get a bit painful. So I would like to see most of the
>> stuff included in drm-next, even if we don't make use of the new
>> functionality right now.
>>
>> Comments welcome,
>> Christian.
> So I have been looking back at all this and now there is something
> puzzling me. If a semaphore waits for a non-null value at the gpu
> address provided in the packet, then the current implementation of
> the cs ioctl doesn't work when there are more than 2 rings to sync.
Semaphores are counters, so each signal operation atomically 
increments the counter, while each wait operation (atomically) checks 
if the counter is above zero and, if it is, decrements it. So you can 
signal/wait on the same address multiple times.
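
To illustrate the semantics, a signal/wait pair does roughly the 
following (just a conceptual sketch in user space C with GCC atomics; 
the real work is of course done by the hardware semaphore block, not 
by code like this):

#include <stdint.h>

struct semaphore {
        uint64_t counter;       /* lives at a GPU address */
};

/* signal: atomically bump the counter */
static void semaphore_signal(struct semaphore *sem)
{
        __atomic_fetch_add(&sem->counter, 1, __ATOMIC_SEQ_CST);
}

/* wait: block until the counter is above zero, then consume one
 * signal by decrementing it */
static void semaphore_wait(struct semaphore *sem)
{
        uint64_t val;

        do {
                val = __atomic_load_n(&sem->counter, __ATOMIC_SEQ_CST);
        } while (val == 0 ||
                 !__atomic_compare_exchange_n(&sem->counter, &val,
                                              val - 1, 0,
                                              __ATOMIC_SEQ_CST,
                                              __ATOMIC_SEQ_CST));
}

Because the counter accumulates, multiple signal/wait pairs on the 
same address don't step on each other.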

>
> As it will use only one semaphore, the first ring to finish will
> mark the semaphore as done even if the other rings are not done
> yet.
Nope, each wait operation will wait for a separate signal operation (at 
least I think so).
>
> This all makes me wonder if some change to the cs ioctl would make
> all this better. The idea of a semaphore is to wait for some other
> ring to finish something. So let's say we have the following scenario:
> The application submits the following to ring1: csA, csB
> The application now submits to ring2: cs1, cs2
> And the application wants csA to be done for cs1 and csB to be done
> for cs2.
>
> To achieve such a usage pattern we would need to return a fence seq
> or similar from the cs ioctl. The user application would then know
> ringid+fence_seq for csA & csB and provide this when scheduling
> cs1 & cs2. Here I am assuming MEM_WRITE/WAIT_REG_MEM packets
> are as good as the MEM_SEMAPHORE packet, i.e. the semaphore packet
> doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.
>
> To achieve that, each ring has its own fence scratch addr where it
> writes the seq number. We then use WAIT_REG_MEM on this addr with
> the specific seq of the other ring that needs synchronization. This
> would simplify the semaphore code, as we wouldn't need anything new
> besides a helper function and maybe extending the fence structure.
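If I understand that right, from user space the proposal would look 
roughly like this (purely illustrative, none of these names exist in 
the real radeon UAPI):

#include <stdint.h>

struct cs_fence {
        uint32_t ring_id;       /* ring the CS was submitted to       */
        uint32_t seq;           /* fence seq returned by the CS ioctl */
};

/* stand-ins for the extended CS ioctl */
struct cs_fence submit_cs(uint32_t ring_id, void *chunks);
void submit_cs_with_wait(uint32_t ring_id, void *chunks,
                         const struct cs_fence *waits, unsigned num_waits);

static void example(void *csA, void *csB, void *cs1, void *cs2)
{
        struct cs_fence fA = submit_cs(1, csA); /* ring1: csA */
        struct cs_fence fB = submit_cs(1, csB); /* ring1: csB */

        submit_cs_with_wait(2, cs1, &fA, 1);    /* ring2: cs1 waits for csA */
        submit_cs_with_wait(2, cs2, &fB, 1);    /* ring2: cs2 waits for csB */
}
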
I played around with the same idea before implementing the whole 
semaphore stuff, but the killer argument against it is that not all 
rings support the WAIT_REG_MEM command.
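
For comparison, what a WAIT_REG_MEM based sync on a fence address 
boils down to is roughly this (again only a sketch of the semantics 
in C; on real hardware the polling is done by the CP, not the CPU):

#include <stdint.h>

/* wait until the fence location written by the producing ring has
 * reached wanted_seq; the signed difference handles seq wrap-around */
static void wait_reg_mem_gequal(const volatile uint32_t *fence_addr,
                                uint32_t wanted_seq)
{
        while ((int32_t)(*fence_addr - wanted_seq) < 0)
                ;       /* poll again after Wait_Interval clocks */
}

This is exactly the constant polling and memory traffic, mentioned 
below, that the central semaphore block avoids.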

Also the semaphores are much more efficient than the WAIT_REG_MEM 
command, because all semaphore commands from the different rings are 
sent to a central semaphore block, so constant polling, and with it 
constant memory access, can be avoided. In addition to that, the 
WAIT_REG_MEM command has a minimum latency of Wait_Interval*16 clock 
cycles, while a semaphore needs 4 clock cycles for the command and 4 
clock cycles for the result, so it definitely has a much lower latency.

We should also keep in mind that the semaphore block is not only 
capable of syncing between different rings inside a single GPU, but 
can also communicate with another semaphore block in a second GPU. So 
it is a key part of a multi-GPU environment.

> Anyway I put the updated ring patches at:
> http://people.freedesktop.org/~glisse/mrings/
> They are rebased on top of Linus' tree and have several space
> indentation fixes and also a fix for the no-semaphore-allocated
> issue (patch 5).
>
Thanks, I will try to integrate the changes tomorrow.

Christian.

