RFC: Radeon multi ring support branch

Jerome Glisse j.glisse at gmail.com
Tue Nov 15 11:32:12 PST 2011


On Sat, Oct 29, 2011 at 03:00:28PM +0200, Christian König wrote:
> Hello everybody,
> 
> to support multiple compute rings, async DMA engines and UVD we need
> to teach the radeon kernel module how to sync buffers between
> different rings and make some changes to the command submission
> ioctls.
> 
> Since we can't release any documentation about async DMA or UVD
> (yet), my current branch concentrates on getting the additional
> compute rings on cayman running. Unfortunately those rings have
> hardware bugs that can't be worked around, so they are actually not
> very useful in a production environment, but they should do quite
> well for this testing purpose.
> 
> The branch can be found here:
> http://cgit.freedesktop.org/~deathsimple/linux/log/
> 
> Since some of the patches are quite intrusive, constantly rebasing
> them could get a bit painful. So I would like to see most of the
> stuff included into drm-next, even if we don't make use of the new
> functionality right now.
> 
> Comments welcome,
> Christian.

So I have been looking back at all this, and now there is
something puzzling me. If the semaphore waits for a non-null value
at the GPU address provided in the packet, then the current
implementation of the cs ioctl doesn't work when there are more
than 2 rings to sync:

http://cgit.freedesktop.org/~deathsimple/linux/commit/?h=multi-ring-testing&id=bae372811c697a889ff0cf9128f52fe914d0fe1b

As it will use only one semaphore, the first ring to finish will
mark the semaphore as done even if the other rings are not done
yet.
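
To make the problem concrete, here is roughly what a submission
that has to wait on two other rings ends up doing when only one
semaphore is allocated for it (helper names are made up for
illustration, not the actual functions from the branch):

    /* Illustration only, hypothetical helpers: one cs on ring0 that
     * must wait for work on both ring1 and ring2, but only a single
     * semaphore is used for the whole submission. */
    emit_semaphore_signal(ring1, sem); /* ring1 makes sem non-null when done */
    emit_semaphore_signal(ring2, sem); /* ring2 signals the *same* semaphore */
    emit_semaphore_wait(ring0, sem);   /* released by the first signal       */
    /* ring0 can thus start as soon as ring1 OR ring2 finishes, even
     * though it was supposed to wait for both of them. */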

This all makes me wonder if some change to the cs ioctl would
make all this better. The idea of a semaphore is to wait for some
other ring to finish something. So let's say we have the following
scenario:
The application submits the following to ring1: csA, csB
The application then submits to ring2: cs1, cs2
And the application wants csA to be done before cs1 runs and csB
to be done before cs2.

To achieve such a usage pattern we would need to return a fence
seq (or similar) from the cs ioctl, so the user application would
know the ringid+fence_seq for csA & csB and provide them when
scheduling cs1 & cs2. Here I am assuming the MEM_WRITE/WAIT_REG_MEM
packets are as good as the MEM_SEMAPHORE packet, i.e. the semaphore
packet doesn't give us much more than MEM_WRITE/WAIT_REG_MEM would.
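
On the userspace side the flow could then look roughly like this
(function and struct names are made up to illustrate passing
ringid+seq back and forth, they are not the real libdrm/radeon
interface):

    /* Hypothetical interface: submit_cs() wraps the extended cs
     * ioctl, returns the fence seq of the submitted cs and takes an
     * optional list of (ring, seq) dependencies to wait for. */
    struct ring_dep {
            uint32_t ring;  /* ring the dependency was submitted to */
            uint32_t seq;   /* fence seq returned for that cs       */
    };

    /* csA on ring1: the extended ioctl returns its fence seq */
    uint32_t seqA = submit_cs(fd, RING1, csA_chunks, NULL, 0);

    /* cs1 on ring2: tell the kernel it must wait for (RING1, seqA) */
    struct ring_dep dep = { .ring = RING1, .seq = seqA };
    submit_cs(fd, RING2, cs1_chunks, &dep, 1);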

To achieve that, each ring gets its own fence scratch address
where it writes its seq numbers. We then emit a WAIT_REG_MEM on
that address with the specific seq of the other ring that needs to
be synchronized against. This would simplify the semaphore code,
as we wouldn't need anything new besides a helper function and
maybe an extension of the fence structure.
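
A kernel-side helper for that could look something like the sketch
below. Function, structure and field names are mine, not from the
branch, and the exact WAIT_REG_MEM field encoding would need to be
checked against the CP packet spec before being used:

    /* Sketch only: make 'waiter' block until the ring behind
     * 'fence_addr' has written a seq number >= 'seq' there. */
    #define WRM_FUNC_GEQ   5        /* release when *addr >= ref      */
    #define WRM_MEM_SPACE  (1 << 4) /* poll memory, not a register    */

    static void ring_wait_for_seq(struct radeon_ring *waiter,
                                  u64 fence_addr, u32 seq)
    {
            ring_write(waiter, PACKET3(PACKET3_WAIT_REG_MEM, 5));
            ring_write(waiter, WRM_MEM_SPACE | WRM_FUNC_GEQ);
            ring_write(waiter, lower_32_bits(fence_addr));
            ring_write(waiter, upper_32_bits(fence_addr) & 0xff);
            ring_write(waiter, seq);        /* reference value        */
            ring_write(waiter, 0xffffffff); /* compare mask           */
            ring_write(waiter, 0x10);       /* poll interval          */
    }

The cs ioctl would then just emit one such wait on the target ring
for every (ring, seq) dependency passed in by userspace before
scheduling the new cs.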

Anyway, I put the updated ring patches at:
http://people.freedesktop.org/~glisse/mrings/
They are rebased on top of Linus' tree and include several space
indentation fixes as well as a fix for the no-semaphore-allocated
issue (patch 5).

Cheers,
Jerome

