[Mesa-dev] [PATCH 16/20] radeonsi: add FMASK texture binding slots and resource setup

Sun Aug 11 18:25:37 PDT 2013

On Sat, Aug 10, 2013 at 11:09 AM, Christian König
<deathsimple at vodafone.de> wrote:
> On 10.08.2013 15:53, Marek Olšák wrote:
>
>> The RCU approach sounds good, but you can never know if 16 is enough.
>> We should release the buffer once it is full and allocate a new one.
>> The cache bufmgr in the winsys will ensure there won't be any buffer
>> allocation overhead - it would work kind of like a ring of buffers.
>
>
> Are you sure of that? The overhead of allocating a new buffer was what
> always looked so unfriendly to me with this approach.
>
> On the other hand the CP definitely can't handle more than 8 contexts at the
> same time (and one of them is always the clear context), so I strongly think
> we should be on the safe side with 16 slots here. I'm just not sure if the
> SQ could add some more depth to our pipeline, maybe Alex knows more on this.
>

IIRC, it's 8 contexts on larger cards and 4 on smaller cards.  You can
look up the details in the kernel driver.  The contexts only affect
the CONTEXT registers from the 3D pipeline.

Alex

> Christian.
>
>
>> Marek
>>
>> On Sat, Aug 10, 2013 at 10:45 AM, Christian König
>> <deathsimple at vodafone.de> wrote:
>>>
>>> On 09.08.2013 20:06, Marek Olšák wrote:
>>>>
>>>> [SNIP]
>>>>
>>>> What if I kept the current emission code, and only allocated a new
>>>> buffer at the end of the emit function, copied all descriptors to it
>>>> using CP_DMA or COPY_DATA, and pointed SPI_SHADER_USER_DATA to it. The
>>>> buffer where the descriptors are updated using WRITE_DATA would act as
>>>> a staging buffer only and shaders would always read from the fresh new
>>>> copy. Does that sound good to you?
>>>
>>>
>>> That sounds like the solution with multiple buffers I already suggested,
>>> but I would rather use some RCU approach to it. Basically we just have to
>>> handle it like the context-based resources on earlier ASICs. So to me a
>>> proper solution should look something like this:
>>>
>>> We allocate a ring of (let's say) 16 slots for descriptor arrays, fill
>>> the first slot with WRITE_DATA packets and then use it in a draw command.
>>>
>>> As soon as any of the descriptors is about to change we copy its content
>>> to the next slot, let the SPI_SHADER_USER_DATA point to it, make the
>>> necessary updates using WRITE_DATA and then use it in a draw command.
>>>
>>> This repeats over and over again. All we need to make sure is that we
>>> have enough slots in the ring that we never overwrite descriptors while
>>> they are still in use, but I'm pretty sure that we should be on the safe
>>> side with 16 or so.
>>>
>>> We can even prepare the commands for the switch from one slot to the next
>>> only once and then use it for the whole lifetime of the driver.
>>>
>>> Christian.
>
>
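
For reference, the slot-ring scheme quoted above can be modelled on the CPU
side roughly as below. This is only a sketch under stated assumptions, not
radeonsi code: struct desc_ring, desc_ring_update(), desc_ring_draw() and the
DESCS_PER_SLOT / DESC_DWORDS sizes are hypothetical names and values, plain
memcpy() stands in for the WRITE_DATA / CP_DMA packets, and the returned slot
index stands in for what SPI_SHADER_USER_DATA would point at. It also assumes
a slot may be updated in place as long as no draw has used it yet, and it does
not check whether the GPU is still reading the slot being recycled, which is
exactly the open question about whether 16 slots are enough.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS     16  /* slot count suggested in the thread */
#define DESCS_PER_SLOT 8   /* hypothetical descriptor array length */
#define DESC_DWORDS    8   /* hypothetical descriptor size in dwords */

struct desc_ring {
    uint32_t slots[RING_SLOTS][DESCS_PER_SLOT][DESC_DWORDS];
    unsigned current;      /* slot the next draw will read from */
    int used_by_draw;      /* has a draw already consumed 'current'? */
};

void desc_ring_init(struct desc_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Change one descriptor. If a draw already used the current slot, copy the
 * whole array to the next slot first and switch to it, so the draw in
 * flight keeps seeing the old contents. Returns the slot written to. */
unsigned desc_ring_update(struct desc_ring *r, unsigned index,
                          const uint32_t desc[DESC_DWORDS])
{
    assert(index < DESCS_PER_SLOT);

    if (r->used_by_draw) {
        unsigned next = (r->current + 1) % RING_SLOTS;

        /* Stand-in for the CP_DMA / COPY_DATA copy of the array. */
        memcpy(r->slots[next], r->slots[r->current], sizeof(r->slots[next]));
        r->current = next;
        r->used_by_draw = 0;
    }

    /* Stand-in for the WRITE_DATA update of a single descriptor. */
    memcpy(r->slots[r->current][index], desc, sizeof(r->slots[0][0]));
    return r->current;
}

/* Record a draw; returns the slot the user data pointer would reference. */
unsigned desc_ring_draw(struct desc_ring *r)
{
    r->used_by_draw = 1;
    return r->current;
}
```

A caller would invoke desc_ring_update() for every descriptor change and
desc_ring_draw() once per draw; the ring only advances when a change follows
a draw, so several updates between two draws share one slot.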