[Mesa-dev] [PATCH 16/20] radeonsi: add FMASK texture binding slots and resource setup

Fri Aug 9 01:34:22 PDT 2013

On 08.08.2013 21:38, Alex Deucher wrote:
> On Thu, Aug 8, 2013 at 1:34 PM, Marek Olšák <maraeo at gmail.com> wrote:
>> On Thu, Aug 8, 2013 at 6:57 PM, Christian König <deathsimple at vodafone.de> wrote:
>>> On 08.08.2013 16:33, Marek Olšák wrote:
>>>> On Thu, Aug 8, 2013 at 3:09 PM, Christian König <deathsimple at vodafone.de> wrote:
>>>>> On 08.08.2013 14:38, Marek Olšák wrote:
>>>>>
>>>>>> On Thu, Aug 8, 2013 at 9:47 AM, Christian König
>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>> On 08.08.2013 02:20, Marek Olšák wrote:
>>>>>>>
>>>>>>>> FMASK is bound as a separate texture. For every texture, there can be
>>>>>>>> an FMASK. Therefore a separate array of resource slots has to be
>>>>>>>> added.
>>>>>>>>
>>>>>>>> This adds a new mechanism for emitting resource descriptors. Its
>>>>>>>> features are:
>>>>>>>> - resource descriptors are stored in an ordinary buffer (not in a CS)
>>>>>>>
>>>>>>> Having resource descriptors outside of the CS has two problems that we
>>>>>>> need to solve first:
>>>>>>>
>>>>>>> 1. Fine-grained descriptor updates don't work; I already tried that.
>>>>>>> The problem is that, unlike on previous ASICs, descriptors are now a
>>>>>>> memory block and so no longer part of the CP context. So when we (for
>>>>>>> example) have a draw command executing and the next draw command uses
>>>>>>> new resources for a specific slot, we would either block until the
>>>>>>> first draw command is finished (which is bad for performance) or change
>>>>>>> the descriptors while they are still in use (which results in VM
>>>>>>> faults).
>>>>>> So what would the proper solution be here? Do I need to flush some
>>>>>> caches or would moving the descriptor updates to the constant IB fix
>>>>>> that?
>>>>>
>>>>> Actually the current implementation worked better than anything else I
>>>>> tried.
>>>>>
>>>>> When you really need the resource descriptors in a separate buffer, you
>>>>> need to use one buffer for each draw call and always write the full
>>>>> buffer contents (no partial updates). Flushing anything won't really
>>>>> help either.
>>>>>
>>>>> The only solution I see using one buffer is to block until the last draw
>>>>> call is finished with WAIT_REG_MEM, but that would be quite disastrous
>>>>> for performance.
>>>>>
>>>>>
>>>>>>> 2. If my understanding is correct, when they are embedded the
>>>>>>> descriptors are preloaded into the caches while executing the IB, so to
>>>>>>> achieve the same speed with descriptors outside of the IB you need to
>>>>>>> add additional commands to the constant IB, which is new to SI and
>>>>>>> which we currently don't support in the CS interface.
>>>>>> There seems to be support for the constant IB. The CS ioctl chunk ID
>>>>>> is RADEON_CHUNK_ID_CONST_IB and the allowed packets are listed in
>>>>>> si_vm_packet3_ce_check. Is there anything missing?
>>>>>
>>>>> The userspace side seems to be missing, and except for throwing NOP
>>>>> packets into it we never tested it. I know from the closed source side
>>>>> that it actually was quite tricky for them to get working.
>>>>>
>>>>> In addition, please note that I'm not 100% sure that just putting the
>>>>> descriptors into the IB is really helping here. It was just the simplest
>>>>> solution to avoid allocating a new buffer on each draw call.
>>>> I understand. I don't really need to have resource descriptors in a
>>>> separate buffer, all I need is these 3 basic features a gallium driver
>>>> should support:
>>>> - fine-grained resource updates (mainly for performance, see below)
>>>> - ability to unbind resources (e.g. by setting IMG_RSRC_WORD1 to 0)
>>>> - no GPU crash if a shader is using SAMPLER[15] but there are no samplers
>>>> bound
>>>>
>>>> FYI, partial sampler view and sampler state updates are coming to
>>>> gallium, Brian Paul already has some patches, it's just a matter of
>>>> time now. Vertex and constant buffer states already support partial
>>>> updates.
>>>
>>> That shouldn't be too much of a problem.
>>>
>>> Just allocate a state at startup and initialize it with the proper pm4
>>> commands for 16 samplers, then update the resource descriptors in that state
>>> when we change the bound textures/samplers/views/constants/whatever. All we
>>> need to do then is set the emitted state to NULL so that it gets
>>> re-emitted in the next draw command.
>> That would re-emit all 16 shader resources even if just one of them
>> needs to be changed. I was trying to avoid this inefficiency. Is it
>> really impossible to emit just one resource descriptor and keep the
>> others unchanged? This is a basic D3D10/11 feature, for example:
>>
>> void ID3D11DeviceContext::VSSetShaderResources(
>>    [in]  UINT StartSlot,
>>    [in]  UINT NumViews,
>>    [in]  ID3D11ShaderResourceView *const *ppShaderResourceViews
>> );
>>
>> If the constant engine is required to implement this interface
>> efficiently, then I'd like to work on constant IB support.
> You'll need to either store them in memory or re-emit them if you
> store them in the IB.  The CE is mainly there so that it can prime the
> TC in parallel with the command stream processing.

Yeah indeed. The CE is just for prefetching everything into caches and 
doesn't really help here.

The only two options I see are either fully emitting it into the command
stream whenever anything changes, or allocating a new buffer for the
resources on each new draw call, copying over the old state and then
setting just the things that changed. Both options have their pros and
cons; no idea which might be better.
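
To make the second option concrete, here is a rough CPU-side sketch in C. It
is not radeonsi code: struct desc_buffer and desc_update() are hypothetical
names, plain heap memory stands in for a GPU buffer, and the slot count (16)
and descriptor size (8 dwords) are assumptions for illustration. Passing NULL
as the new descriptors zeroes the slots, as a stand-in for unbinding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SLOTS   16  /* assumed number of resource slots per shader stage */
#define DESC_DWORDS 8   /* assumed number of dwords per resource descriptor */

/* Hypothetical CPU-side stand-in for one GPU buffer full of descriptors. */
struct desc_buffer {
    uint32_t dw[NUM_SLOTS][DESC_DWORDS];
};

/*
 * Build the descriptor buffer for the next draw call without modifying
 * "old", which a draw that is still executing may be reading.
 *
 * old:   buffer used by the previous draw, or NULL if there is none
 * start: first slot to change
 * count: number of slots to change
 * descs: count * DESC_DWORDS dwords of new descriptors, or NULL to unbind
 *
 * Returns a newly allocated buffer, or NULL on allocation failure. The
 * caller keeps "old" alive until the draw that references it has finished.
 */
static struct desc_buffer *
desc_update(const struct desc_buffer *old, unsigned start, unsigned count,
            const uint32_t *descs)
{
    struct desc_buffer *fresh;
    size_t bytes = (size_t)count * DESC_DWORDS * sizeof(uint32_t);

    assert(start <= NUM_SLOTS && count <= NUM_SLOTS - start);

    /* calloc leaves every slot zeroed, i.e. unbound in this sketch. */
    fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;

    /* Copy over the old state... */
    if (old)
        memcpy(fresh, old, sizeof(*fresh));

    /* ...and then set just the slots that changed. */
    if (count) {
        if (descs)
            memcpy(fresh->dw[start], descs, bytes);
        else
            memset(fresh->dw[start], 0, bytes);
    }

    return fresh;
}
```

A call such as desc_update(prev, 3, 1, new_desc) would give the next draw its
own buffer with only slot 3 changed, while prev stays untouched for the draw
that is still running. The price is one allocation and a full copy per
change, which is the trade-off against re-emitting everything into the
command stream.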

Fact is the resource descriptors are not allowed to change as long as 
the shaders are running.

Christian.