[Mesa-dev] [PATCH 16/20] radeonsi: add FMASK texture binding slots and resource setup

Marek Olšák maraeo at gmail.com
Thu Aug 8 10:34:48 PDT 2013


On Thu, Aug 8, 2013 at 6:57 PM, Christian König <deathsimple at vodafone.de> wrote:
> Am 08.08.2013 16:33, schrieb Marek Olšák:
>>
>> On Thu, Aug 8, 2013 at 3:09 PM, Christian König <deathsimple at vodafone.de> wrote:
>>>
>>> Am 08.08.2013 14:38, schrieb Marek Olšák:
>>>
>>>> On Thu, Aug 8, 2013 at 9:47 AM, Christian König <deathsimple at vodafone.de> wrote:
>>>>>
>>>>> Am 08.08.2013 02:20, schrieb Marek Olšák:
>>>>>
>>>>>> FMASK is bound as a separate texture. For every texture, there can be
>>>>>> an FMASK. Therefore a separate array of resource slots has to be
>>>>>> added.
>>>>>>
>>>>>> This adds a new mechanism for emitting resource descriptors. Its
>>>>>> features are:
>>>>>> - resource descriptors are stored in an ordinary buffer (not in a CS)
>>>>>
>>>>>
>>>>> Having resource descriptors outside of the CS has two problems that we
>>>>> need to solve first:
>>>>>
>>>>> 1. Fine-grained descriptor updates don't work; I already tried that. The
>>>>> problem is that, unlike on previous ASICs, descriptors are now a memory
>>>>> block and no longer part of the CP context. So when we (for example) have
>>>>> a draw command executing and the next draw command uses new resources for
>>>>> a specific slot, we would either block until the first draw command is
>>>>> finished (which is bad for performance) or change the descriptors while
>>>>> they are still in use (which results in VM faults).
>>>>
>>>> So what would the proper solution be here? Do I need to flush some
>>>> caches or would moving the descriptor updates to the constant IB fix
>>>> that?
>>>
>>>
>>> Actually, the current implementation worked better than anything else I
>>> tried.
>>>
>>> If you really need the resource descriptors in a separate buffer, you need
>>> to use one buffer for each draw call and always write the full buffer
>>> contents (no partial updates). Flushing anything won't really help either.
>>>
>>> The only solution I see with a single buffer is to block until the last
>>> draw call is finished with WAIT_REG_MEM, but that would be quite disastrous
>>> for performance.
>>>
>>>
>>>>> 2. If my understanding is correct, when they are embedded the descriptors
>>>>> are preloaded into the caches while executing the IB. So to achieve the
>>>>> same speed with descriptors outside of the IB, you need to add additional
>>>>> commands to the constant IB, which is new to SI and which we currently
>>>>> don't support in the CS interface.
>>>>
>>>> There seems to be support for the constant IB. The CS ioctl chunk ID
>>>> is RADEON_CHUNK_ID_CONST_IB and the allowed packets are listed in
>>>> si_vm_packet3_ce_check. Is there anything missing?
>>>
>>>
>>> The userspace side seems to be missing, and except for throwing NOP packets
>>> into it we never tested it. I know from the closed source side that it
>>> actually was quite tricky for them to get working.
>>>
>>> In addition, please note that I'm not 100% sure that just putting the
>>> descriptors into the IB is really helping here. It was just the simplest
>>> solution to avoid allocating a new buffer on each draw call.
>>
>> I understand. I don't really need to have resource descriptors in a
>> separate buffer; all I need are these 3 basic features that a gallium
>> driver should support:
>> - fine-grained resource updates (mainly for performance, see below)
>> - ability to unbind resources (e.g. by setting IMG_RSRC_WORD1 to 0)
>> - no GPU crash if a shader is using SAMPLER[15] but there are no samplers
>>   bound
>>
>> FYI, partial sampler view and sampler state updates are coming to
>> gallium, Brian Paul already has some patches, it's just a matter of
>> time now. Vertex and constant buffer states already support partial
>> updates.
>
>
> That shouldn't be too much of a problem.
>
> Just allocate a state at startup and initialize it with the proper PM4
> commands for 16 samplers, then update the resource descriptors in that state
> when we change the bound textures/samplers/views/constants/whatever. All we
> need to do then is set the emitted state to NULL so that it gets
> re-emitted in the next draw command.

That would re-emit all 16 shader resources even if just one of them
needs to be changed. I was trying to avoid this inefficiency. Is it
really impossible to emit just one resource descriptor and keep the
others unchanged? This is a basic D3D10/11 feature, for example:

void ID3D11DeviceContext::VSSetShaderResources(
  [in]  UINT StartSlot,
  [in]  UINT NumViews,
  [in]  ID3D11ShaderResourceView *const *ppShaderResourceViews
);

If the constant engine is required to implement this interface
efficiently, then I'd like to work on constant IB support.
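
To make the kind of partial update I mean concrete, here is a rough sketch in C. It is not radeonsi code: the names (desc_table, desc_table_set, desc_table_emit) are made up for illustration, and the 8-dword descriptor size is an assumption. It keeps a CPU copy of the 16 slots plus a dirty mask, so a StartSlot/NumViews-style call only marks the touched slots and the emit step only writes those. It deliberately ignores the hazard discussed above, i.e. overwriting descriptors that an in-flight draw may still be reading; the destination here is just plain memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS   16  /* 16 slots, as in the discussion above */
#define DESC_DWORDS 8   /* assumed descriptor size, for illustration only */

/* Hypothetical CPU-side shadow of the descriptor slots. */
struct desc_table {
    uint32_t cpu_copy[NUM_SLOTS][DESC_DWORDS];
    uint32_t dirty_mask; /* bit i set: slot i must be written on next emit */
};

/* Update slots [start, start + count). A NULL entry unbinds the slot by
 * zeroing its descriptor. Only the touched slots are marked dirty. */
static void desc_table_set(struct desc_table *t, unsigned start,
                           unsigned count, const uint32_t *const *views)
{
    assert(start + count <= NUM_SLOTS);

    for (unsigned i = 0; i < count; i++) {
        unsigned slot = start + i;

        if (views[i])
            memcpy(t->cpu_copy[slot], views[i], sizeof(t->cpu_copy[slot]));
        else
            memset(t->cpu_copy[slot], 0, sizeof(t->cpu_copy[slot]));

        t->dirty_mask |= 1u << slot;
    }
}

/* Copy only the dirty descriptors to dst and clear the dirty mask.
 * Returns the number of descriptors written. */
static unsigned desc_table_emit(struct desc_table *t,
                                uint32_t (*dst)[DESC_DWORDS])
{
    unsigned written = 0;

    for (unsigned i = 0; i < NUM_SLOTS; i++) {
        if (t->dirty_mask & (1u << i)) {
            memcpy(dst[i], t->cpu_copy[i], sizeof(t->cpu_copy[i]));
            written++;
        }
    }

    t->dirty_mask = 0;
    return written;
}
```

With this, binding one view at slot 3 and unbinding slot 4 writes 2 descriptors on the next emit rather than all 16, and a second emit with no changes in between writes nothing.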

Marek

