[Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

Marek Olšák maraeo at gmail.com
Tue Aug 27 12:33:03 PDT 2013


On Mon, Aug 26, 2013 at 8:59 PM, Marko Ristola
<marko.ristola at kolumbus.fi> wrote:
>
> Hi
>
>
> 15.08.2013 13:54, Marek Olšák wrote:
>>
>> On Thu, Aug 15, 2013 at 10:27 AM, Christian König
>> <deathsimple at vodafone.de> wrote:
>>>
>>> On 15.08.2013 05:25, Marek Olšák wrote:
>>>
>>>> (This should be applied before MSAA, which will need to be rebased.)
>>>>
>>>> It moves all sampler view descriptors to a buffer.
>>>> It supports partial resource updates and it can also unbind resources
>>>> (required for FMASK texturing).
>>>>
>>>> The buffer contains all sampler view descriptors for one shader stage,
>>>> represented as an array. On top of that, there are N arrays in the buffer,
>>>> which are used to emulate context registers as implemented by the previous
>>>> ASICs (each array is a context).
>>>>
>>>> This uses the RCU synchronization approach to avoid read-after-write hazards
>>>> as discussed in the thread:
>>>> "radeonsi: add FMASK texture binding slots and resource setup"
>>>>
>>>> CP DMA is used to clear the descriptors at context initialization and to copy
>>>> the descriptors from one context to the next.
>>>>
>>>> IMPORTANT:
>>>>     128 resource contexts are needed, 64 doesn't work. If I set
>>>>     SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are needed.
>>>>     I don't have an explanation for this.
>>>> ---
>>>
>>>
>>>
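The layout described above (N arrays of descriptors, each array acting as one context, with a copy from one context to the next on update) can be sketched on the CPU as follows. This is only an illustration, not the code from the patch: `desc_buffer`, `desc_offset`, `desc_update_slot`, the slot count and the 8-dword descriptor size are assumptions, and `memcpy`/`memset` stand in for the CP DMA copy and clear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_DWORDS   8    /* assumed size of one sampler view descriptor */
#define NUM_SLOTS     16   /* assumed number of slots per shader stage */
#define NUM_CONTEXTS  128  /* context count the patch description reports as needed */

/* Hypothetical CPU-side model of the descriptor buffer. */
struct desc_buffer {
    uint32_t data[NUM_CONTEXTS][NUM_SLOTS][DESC_DWORDS];
    unsigned current; /* context that newly emitted draws read from */
};

/* Byte offset of one descriptor inside the buffer. */
static size_t desc_offset(unsigned ctx, unsigned slot)
{
    return ((size_t)ctx * NUM_SLOTS + slot) * DESC_DWORDS * sizeof(uint32_t);
}

/* Partial update in the RCU style: never modify the array that earlier draws
 * may still be reading. Copy it to the next context, patch one slot there,
 * then make the new context current. Passing NULL unbinds the slot. */
static void desc_update_slot(struct desc_buffer *b, unsigned slot,
                             const uint32_t *desc)
{
    unsigned next = (b->current + 1) % NUM_CONTEXTS;

    assert(slot < NUM_SLOTS);
    memcpy(b->data[next], b->data[b->current], sizeof(b->data[next]));
    if (desc)
        memcpy(b->data[next][slot], desc, sizeof(b->data[next][slot]));
    else
        memset(b->data[next][slot], 0, sizeof(b->data[next][slot]));
    b->current = next;
}
```

In this model the old context stays intact until the ring of contexts wraps around, which is why the number of contexts matters.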
>>> The idea itself looks really good to me, but we should probably also move
>>> all resources and samplers to the new model and then rip out the code
>>> that stores them directly into the IB.
>>
>>
>> I'd like MSAA to land first, but yes, the plan is to eventually move
>> all resources and samplers to the new model.
>>
>>>
>>>
>>>> +/* Emit a CP DMA packet to do a copy from one buffer to another.
>>>> + * The size must fit in bits [20:0]. Notes:
>>>> + *
>>>> + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is done.
>>>> + *
>>>> + * 2) Set raw_hazard_wait to true if the source data was used as a destination
>>>> + *    in a previous CP DMA packet. It's for preventing a read-after-write hazard
>>>> + *    between two CP DMA packets.
>>>> + */
>>>> +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
>>>> +                                      uint64_t dst_va, uint64_t src_va,
>>>> +                                      unsigned size,
>>>> +                                      bool sync, bool raw_hazard_wait)
>>>> +{
>>>> +       struct radeon_winsys_cs *cs = rctx->cs;
>>>> +       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>>>> +       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
>>>> +
>>>> +       assert(size);
>>>> +       assert((size & ((1<<21)-1)) == size);
>>>> +
>>>> +       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>>>> +       cs->buf[cs->cdw++] = src_va;                    /* SRC_ADDR_LO [31:0] */
>>>> +       cs->buf[cs->cdw++] = sync_flag | ((src_va >> 32) & 0xff); /* CP_SYNC [31] | SRC_ADDR_HI [7:0] */
>>>> +       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO [31:0] */
>>>> +       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI [7:0] */
>>>> +       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22] | BYTE_COUNT [20:0] */
>>>> +}
>>>> +
>>>> +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits [20:0]. */
>>>> +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
>>>> +                                       uint64_t dst_va, unsigned size,
>>>> +                                       uint32_t clear_value,
>>>> +                                       bool sync, bool raw_hazard_wait)
>>>> +{
>>>> +       struct radeon_winsys_cs *cs = rctx->cs;
>>>> +       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>>>> +       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT : 0;
>>>> +
>>>> +       assert(size);
>>>> +       assert((size & ((1<<21)-1)) == size);
>>>> +
>>>> +       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>>>> +       cs->buf[cs->cdw++] = clear_value;               /* DATA [31:0] */
>>>> +       cs->buf[cs->cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /* CP_SYNC [31] | SRC_SEL[30:29] */
>>>> +       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO [31:0] */
>>>> +       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI [7:0] */
>>>> +       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22] | BYTE_COUNT [20:0] */
>>>> +}
>>>
>>>
>>>
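Both functions above assert that the size fits in bits [20:0], so a caller with a larger range would have to split it into several packets. A minimal sketch of such a loop, assuming a hypothetical callback `copy_chunk_fn` in place of the real emit function and ignoring any alignment preferences the hardware may have:

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte count that fits in bits [20:0]. */
#define CP_DMA_MAX_BYTE_COUNT ((1u << 21) - 1)

/* Hypothetical stand-in for a function that emits one CP DMA copy packet. */
typedef void (*copy_chunk_fn)(void *ctx, uint64_t dst_va, uint64_t src_va,
                              unsigned size);

/* Split a copy of arbitrary size into chunks that each satisfy the
 * 21-bit size limit. Returns the number of chunks emitted. */
static unsigned copy_in_chunks(void *ctx, copy_chunk_fn emit,
                               uint64_t dst_va, uint64_t src_va, uint64_t size)
{
    unsigned packets = 0;

    while (size) {
        unsigned n = size > CP_DMA_MAX_BYTE_COUNT ? CP_DMA_MAX_BYTE_COUNT
                                                  : (unsigned)size;
        emit(ctx, dst_va, src_va, n);
        dst_va += n;
        src_va += n;
        size -= n;
        packets++;
    }
    return packets;
}

/* Example callback that only records what it was asked to copy. */
struct chunk_log {
    unsigned count;
    uint64_t bytes;
    unsigned largest;
};

static void log_chunk(void *ctx, uint64_t dst_va, uint64_t src_va, unsigned size)
{
    struct chunk_log *log = ctx;

    (void)dst_va;
    (void)src_va;
    log->count++;
    log->bytes += size;
    if (size > log->largest)
        log->largest = size;
}
```

In a real driver the callback would wrap the emit function and decide per chunk whether the sync and hazard flags are needed; that policy is left out here.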
>>> Can we use some kind of macro or inline function instead of
>>> "cs->buf[cs->cdw++]"? That should help if we need to port that over to a
>>> different CS mechanism.
>>
>>
>> How about this?
>>
>> static INLINE void
>> r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
>> {
>>      cs->buf[cs->cdw++] = value;
>> }
>>
>>
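To show what the copy packet would look like once every dword goes through such a helper, here is a self-contained sketch. `mock_cs`, `mock_write_value` and `mock_emit_copy` are hypothetical stand-ins for the real winsys types and functions; the packet header and the COMMAND bits are passed in as opaque values, and only the CP_SYNC bit position and the 21-bit size limit are taken from the comments in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct radeon_winsys_cs with only the two fields used here. */
struct mock_cs {
    uint32_t buf[64];
    unsigned cdw;
};

/* The single place that knows how dwords get into the command stream. */
static inline void mock_write_value(struct mock_cs *cs, uint32_t value)
{
    assert(cs->cdw < 64);
    cs->buf[cs->cdw++] = value;
}

#define MOCK_CP_SYNC         (1u << 31)        /* CP_SYNC [31] per the patch comment */
#define MOCK_BYTE_COUNT_MASK ((1u << 21) - 1)  /* BYTE_COUNT [20:0] */

/* Same dword order as the copy function in the patch, written via the helper. */
static void mock_emit_copy(struct mock_cs *cs, uint32_t header,
                           uint64_t dst_va, uint64_t src_va,
                           unsigned size, bool sync, uint32_t command_bits)
{
    assert(size);
    assert((size & MOCK_BYTE_COUNT_MASK) == size);

    mock_write_value(cs, header);
    mock_write_value(cs, (uint32_t)src_va);                    /* SRC_ADDR_LO */
    mock_write_value(cs, (sync ? MOCK_CP_SYNC : 0) |
                         (uint32_t)((src_va >> 32) & 0xff));   /* SRC_ADDR_HI */
    mock_write_value(cs, (uint32_t)dst_va);                    /* DST_ADDR_LO */
    mock_write_value(cs, (uint32_t)((dst_va >> 32) & 0xff));   /* DST_ADDR_HI */
    mock_write_value(cs, size | command_bits);                 /* COMMAND | BYTE_COUNT */
}
```

With this shape, switching to a different CS mechanism would only require changing the helper, not every packet emitter.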
>>>
>>> And IIRC the CP DMA is identical on all chipset generations (maybe excluding
>>> early R6xx, but I'm not 100% sure of that), so it might be a good idea to
>>> start sharing code again by putting this under
>>> "src/gallium/drivers/radeon/radeon_cp_dma.c". Not necessary now, but more as
>>> a general idea. What do you think?
>>
>>
>> I agree.
>>
>> CP DMA is indeed identical on all chipsets. Copying has been supported
>> since R600 and clearing since Evergreen.
>
>
> Maybe you have already thought of this: one way to emulate clearing is to
> copy with CP DMA from a constant cleared memory area.

We emulate clearing with streamout (AKA transform feedback) on
r600-r700. The clearing binds the clear value as a vertex buffer with
stride=0, so that it's broadcast to all elements in the destination.
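The stride=0 broadcast can be illustrated with a small CPU model. This is only a sketch of the addressing idea, not driver code: `fetch_and_stream_out` is a hypothetical function that reads element i from `base + i * stride` and writes it to the destination, the way a vertex fetch followed by streamout would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of "fetch one 32-bit vertex attribute per element and stream it out".
 * With stride == 0 every fetch reads the same address, so the single clear
 * value is broadcast to all destination elements. */
static void fetch_and_stream_out(uint32_t *dst, size_t count,
                                 const void *vertex_buffer, size_t stride)
{
    const uint8_t *src = vertex_buffer;

    for (size_t i = 0; i < count; i++) {
        uint32_t v;

        memcpy(&v, src + i * stride, sizeof(v));
        dst[i] = v;
    }
}
```

Calling it with a pointer to one clear value and a stride of 0 fills the destination; calling it with a stride of 4 degenerates into a plain copy.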

Marek

