[Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

Marek Olšák maraeo at gmail.com
Thu Aug 15 03:54:14 PDT 2013


On Thu, Aug 15, 2013 at 10:27 AM, Christian König
<deathsimple at vodafone.de> wrote:
> Am 15.08.2013 05:25, schrieb Marek Olšák:
>
>> (This should be applied before MSAA, which will need to be rebased.)
>>
>> It moves all sampler view descriptors to a buffer.
>> It supports partial resource updates and it can also unbind resources
>> (required for FMASK texturing).
>>
>> The buffer contains all sampler view descriptors for one shader stage,
>> represented as an array. On top of that, there are N arrays in the buffer,
>> which are used to emulate context registers as implemented by the previous
>> ASICs (each array is a context).
>>
>> This uses the RCU synchronization approach to avoid read-after-write
>> hazards
>> as discussed in the thread:
>> "radeonsi: add FMASK texture binding slots and resource setup"
>>
>> CP DMA is used to clear the descriptors at context initialization and to
>> copy
>> the descriptors from one context to the next.
>>
>> IMPORTANT:
>>    128 resource contexts are needed, 64 doesn't work. If I set
>>    SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are
>> needed.
>>    I don't have an explanation for this.
>> ---
>
>
> The idea itself looks really good to me, but we should probably also move
> the all resources and samplers to the new model and then rip out the code
> that stores them directly into the IB.

I'd like MSAA to land first, but yes, the plan is to eventually move
all resources and samplers to the new model.

>
>
>> +/* Emit a CP DMA packet to do a copy from one buffer to another.
>> + * The size must fit in bits [20:0]. Notes:
>> + *
>> + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is
>> done.
>> + *
>> + * 2) Set raw_hazard_wait to true if the source data was used as a
>> destination
>> + *    in a previous CP DMA packet. It's for preventing a read-after-write
>> hazard
>> + *    between two CP DMA packets.
>> + */
>> +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
>> +                                      uint64_t dst_va, uint64_t src_va,
>> +                                      unsigned size,
>> +                                      bool sync, bool raw_hazard_wait)
>> +{
>> +       struct radeon_winsys_cs *cs = rctx->cs;
>> +       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>> +       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
>> 0;
>> +
>> +       assert(size);
>> +       assert((size & ((1<<21)-1)) == size);
>> +
>> +       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>> +       cs->buf[cs->cdw++] = src_va;                    /* SRC_ADDR_LO
>> [31:0] */
>> +       cs->buf[cs->cdw++] = sync_flag | ((src_va >> 32) & 0xff); /*
>> CP_SYNC [31] | SRC_ADDR_HI [7:0] */
>> +       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO
>> [31:0] */
>> +       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI
>> [7:0] */
>> +       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22]
>> | BYTE_COUNT [20:0] */
>> +}
>> +
>> +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits
>> [20:0]. */
>> +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
>> +                                       uint64_t dst_va, unsigned size,
>> +                                       uint32_t clear_value,
>> +                                       bool sync, bool raw_hazard_wait)
>> +{
>> +       struct radeon_winsys_cs *cs = rctx->cs;
>> +       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>> +       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
>> 0;
>> +
>> +       assert(size);
>> +       assert((size & ((1<<21)-1)) == size);
>> +
>> +       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>> +       cs->buf[cs->cdw++] = clear_value;               /* DATA [31:0] */
>> +       cs->buf[cs->cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /*
>> CP_SYNC [31] | SRC_SEL[30:29] */
>> +       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO
>> [31:0] */
>> +       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI
>> [7:0] */
>> +       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22]
>> | BYTE_COUNT [20:0] */
>> +}
>
>
> Can we use some kind of macro or inline function instead of
> "cs->buf[cs->cdw++] " ? That should help of we need to port that over to a
> different CS mechanism.

How about this?

static INLINE void
r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
{
    cs->buf[cs->cdw++] = value;
}


>
> And IIRC the CP DMA is identical on all chipset generation (maybe excluding
> early R6xx, but I'm not 100% sure of that), so it might be a good idea to
> start sharing code again by putting this under
> "src/gallium/drivers/radeon/radeon_cp_dma.c". Not necessary now, but more as
> a general idea. What do you think?

I agree.

CP DMA is indeed identical on all chipsets. The copying is supported
since R600 and the clearing is supported since Evergreen.

Marek


More information about the mesa-dev mailing list