[Mesa-dev] [PATCH] radeonsi: add flexible shader descriptor management and use it for sampler views

Marko Ristola marko.ristola at kolumbus.fi
Mon Aug 26 11:59:35 PDT 2013


Hi

15.08.2013 13:54, Marek Olšák wrote:
> On Thu, Aug 15, 2013 at 10:27 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Am 15.08.2013 05:25, schrieb Marek Olšák:
>>
>>> (This should be applied before MSAA, which will need to be rebased.)
>>>
>>> It moves all sampler view descriptors to a buffer.
>>> It supports partial resource updates and it can also unbind resources
>>> (required for FMASK texturing).
>>>
>>> The buffer contains all sampler view descriptors for one shader stage,
>>> represented as an array. On top of that, there are N arrays in the buffer,
>>> which are used to emulate context registers as implemented by the previous
>>> ASICs (each array is a context).
>>>
>>> This uses the RCU synchronization approach to avoid read-after-write
>>> hazards
>>> as discussed in the thread:
>>> "radeonsi: add FMASK texture binding slots and resource setup"
>>>
>>> CP DMA is used to clear the descriptors at context initialization and to
>>> copy
>>> the descriptors from one context to the next.
>>>
>>> IMPORTANT:
>>>     128 resource contexts are needed, 64 doesn't work. If I set
>>>     SH_KCACHE_ACTION_ENA before every draw call, only 2 contexts are
>>> needed.
>>>     I don't have an explanation for this.
>>> ---
>>
>>
>> The idea itself looks really good to me, but we should probably also move
>> the all resources and samplers to the new model and then rip out the code
>> that stores them directly into the IB.
>
> I'd like MSAA to land first, but yes, the plan is to eventually move
> all resources and samplers to the new model.
>
>>
>>
>>> +/* Emit a CP DMA packet to do a copy from one buffer to another.
>>> + * The size must fit in bits [20:0]. Notes:
>>> + *
>>> + * 1) Set sync to true if you want the 3D engine to wait until CP DMA is
>>> done.
>>> + *
>>> + * 2) Set raw_hazard_wait to true if the source data was used as a
>>> destination
>>> + *    in a previous CP DMA packet. It's for preventing a read-after-write
>>> hazard
>>> + *    between two CP DMA packets.
>>> + */
>>> +static void si_emit_cp_dma_copy_buffer(struct r600_context *rctx,
>>> +                                      uint64_t dst_va, uint64_t src_va,
>>> +                                      unsigned size,
>>> +                                      bool sync, bool raw_hazard_wait)
>>> +{
>>> +       struct radeon_winsys_cs *cs = rctx->cs;
>>> +       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>>> +       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
>>> 0;
>>> +
>>> +       assert(size);
>>> +       assert((size & ((1<<21)-1)) == size);
>>> +
>>> +       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>>> +       cs->buf[cs->cdw++] = src_va;                    /* SRC_ADDR_LO
>>> [31:0] */
>>> +       cs->buf[cs->cdw++] = sync_flag | ((src_va >> 32) & 0xff); /*
>>> CP_SYNC [31] | SRC_ADDR_HI [7:0] */
>>> +       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO
>>> [31:0] */
>>> +       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI
>>> [7:0] */
>>> +       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22]
>>> | BYTE_COUNT [20:0] */
>>> +}
>>> +
>>> +/* Emit a CP DMA packet to clear a buffer. The size must fit in bits
>>> [20:0]. */
>>> +static void si_emit_cp_dma_clear_buffer(struct r600_context *rctx,
>>> +                                       uint64_t dst_va, unsigned size,
>>> +                                       uint32_t clear_value,
>>> +                                       bool sync, bool raw_hazard_wait)
>>> +{
>>> +       struct radeon_winsys_cs *cs = rctx->cs;
>>> +       uint32_t sync_flag = sync ? PKT3_CP_DMA_CP_SYNC : 0;
>>> +       uint32_t raw_wait = raw_hazard_wait ? PKT3_CP_DMA_CMD_RAW_WAIT :
>>> 0;
>>> +
>>> +       assert(size);
>>> +       assert((size & ((1<<21)-1)) == size);
>>> +
>>> +       cs->buf[cs->cdw++] = PKT3(PKT3_CP_DMA, 4, 0);
>>> +       cs->buf[cs->cdw++] = clear_value;               /* DATA [31:0] */
>>> +       cs->buf[cs->cdw++] = sync_flag | PKT3_CP_DMA_SRC_SEL(2); /*
>>> CP_SYNC [31] | SRC_SEL[30:29] */
>>> +       cs->buf[cs->cdw++] = dst_va;                    /* DST_ADDR_LO
>>> [31:0] */
>>> +       cs->buf[cs->cdw++] = (dst_va >> 32) & 0xff;     /* DST_ADDR_HI
>>> [7:0] */
>>> +       cs->buf[cs->cdw++] = size | raw_wait;           /* COMMAND [29:22]
>>> | BYTE_COUNT [20:0] */
>>> +}
>>
>>
>> Can we use some kind of macro or inline function instead of
>> "cs->buf[cs->cdw++] " ? That should help of we need to port that over to a
>> different CS mechanism.
>
> How about this?
>
> static INLINE void
> r600_write_value(struct radeon_winsys_cs *cs, unsigned value)
> {
>      cs->buf[cs->cdw++] = value;
> }
>
>
>>
>> And IIRC the CP DMA is identical on all chipset generation (maybe excluding
>> early R6xx, but I'm not 100% sure of that), so it might be a good idea to
>> start sharing code again by putting this under
>> "src/gallium/drivers/radeon/radeon_cp_dma.c". Not necessary now, but more as
>> a general idea. What do you think?
>
> I agree.
>
> CP DMA is indeed identical on all chipsets. The copying is supported
> since R600 and the clearing is supported since Evergreen.

Maybe you already thought: One way to emulate clearing is to
copy with CP DMA from a constant cleared memory area.

Regards,
Marko Ristola

>
> Marek
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>



More information about the mesa-dev mailing list