[Mesa-dev] [PATCH 16/20] radeonsi: add FMASK texture binding slots and resource setup

Christian König deathsimple at vodafone.de
Thu Aug 8 09:57:45 PDT 2013


Am 08.08.2013 16:33, schrieb Marek Olšák:
> On Thu, Aug 8, 2013 at 3:09 PM, Christian König <deathsimple at vodafone.de> wrote:
>> Am 08.08.2013 14:38, schrieb Marek Olšák:
>>
>>> .On Thu, Aug 8, 2013 at 9:47 AM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> Am 08.08.2013 02:20, schrieb Marek Olšák:
>>>>
>>>>> FMASK is bound as a separate texture. For every texture, there can be
>>>>> an FMASK. Therefore a separate array of resource slots has to be added.
>>>>>
>>>>> This adds a new mechanism for emitting resource descriptors, its
>>>>> features
>>>>> are:
>>>>> - resource descriptors are stored in an ordinary buffer (not in a CS)
>>>>
>>>> Having resource descriptors outside of the CS has two problems that we
>>>> need
>>>> to solve first:
>>>>
>>>> 1. Fine grained descriptor updates doesn't work, I already tried that.
>>>> The
>>>> problem is that unlike previous asics descriptors are now a memory block,
>>>> so
>>>> no longer part of the CP context. So when we (for example) have a draw
>>>> command executing and the next draw command is using new resources for a
>>>> specific slot we would either block until the first draw command is
>>>> finished
>>>> (which is bad for performance) or change the descriptors while they are
>>>> still in use (which results in VM faults).
>>> So what would the proper solution be here? Do I need to flush some
>>> caches or would moving the descriptor updates to the constant IB fix
>>> that?
>>
>> Actually the current implementation worked better than anything else I
>> tried.
>>
>> When you really need the resource descriptors in a separate buffer you need
>> to use one buffer for each draw call and always write the full buffer
>> contents (no partial updates). Flushing anything won't really help either..
>> The only solution I see using one buffer is to block until the last draw
>> call is finished with WAIT_REG_MEM, but that would be quite disastrous for
>> performance.
>>
>>
>>>> 2. If my understand is correct when they are embedded the descriptors are
>>>> preloaded into the caches while executing the IB, so to archive the same
>>>> speed with descriptors outside of the IB you need to add additional
>>>> commands
>>>> to the constant IB which is new to SI and we currently doesn't support in
>>>> the CS interface.
>>> There seems to be support for the constant IB. The CS ioctl chunk ID
>>> is RADEON_CHUNK_ID_CONST_IB and the allowed packets are listed in
>>> si_vm_packet3_ce_check. Is there anything missing?
>>
>> The userspace side seems to be missing and except for throwing NOP packets
>> into it we never tested it. I know from the closed source side that it
>> actually was quite tricky for them to get working.
>>
>> Additional to that please note that I'm not 100% sure that just putting the
>> descriptors into the IB is really helping here. It was just the most
>> simplest solution to avoid allocating a new buffer on each draw call.
> I understand. I don't really need to have resource descriptors in a
> separate buffer, all I need is these 3 basic features a gallium driver
> should support:
> - fine-grained resource updates (mainly for performance, see below)
> - ability to unbind resources (e.g. by setting IMG_RSRC_WORD1 to 0)
> - no GPU crash if a shader is using SAMPLER[15] but there are no samplers bound
>
> FYI, partial sampler view and sampler state updates are coming to
> gallium, Brian Paul already has some patches, it's just a matter of
> time now. Vertex and constant buffer states already support partial
> updates.

That shouldn't be to much off a problem.

Just allocate a state at startup and initialize it with the proper pm4 
commands for 16 samplers, then update the resource descriptors in that 
state when we change the bound 
textures/samplers/views/constants/whatever. All we need to do then is 
setting the emitted state to NULL so that it gets re-emitted in the next 
draw command.

Christian.

> Marek
>
>> Cheers,
>> Christian.
>>
>>> Marek
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>> - descriptors of disabled resources are set to zeros
>>>>> - fine-grained resource updates (it can update one resource slot while
>>>>> not
>>>>>      touching the other slots)
>>>>> - updates are done with the WRITE_DATA packet
>>>>> - it implements the si_atom interface for packet emission
>>>>> - only used for FMASK textures right now
>>>>>
>>>>> The primary motivation for this is that FMASK textures naturally need
>>>>> fine-grained resource updates and I also need to query in the shader
>>>>> if a resource is NULL.
>>>>> ---
>>>>>     src/gallium/drivers/radeonsi/Makefile.sources  |   1 +
>>>>>     src/gallium/drivers/radeonsi/r600_hw_context.c |   3 +
>>>>>     src/gallium/drivers/radeonsi/r600_resource.h   |   1 +
>>>>>     src/gallium/drivers/radeonsi/r600_texture.c    |   1 +
>>>>>     src/gallium/drivers/radeonsi/radeonsi_pipe.c   |   9 +-
>>>>>     src/gallium/drivers/radeonsi/radeonsi_pipe.h   |   6 +-
>>>>>     src/gallium/drivers/radeonsi/radeonsi_pm4.c    |   7 +
>>>>>     src/gallium/drivers/radeonsi/radeonsi_pm4.h    |   2 +
>>>>>     src/gallium/drivers/radeonsi/si_descriptors.c  | 188
>>>>> +++++++++++++++++++++++++
>>>>>     src/gallium/drivers/radeonsi/si_state.c        |  58 +++++++-
>>>>>     src/gallium/drivers/radeonsi/si_state.h        |  36 +++++
>>>>>     11 files changed, 305 insertions(+), 7 deletions(-)
>>>>>     create mode 100644 src/gallium/drivers/radeonsi/si_descriptors.c
>>>>>
>>>>> diff --git a/src/gallium/drivers/radeonsi/Makefile.sources
>>>>> b/src/gallium/drivers/radeonsi/Makefile.sources
>>>>> index b3ffa72..68c8282 100644
>>>>> --- a/src/gallium/drivers/radeonsi/Makefile.sources
>>>>> +++ b/src/gallium/drivers/radeonsi/Makefile.sources
>>>>> @@ -10,6 +10,7 @@ C_SOURCES := \
>>>>>           r600_translate.c \
>>>>>           radeonsi_pm4.c \
>>>>>           radeonsi_compute.c \
>>>>> +       si_descriptors.c \
>>>>>           si_state.c \
>>>>>           si_state_streamout.c \
>>>>>           si_state_draw.c \
>>>>> diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c
>>>>> b/src/gallium/drivers/radeonsi/r600_hw_context.c
>>>>> index 7ed7496..b595477 100644
>>>>> --- a/src/gallium/drivers/radeonsi/r600_hw_context.c
>>>>> +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
>>>>> @@ -289,6 +289,9 @@ void si_context_flush(struct r600_context *ctx,
>>>>> unsigned flags)
>>>>>            * next draw command
>>>>>            */
>>>>>           si_pm4_reset_emitted(ctx);
>>>>> +
>>>>> +       si_sampler_views_begin_new_cs(ctx,
>>>>> &ctx->fmask_sampler_views[PIPE_SHADER_VERTEX]);
>>>>> +       si_sampler_views_begin_new_cs(ctx,
>>>>> &ctx->fmask_sampler_views[PIPE_SHADER_FRAGMENT]);
>>>>>     }
>>>>>       void si_context_emit_fence(struct r600_context *ctx, struct
>>>>> si_resource *fence_bo, unsigned offset, unsigned value)
>>>>> diff --git a/src/gallium/drivers/radeonsi/r600_resource.h
>>>>> b/src/gallium/drivers/radeonsi/r600_resource.h
>>>>> index e5dd36a..ab5c7b7 100644
>>>>> --- a/src/gallium/drivers/radeonsi/r600_resource.h
>>>>> +++ b/src/gallium/drivers/radeonsi/r600_resource.h
>>>>> @@ -44,6 +44,7 @@ struct r600_fmask_info {
>>>>>           unsigned offset;
>>>>>           unsigned size;
>>>>>           unsigned alignment;
>>>>> +       unsigned pitch;
>>>>>           unsigned bank_height;
>>>>>           unsigned slice_tile_max;
>>>>>           unsigned tile_mode_index;
>>>>> diff --git a/src/gallium/drivers/radeonsi/r600_texture.c
>>>>> b/src/gallium/drivers/radeonsi/r600_texture.c
>>>>> index cd3d1aa..b613564 100644
>>>>> --- a/src/gallium/drivers/radeonsi/r600_texture.c
>>>>> +++ b/src/gallium/drivers/radeonsi/r600_texture.c
>>>>> @@ -463,6 +463,7 @@ static void r600_texture_get_fmask_info(struct
>>>>> r600_screen *rscreen,
>>>>>                   out->slice_tile_max -= 1;
>>>>>           out->tile_mode_index = fmask.tiling_index[0];
>>>>> +       out->pitch = fmask.level[0].nblk_x;
>>>>>           out->bank_height = fmask.bankh;
>>>>>           out->alignment = MAX2(256, fmask.bo_alignment);
>>>>>           out->size = fmask.bo_size;
>>>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>>>>> b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>>>>> index ad955e3..3112124 100644
>>>>> --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>>>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
>>>>> @@ -178,6 +178,9 @@ static void r600_destroy_context(struct pipe_context
>>>>> *context)
>>>>>     {
>>>>>           struct r600_context *rctx = (struct r600_context *)context;
>>>>>     +
>>>>>
>>>>> si_release_sampler_views(&rctx->fmask_sampler_views[PIPE_SHADER_VERTEX]);
>>>>> +
>>>>>
>>>>> si_release_sampler_views(&rctx->fmask_sampler_views[PIPE_SHADER_FRAGMENT]);
>>>>> +
>>>>>           si_resource_reference(&rctx->border_color_table, NULL);
>>>>>           if (rctx->dummy_pixel_shader) {
>>>>> @@ -233,12 +236,16 @@ static struct pipe_context
>>>>> *r600_create_context(struct pipe_screen *screen, void
>>>>>                   rctx->context.create_video_buffer =
>>>>> vl_video_buffer_create;
>>>>>           }
>>>>>     +     rctx->cs = rctx->ws->cs_create(rctx->ws, RING_GFX, NULL);
>>>>> +
>>>>> +       si_init_sampler_views(rctx,
>>>>> &rctx->fmask_sampler_views[PIPE_SHADER_VERTEX]);
>>>>> +       si_init_sampler_views(rctx,
>>>>> &rctx->fmask_sampler_views[PIPE_SHADER_FRAGMENT]);
>>>>> +
>>>>>           switch (rctx->chip_class) {
>>>>>           case SI:
>>>>>           case CIK:
>>>>>                   si_init_state_functions(rctx);
>>>>>                   LIST_INITHEAD(&rctx->active_query_list);
>>>>> -               rctx->cs = rctx->ws->cs_create(rctx->ws, RING_GFX,
>>>>> NULL);
>>>>>                   rctx->max_db = 8;
>>>>>                   si_init_config(rctx);
>>>>>                   break;
>>>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
>>>>> b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
>>>>> index 5fa9bdc..fd4ca53 100644
>>>>> --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
>>>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
>>>>> @@ -83,6 +83,7 @@ struct si_pipe_sampler_view {
>>>>>           struct pipe_sampler_view        base;
>>>>>           struct si_resource              *resource;
>>>>>           uint32_t                        state[8];
>>>>> +       uint32_t                        fmask_state[8];
>>>>>     };
>>>>>       struct si_pipe_sampler_state {
>>>>> @@ -94,9 +95,6 @@ struct si_cs_shader_state {
>>>>>           struct si_pipe_compute          *program;
>>>>>     };
>>>>>     -/* needed for blitter save */
>>>>> -#define NUM_TEX_UNITS 16
>>>>> -
>>>>>     struct r600_textures_info {
>>>>>           struct si_pipe_sampler_view     *views[NUM_TEX_UNITS];
>>>>>           struct si_pipe_sampler_state    *samplers[NUM_TEX_UNITS];
>>>>> @@ -149,6 +147,8 @@ struct r600_context {
>>>>>           struct si_atom                  *atoms[SI_MAX_ATOMS];
>>>>>           unsigned                        num_atoms;
>>>>>     +     struct si_sampler_views
>>>>> fmask_sampler_views[PIPE_SHADER_TYPES];
>>>>> +
>>>>>           struct si_vertex_element        *vertex_elements;
>>>>>           struct pipe_framebuffer_state   framebuffer;
>>>>>           unsigned                        fb_log_samples;
>>>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_pm4.c
>>>>> b/src/gallium/drivers/radeonsi/radeonsi_pm4.c
>>>>> index bbc62d3..d404d41 100644
>>>>> --- a/src/gallium/drivers/radeonsi/radeonsi_pm4.c
>>>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_pm4.c
>>>>> @@ -91,6 +91,13 @@ void si_pm4_set_reg(struct si_pm4_state *state,
>>>>> unsigned reg, uint32_t val)
>>>>>           si_pm4_cmd_end(state, false);
>>>>>     }
>>>>>     +void si_pm4_set_reg_pointer(struct si_pm4_state *state, unsigned
>>>>> reg,
>>>>> +                           uint64_t va)
>>>>> +{
>>>>> +       si_pm4_set_reg(state, reg, va);
>>>>> +       si_pm4_set_reg(state, reg + 4, va >> 32);
>>>>> +}
>>>>> +
>>>>>     void si_pm4_add_bo(struct si_pm4_state *state,
>>>>>                        struct si_resource *bo,
>>>>>                        enum radeon_bo_usage usage)
>>>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_pm4.h
>>>>> b/src/gallium/drivers/radeonsi/radeonsi_pm4.h
>>>>> index 68aa36a..a5e91f9 100644
>>>>> --- a/src/gallium/drivers/radeonsi/radeonsi_pm4.h
>>>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_pm4.h
>>>>> @@ -70,6 +70,8 @@ void si_pm4_cmd_add(struct si_pm4_state *state,
>>>>> uint32_t
>>>>> dw);
>>>>>     void si_pm4_cmd_end(struct si_pm4_state *state, bool predicate);
>>>>>       void si_pm4_set_reg(struct si_pm4_state *state, unsigned reg,
>>>>> uint32_t
>>>>> val);
>>>>> +void si_pm4_set_reg_pointer(struct si_pm4_state *state, unsigned reg,
>>>>> +                           uint64_t va);
>>>>>     void si_pm4_add_bo(struct si_pm4_state *state,
>>>>>                      struct si_resource *bo,
>>>>>                      enum radeon_bo_usage usage);
>>>>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
>>>>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>>>>> new file mode 100644
>>>>> index 0000000..84453f1
>>>>> --- /dev/null
>>>>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>>>>> @@ -0,0 +1,188 @@
>>>>> +/*
>>>>> + * Copyright 2013 Advanced Micro Devices, Inc.
>>>>> + *
>>>>> + * Permission is hereby granted, free of charge, to any person
>>>>> obtaining
>>>>> a
>>>>> + * copy of this software and associated documentation files (the
>>>>> "Software"),
>>>>> + * to deal in the Software without restriction, including without
>>>>> limitation
>>>>> + * on the rights to use, copy, modify, merge, publish, distribute, sub
>>>>> + * license, and/or sell copies of the Software, and to permit persons
>>>>> to
>>>>> whom
>>>>> + * the Software is furnished to do so, subject to the following
>>>>> conditions:
>>>>> + *
>>>>> + * The above copyright notice and this permission notice (including the
>>>>> next
>>>>> + * paragraph) shall be included in all copies or substantial portions
>>>>> of
>>>>> the
>>>>> + * Software.
>>>>> + *
>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>>> EXPRESS OR
>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>>>> MERCHANTABILITY,
>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT
>>>>> SHALL
>>>>> + * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
>>>>> + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
>>>>> OR
>>>>> + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
>>>>> OR
>>>>> THE
>>>>> + * USE OR OTHER DEALINGS IN THE SOFTWARE.
>>>>> + *
>>>>> + * Authors:
>>>>> + *      Marek Olšák <marek.olsak at amd.com>
>>>>> + */
>>>>> +
>>>>> +#include "radeonsi_pipe.h"
>>>>> +#include "radeonsi_resource.h"
>>>>> +#include "r600_hw_context_priv.h"
>>>>> +
>>>>> +#include "util/u_memory.h"
>>>>> +
>>>>> +
>>>>> +static void si_init_descriptors(struct r600_context *rctx,
>>>>> +                               struct si_descriptors *desc,
>>>>> +                               unsigned element_dw_size,
>>>>> +                               unsigned num_elements,
>>>>> +                               void (*emit_func)(struct r600_context
>>>>> *ctx, struct si_atom *state))
>>>>> +{
>>>>> +       void *map;
>>>>> +
>>>>> +       desc->atom.emit = emit_func;
>>>>> +       desc->element_dw_size = element_dw_size;
>>>>> +       desc->num_elements = num_elements;
>>>>> +       desc->buffer = (struct si_resource*)
>>>>> +                           pipe_buffer_create(rctx->context.screen,
>>>>> PIPE_BIND_CUSTOM,
>>>>> +                                              PIPE_USAGE_STATIC,
>>>>> +                                              num_elements *
>>>>> element_dw_size * 4);
>>>>> +
>>>>> +       map = rctx->ws->buffer_map(desc->buffer->cs_buf, NULL,
>>>>> PIPE_TRANSFER_WRITE);
>>>>> +       memset(map, 0, desc->buffer->b.b.width0);
>>>>> +
>>>>> +       r600_context_bo_reloc(rctx, desc->buffer,
>>>>> RADEON_USAGE_READWRITE);
>>>>> +       si_add_atom(rctx, &desc->atom);
>>>>> +}
>>>>> +
>>>>> +static void si_release_descriptors(struct si_descriptors *desc)
>>>>> +{
>>>>> +       pipe_resource_reference((struct pipe_resource**)&desc->buffer,
>>>>> NULL);
>>>>> +}
>>>>> +
>>>>> +static void si_update_descriptors(struct si_descriptors *desc)
>>>>> +{
>>>>> +       if (desc->dirty_mask) {
>>>>> +               desc->atom.num_dw = (4 + desc->element_dw_size) *
>>>>> +                                   util_bitcount(desc->dirty_mask);
>>>>> +               desc->atom.dirty = true;
>>>>> +       }
>>>>> +}
>>>>> +
>>>>> +static void si_emit_descriptors(struct r600_context *rctx,
>>>>> +                               struct si_descriptors *desc,
>>>>> +                               const uint32_t **descriptors)
>>>>> +{
>>>>> +       struct radeon_winsys_cs *cs = rctx->cs;
>>>>> +       uint64_t va_base;
>>>>> +       int packet_start;
>>>>> +       int packet_size = 0;
>>>>> +       int last_index = desc->num_elements;
>>>>> +       unsigned dirty_mask = desc->dirty_mask;
>>>>> +
>>>>> +       va_base = r600_resource_va(rctx->context.screen,
>>>>> &desc->buffer->b.b);
>>>>> +
>>>>> +       while (dirty_mask) {
>>>>> +               int i = u_bit_scan(&dirty_mask);
>>>>> +
>>>>> +               assert(i < desc->num_elements);
>>>>> +
>>>>> +               if (last_index+1 == i && packet_size) {
>>>>> +                       /* Append new data at the end of the last
>>>>> packet..
>>>>>
>>>>> */
>>>>> +                       packet_size += desc->element_dw_size;
>>>>> +                       cs->buf[packet_start] = PKT3(PKT3_WRITE_DATA,
>>>>> packet_size, 0);
>>>>> +               } else {
>>>>> +                       /* Start a new packet. */
>>>>> +                       uint64_t va = va_base + i *
>>>>> desc->element_dw_size
>>>>> * 4;
>>>>> +
>>>>> +                       packet_start = cs->cdw;
>>>>> +                       packet_size = 2 + desc->element_dw_size;
>>>>> +
>>>>> +                       cs->buf[cs->cdw++] = PKT3(PKT3_WRITE_DATA,
>>>>> packet_size, 0);
>>>>> +                       cs->buf[cs->cdw++] =
>>>>> PKT3_WRITE_DATA_DST_SEL(PKT3_WRITE_DATA_DST_SEL_MEM_SYNC) |
>>>>> +                                            PKT3_WRITE_DATA_WR_CONFIRM
>>>>> |
>>>>> +
>>>>> PKT3_WRITE_DATA_ENGINE_SEL(PKT3_WRITE_DATA_ENGINE_SEL_ME);
>>>>> +                       cs->buf[cs->cdw++] = va & 0xFFFFFFFFUL;
>>>>> +                       cs->buf[cs->cdw++] = (va >> 32UL) &
>>>>> 0xFFFFFFFFUL;
>>>>> +               }
>>>>> +
>>>>> +               memcpy(cs->buf+cs->cdw, descriptors[i],
>>>>> desc->element_dw_size * 4);
>>>>> +               cs->cdw += desc->element_dw_size;
>>>>> +
>>>>> +               last_index = i;
>>>>> +       }
>>>>> +       desc->dirty_mask = 0;
>>>>> +}
>>>>> +
>>>>> +/* SAMPLER VIEWS */
>>>>> +
>>>>> +static void si_emit_sampler_views(struct r600_context *rctx, struct
>>>>> si_atom *atom)
>>>>> +{
>>>>> +       struct si_sampler_views *views = (struct si_sampler_views*)atom;
>>>>> +
>>>>> +       si_emit_descriptors(rctx, &views->desc, views->desc_data);
>>>>> +}
>>>>> +
>>>>> +void si_init_sampler_views(struct r600_context *rctx, struct
>>>>> si_sampler_views *views)
>>>>> +{
>>>>> +       si_init_descriptors(rctx, &views->desc, 8, 16,
>>>>> +                           si_emit_sampler_views);
>>>>> +}
>>>>> +
>>>>> +void si_release_sampler_views(struct si_sampler_views *views)
>>>>> +{
>>>>> +       int i;
>>>>> +
>>>>> +       for (i = 0; i < Elements(views->views); i++) {
>>>>> +               pipe_sampler_view_reference(&views->views[i], NULL);
>>>>> +       }
>>>>> +       si_release_descriptors(&views->desc);
>>>>> +}
>>>>> +
>>>>> +void si_update_sampler_views(struct si_sampler_views *views)
>>>>> +{
>>>>> +       si_update_descriptors(&views->desc);
>>>>> +}
>>>>> +
>>>>> +void si_sampler_views_begin_new_cs(struct r600_context *rctx, struct
>>>>> si_sampler_views *views)
>>>>> +{
>>>>> +       unsigned mask = views->desc.enabled_mask;
>>>>> +
>>>>> +       /* Add relocations to the CS. */
>>>>> +       while (mask) {
>>>>> +               int i = u_bit_scan(&mask);
>>>>> +               struct si_pipe_sampler_view *rview =
>>>>> +                       (struct si_pipe_sampler_view*)views->views[i];
>>>>> +
>>>>> +               r600_context_bo_reloc(rctx, rview->resource,
>>>>> RADEON_USAGE_READ);
>>>>> +       }
>>>>> +
>>>>> +       r600_context_bo_reloc(rctx, views->desc.buffer,
>>>>> RADEON_USAGE_READWRITE);
>>>>> +}
>>>>> +
>>>>> +void si_set_fmask_sampler_view(struct r600_context *rctx, unsigned
>>>>> shader,
>>>>> +                              unsigned slot, struct pipe_sampler_view
>>>>> *view)
>>>>> +{
>>>>> +       static const uint32_t null_desc[8];
>>>>> +       struct si_sampler_views *views =
>>>>> &rctx->fmask_sampler_views[shader];
>>>>> +
>>>>> +       if (views->views[slot] == view)
>>>>> +               return;
>>>>> +
>>>>> +       if (view) {
>>>>> +               struct si_pipe_sampler_view *rview =
>>>>> +                       (struct si_pipe_sampler_view*)view;
>>>>> +
>>>>> +               r600_context_bo_reloc(rctx, rview->resource,
>>>>> RADEON_USAGE_READ);
>>>>> +
>>>>> +               pipe_sampler_view_reference(&views->views[slot], view);
>>>>> +               views->desc_data[slot] = rview->fmask_state;
>>>>> +               views->desc.enabled_mask |= 1 << slot;
>>>>> +       } else {
>>>>> +               pipe_sampler_view_reference(&views->views[slot], NULL);
>>>>> +               views->desc_data[slot] = null_desc;
>>>>> +               views->desc.enabled_mask &= ~(1 << slot);
>>>>> +       }
>>>>> +
>>>>> +       views->desc.dirty_mask |= 1 << slot;
>>>>> +       si_update_sampler_views(views);
>>>>> +}
>>>>> diff --git a/src/gallium/drivers/radeonsi/si_state.c
>>>>> b/src/gallium/drivers/radeonsi/si_state.c
>>>>> index 6965745..1cc0813 100644
>>>>> --- a/src/gallium/drivers/radeonsi/si_state.c
>>>>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>>>>> @@ -2701,6 +2701,44 @@ static struct pipe_sampler_view
>>>>> *si_create_sampler_view(struct pipe_context *ctx
>>>>>           view->state[6] = 0;
>>>>>           view->state[7] = 0;
>>>>>     +     /* Initialize the sampler view for FMASK. */
>>>>> +       if (tmp->fmask.size) {
>>>>> +               uint64_t va = r600_resource_va(ctx->screen, texture) +
>>>>> tmp->fmask.offset;
>>>>> +               uint32_t fmask_format;
>>>>> +
>>>>> +               switch (texture->nr_samples) {
>>>>> +               case 2:
>>>>> +                       fmask_format =
>>>>> V_008F14_IMG_DATA_FORMAT_FMASK8_S2_F2;
>>>>> +                       break;
>>>>> +               case 4:
>>>>> +                       fmask_format =
>>>>> V_008F14_IMG_DATA_FORMAT_FMASK8_S4_F4;
>>>>> +                       break;
>>>>> +               case 8:
>>>>> +                       fmask_format =
>>>>> V_008F14_IMG_DATA_FORMAT_FMASK32_S8_F8;
>>>>> +                       break;
>>>>> +               default:
>>>>> +                       assert(0);
>>>>> +               }
>>>>> +
>>>>> +               view->fmask_state[0] = va >> 8;
>>>>> +               view->fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >>
>>>>> 40)
>>>>> |
>>>>> +
>>>>> S_008F14_DATA_FORMAT(fmask_format)
>>>>> |
>>>>> +
>>>>> S_008F14_NUM_FORMAT(V_008F14_IMG_NUM_FORMAT_UINT);
>>>>> +               view->fmask_state[2] = S_008F18_WIDTH(width - 1) |
>>>>> +                                      S_008F18_HEIGHT(height - 1);
>>>>> +               view->fmask_state[3] =
>>>>> S_008F1C_DST_SEL_X(V_008F1C_SQ_SEL_X) |
>>>>> +
>>>>> S_008F1C_DST_SEL_Y(V_008F1C_SQ_SEL_X) |
>>>>> +
>>>>> S_008F1C_DST_SEL_Z(V_008F1C_SQ_SEL_X) |
>>>>> +
>>>>> S_008F1C_DST_SEL_W(V_008F1C_SQ_SEL_X) |
>>>>> +
>>>>> S_008F1C_TILING_INDEX(tmp->fmask.tile_mode_index) |
>>>>> +
>>>>> S_008F1C_TYPE(si_tex_dim(texture->target, 0));
>>>>> +               view->fmask_state[4] = S_008F20_PITCH(tmp->fmask.pitch -
>>>>> 1);
>>>>> +               view->fmask_state[5] =
>>>>> S_008F24_BASE_ARRAY(state->u.tex.first_layer) |
>>>>> +
>>>>> S_008F24_LAST_ARRAY(state->u.tex.last_layer);
>>>>> +               view->fmask_state[6] = 0;
>>>>> +               view->fmask_state[7] = 0;
>>>>> +       }
>>>>> +
>>>>>           return &view->base;
>>>>>     }
>>>>>     @@ -2775,7 +2813,7 @@ static void *si_create_sampler_state(struct
>>>>> pipe_context *ctx,
>>>>>     }
>>>>>       static struct si_pm4_state *si_set_sampler_views(struct
>>>>> r600_context
>>>>> *rctx,
>>>>> -                                                unsigned count,
>>>>> +                                                unsigned shader,
>>>>> unsigned
>>>>> count,
>>>>>                                                    struct
>>>>> pipe_sampler_view
>>>>> **views,
>>>>>                                                    struct
>>>>> r600_textures_info
>>>>> *samplers,
>>>>>                                                    unsigned
>>>>> user_data_reg)
>>>>> @@ -2812,6 +2850,9 @@ static struct si_pm4_state
>>>>> *si_set_sampler_views(struct r600_context *rctx,
>>>>>                           } else {
>>>>>                                   samplers->compressed_colortex_mask &=
>>>>> ~(1
>>>>> << i);
>>>>>                           }
>>>>> +
>>>>> +                       si_set_fmask_sampler_view(rctx, shader, i,
>>>>> +                                                 rtex->fmask.size ?
>>>>> views[i] : NULL);
>>>>>                   } else {
>>>>>                           samplers->depth_texture_mask &= ~(1 << i);
>>>>>                           samplers->compressed_colortex_mask &= ~(1 <<
>>>>> i);
>>>>> @@ -2827,6 +2868,7 @@ static struct si_pm4_state
>>>>> *si_set_sampler_views(struct r600_context *rctx,
>>>>>                           pipe_sampler_view_reference((struct
>>>>> pipe_sampler_view **)&samplers->views[i], NULL);
>>>>>                           samplers->depth_texture_mask &= ~(1 << i);
>>>>>                           samplers->compressed_colortex_mask &= ~(1 <<
>>>>> i);
>>>>> +                       si_set_fmask_sampler_view(rctx, shader, i,
>>>>> NULL);
>>>>>                   }
>>>>>           }
>>>>>     @@ -2843,7 +2885,7 @@ static void si_set_vs_sampler_views(struct
>>>>> pipe_context *ctx, unsigned count,
>>>>>           struct r600_context *rctx = (struct r600_context *)ctx;
>>>>>           struct si_pm4_state *pm4;
>>>>>     -     pm4 = si_set_sampler_views(rctx, count, views,
>>>>> &rctx->vs_samplers,
>>>>> +       pm4 = si_set_sampler_views(rctx, PIPE_SHADER_VERTEX, count,
>>>>> views,
>>>>> &rctx->vs_samplers,
>>>>>                               R_00B130_SPI_SHADER_USER_DATA_VS_0);
>>>>>           si_pm4_set_state(rctx, vs_sampler_views, pm4);
>>>>>     }
>>>>> @@ -2854,7 +2896,7 @@ static void si_set_ps_sampler_views(struct
>>>>> pipe_context *ctx, unsigned count,
>>>>>           struct r600_context *rctx = (struct r600_context *)ctx;
>>>>>           struct si_pm4_state *pm4;
>>>>>     -     pm4 = si_set_sampler_views(rctx, count, views,
>>>>> &rctx->ps_samplers,
>>>>> +       pm4 = si_set_sampler_views(rctx, PIPE_SHADER_FRAGMENT, count,
>>>>> views, &rctx->ps_samplers,
>>>>>                                     R_00B030_SPI_SHADER_USER_DATA_PS_0);
>>>>>           si_pm4_set_state(rctx, ps_sampler_views, pm4);
>>>>>     }
>>>>> @@ -3292,5 +3334,15 @@ void si_init_config(struct r600_context *rctx)
>>>>>                   }
>>>>>           }
>>>>>     +     si_pm4_set_reg_pointer(pm4,
>>>>> +               R_00B130_SPI_SHADER_USER_DATA_VS_0 +
>>>>> SI_SGPR_FMASK_RESOURCE * 4,
>>>>> +               r600_resource_va(rctx->context.screen,
>>>>> +
>>>>> &rctx->fmask_sampler_views[PIPE_SHADER_VERTEX].desc.buffer->b.b));
>>>>> +
>>>>> +       si_pm4_set_reg_pointer(pm4,
>>>>> +               R_00B030_SPI_SHADER_USER_DATA_PS_0 +
>>>>> SI_SGPR_FMASK_RESOURCE * 4,
>>>>> +               r600_resource_va(rctx->context.screen,
>>>>> +
>>>>> &rctx->fmask_sampler_views[PIPE_SHADER_FRAGMENT].desc.buffer->b.b));
>>>>> +
>>>>>           si_pm4_set_state(rctx, init, pm4);
>>>>>     }
>>>>> diff --git a/src/gallium/drivers/radeonsi/si_state.h
>>>>> b/src/gallium/drivers/radeonsi/si_state.h
>>>>> index 4aabdef..9a89d8f 100644
>>>>> --- a/src/gallium/drivers/radeonsi/si_state.h
>>>>> +++ b/src/gallium/drivers/radeonsi/si_state.h
>>>>> @@ -116,6 +116,34 @@ union si_state {
>>>>>           struct si_pm4_state     *array[0];
>>>>>     };
>>>>>     +#define NUM_TEX_UNITS 16
>>>>> +
>>>>> +/* This represents resource descriptors in memory, such as buffer
>>>>> resources,
>>>>> + * image resources, and sampler states.
>>>>> + */
>>>>> +struct si_descriptors {
>>>>> +       struct si_atom atom;
>>>>> +
>>>>> +       /* The size of one resource descriptor. */
>>>>> +       unsigned element_dw_size;
>>>>> +       /* The maximum number of resource descriptors. */
>>>>> +       unsigned num_elements;
>>>>> +
>>>>> +       /* The buffer where resource descriptors are stored. */
>>>>> +       struct si_resource *buffer;
>>>>> +
>>>>> +       /* The i-th bit is set if that element is dirty (changed but not
>>>>> emitted). */
>>>>> +       unsigned dirty_mask;
>>>>> +       /* The i-th bit is set if that element is enabled (non-NULL
>>>>> resource). */
>>>>> +       unsigned enabled_mask;
>>>>> +};
>>>>> +
>>>>> +struct si_sampler_views {
>>>>> +       struct si_descriptors           desc;
>>>>> +       struct pipe_sampler_view        *views[NUM_TEX_UNITS];
>>>>> +       const uint32_t                  *desc_data[NUM_TEX_UNITS];
>>>>> +};
>>>>> +
>>>>>     #define si_pm4_block_idx(member) \
>>>>>           (offsetof(union si_state, named.member) / sizeof(struct
>>>>> si_pm4_state *))
>>>>>     @@ -146,6 +174,14 @@ union si_state {
>>>>>                   } \
>>>>>           } while(0)
>>>>>     +/* si_descriptors.c */
>>>>> +void si_init_sampler_views(struct r600_context *rctx, struct
>>>>> si_sampler_views *views);
>>>>> +void si_release_sampler_views(struct si_sampler_views *views);
>>>>> +void si_update_sampler_views(struct si_sampler_views *views);
>>>>> +void si_sampler_views_begin_new_cs(struct r600_context *rctx, struct
>>>>> si_sampler_views *views);
>>>>> +void si_set_fmask_sampler_view(struct r600_context *rctx, unsigned
>>>>> shader,
>>>>> +                              unsigned slot, struct pipe_sampler_view
>>>>> *view);
>>>>> +
>>>>>     /* si_state.c */
>>>>>     struct si_pipe_shader_selector;
>>>>>



More information about the mesa-dev mailing list