[Mesa-dev] [PATCH 2/2] mesa/st: pass 4-offset TG4 without lowering if supported

Tue May 6 10:36:34 PDT 2014

On Tue, May 6, 2014 at 1:29 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 06.05.2014 17:03, schrieb Ilia Mirkin:
>> On Tue, May 6, 2014 at 10:48 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Looks good to me.
>>
>> Thanks!
>>
>>> Does that mean if also the GATHER_SM5 cap is supported you have to
>>> support 4 independent, non-constant offsets?
>>
>> Not 100% sure what you're asking... but yes, for ARB_gs5 to work, you
>> have to support independent non-constant offsets. And if you have
>> PIPE_CAP_TEXTURE_GATHER_OFFSETS enabled, you're making the claim that
>> you can handle multiple independent offsets in a single texgather.
>> Without the cap, the 4 offsets get lowered into 4 separate texgathers
>> (with only one of the returned components used).
>>
>> With nvc0, the offsets are passed in via a register, so non-constant
>> is never an issue. And with nv50, the offsets must be immediates (and
>> there can be only 1 set of them), but it also has no hope of
>> supporting all of ARB_gs5.
>>
>>> Would it make sense to reorder the caps so the gather stuff is all
>>> together (now 5 cap bits just for this...)?
>>
>> The quantity of caps for texgather is a little ridiculous. I'm of the
>> opinion that this should be the default behaviour, and it should be up
>> to the driver to lower it into 4 texgathers if it can't handle them
>> directly. Furthermore, this functionality is only available (via GL)
>> with ARB_gs5, which in turn will require a whole bunch of stuff, so I
>> don't know whether the GATHER_SM5 cap is really that useful. And for
>> someone with a DX tracker, this functionality would again not be
>> useful on its own, the rest of SM5 would have to be supported as well
>> (I assume).
>>
>> But that's not what got implemented, and I don't care to modify
>> radeon, which can only support 1 offset at a time. (Although I don't
>> think the radeon impl got pushed...) I anticipate that llvmpipe
>> doesn't care one way or another (perhaps with even a minor preference
>> towards having it all in one instruction).
>>
>> If there's concensus, happy to switch this on by default and get rid
>> of the cap :) [And also get rid of the GATHER_SM5 cap.]
> Well I think the point was that there's really hw which can only do
> simple gather (what d3d10.1 could do or arb_texture_gather would do).
> This hw will not be able to do other stuff from newer gl versions anyway
> so it should not be required to support those new features.

Right. But since that hw will only ever expose ARB_texture_gather and
not ARB_gpu_shader5, it will never receive a TG4 instruciton with
non-const offsets or multiple offsets. So the cap to indicate that
non-const or quad offsets are supported isn't really necessary, since
those will only appear if ARB_gs5 support is claimed, which requires
more than just the texgather stuff. (The
PIPE_CAP_TEXTURE_GATHER_COMPONENTS cap _is_ necessary since it
indicates ARB_texture_gather support, and the value that should be
returned by some GL query about what tex gather supports.)

> I'm not entirely sure to what it's actually lowered but in any case
> llvmpipe if it implemented this definitely would want a non-lowered
> version.

Right now, it'll get lowered to 4 texgathers, with only one of the
returned 4 components used from each one. (And it can't use texfetch
since the min/max offsets are different, and there's probably some
other clever reason as well.)

> I think though some radeon hw could really do SM5 version but
> not independent offsets natively, though I'm not sure if it would really
> be all that complicated to handle it in the driver.

Well, I think the claim was that SM5 doesn't actually support the 4
separate offsets, but GL4 does with textureGatherOffsets(). Also, I
believe that radeon supports non-const natively, just not have 4
offsets in one instruction. Same deal with i965 (which is why that
lowering pass exists in the first place).

> I guess though this could be changed later rather easily.
>
> Roland
>
>
>>
>>>
>>> Roland
>>>
>>> Am 29.04.2014 01:30, schrieb Ilia Mirkin:
>>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>>> ---
>>>>
>>>> The handling of the 4 offsets is less-than-pretty. I had an alternate version
>>>> that created a new ir_dereference_array object and ran ->accept on that. This
>>>> worked as well, but for each offset it would create a separate new array, and
>>>> then deref just one item out of it. This seems incredibly wasteful. The
>>>> slightly open-coded version of that seems reasonable and uses the same array.
>>>>
>>>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 55 ++++++++++++++++++++++--------
>>>>  1 file changed, 41 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>> index d1c3856..20d5e99 100644
>>>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>>>> @@ -87,8 +87,7 @@ extern "C" {
>>>>   */
>>>>  #define MAX_ARRAYS        256
>>>>
>>>> -/* if we support a native gallium TG4 with the ability to take 4 texoffsets then bump this */
>>>> -#define MAX_GLSL_TEXTURE_OFFSET 1
>>>> +#define MAX_GLSL_TEXTURE_OFFSET 4
>>>>
>>>>  class st_src_reg;
>>>>  class st_dst_reg;
>>>> @@ -2728,12 +2727,13 @@ glsl_to_tgsi_visitor::visit(ir_call *ir)
>>>>  void
>>>>  glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>  {
>>>> -   st_src_reg result_src, coord, cube_sc, lod_info, projector, dx, dy, offset, sample_index, component;
>>>> +   st_src_reg result_src, coord, cube_sc, lod_info, projector, dx, dy, offset[MAX_GLSL_TEXTURE_OFFSET], sample_index, component;
>>>>     st_dst_reg result_dst, coord_dst, cube_sc_dst;
>>>>     glsl_to_tgsi_instruction *inst = NULL;
>>>>     unsigned opcode = TGSI_OPCODE_NOP;
>>>>     const glsl_type *sampler_type = ir->sampler->type;
>>>>     bool is_cube_array = false;
>>>> +   unsigned i;
>>>>
>>>>     /* if we are a cube array sampler */
>>>>     if ((sampler_type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE &&
>>>> @@ -2771,7 +2771,7 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>        opcode = (is_cube_array && ir->shadow_comparitor) ? TGSI_OPCODE_TEX2 : TGSI_OPCODE_TEX;
>>>>        if (ir->offset) {
>>>>           ir->offset->accept(this);
>>>> -         offset = this->result;
>>>> +         offset[0] = this->result;
>>>>        }
>>>>        break;
>>>>     case ir_txb:
>>>> @@ -2780,7 +2780,7 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>        lod_info = this->result;
>>>>        if (ir->offset) {
>>>>           ir->offset->accept(this);
>>>> -         offset = this->result;
>>>> +         offset[0] = this->result;
>>>>        }
>>>>        break;
>>>>     case ir_txl:
>>>> @@ -2789,7 +2789,7 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>        lod_info = this->result;
>>>>        if (ir->offset) {
>>>>           ir->offset->accept(this);
>>>> -         offset = this->result;
>>>> +         offset[0] = this->result;
>>>>        }
>>>>        break;
>>>>     case ir_txd:
>>>> @@ -2800,7 +2800,7 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>        dy = this->result;
>>>>        if (ir->offset) {
>>>>           ir->offset->accept(this);
>>>> -         offset = this->result;
>>>> +         offset[0] = this->result;
>>>>        }
>>>>        break;
>>>>     case ir_txs:
>>>> @@ -2814,7 +2814,7 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>        lod_info = this->result;
>>>>        if (ir->offset) {
>>>>           ir->offset->accept(this);
>>>> -         offset = this->result;
>>>> +         offset[0] = this->result;
>>>>        }
>>>>        break;
>>>>     case ir_txf_ms:
>>>> @@ -2828,9 +2828,17 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>        component = this->result;
>>>>        if (ir->offset) {
>>>>           ir->offset->accept(this);
>>>> -         /* this should have been lowered */
>>>> -         assert(ir->offset->type->base_type != GLSL_TYPE_ARRAY);
>>>> -         offset = this->result;
>>>> +         if (ir->offset->type->base_type == GLSL_TYPE_ARRAY) {
>>>> +            const glsl_type *elt_type = ir->offset->type->fields.array;
>>>> +            for (i = 0; i < ir->offset->type->length; i++) {
>>>> +               offset[i] = this->result;
>>>> +               offset[i].index += i * type_size(elt_type);
>>>> +               offset[i].type = elt_type->base_type;
>>>> +               offset[i].swizzle = swizzle_for_size(elt_type->vector_elements);
>>>> +            }
>>>> +         } else {
>>>> +            offset[0] = this->result;
>>>> +         }
>>>>        }
>>>>        break;
>>>>     case ir_lod:
>>>> @@ -2960,8 +2968,9 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>>>                                                  this->prog);
>>>>
>>>>     if (ir->offset) {
>>>> -      inst->tex_offset_num_offset = 1;
>>>> -      inst->tex_offsets[0] = offset;
>>>> +      for (i = 0; i < MAX_GLSL_TEXTURE_OFFSET && offset[i].file != PROGRAM_UNDEFINED; i++)
>>>> +         inst->tex_offsets[i] = offset[i];
>>>> +      inst->tex_offset_num_offset = i;
>>>>     }
>>>>
>>>>     switch (sampler_type->sampler_dimensionality) {
>>>> @@ -4479,6 +4488,8 @@ translate_tex_offset(struct st_translate *t,
>>>>  {
>>>>     struct tgsi_texture_offset offset;
>>>>     struct ureg_src imm_src;
>>>> +   struct ureg_dst dst;
>>>> +   int array;
>>>>
>>>>     switch (in_offset->file) {
>>>>     case PROGRAM_IMMEDIATE:
>>>> @@ -4500,6 +4511,20 @@ translate_tex_offset(struct st_translate *t,
>>>>        offset.SwizzleZ = GET_SWZ(in_offset->swizzle, 2);
>>>>        offset.Padding = 0;
>>>>        break;
>>>> +   case PROGRAM_ARRAY:
>>>> +      array = in_offset->index >> 16;
>>>> +
>>>> +      assert(array >= 0);
>>>> +      assert(array < (int) Elements(t->arrays));
>>>> +
>>>> +      dst = t->arrays[array];
>>>> +      offset.File = dst.File;
>>>> +      offset.Index = dst.Index + (in_offset->index & 0xFFFF) - 0x8000;
>>>> +      offset.SwizzleX = GET_SWZ(in_offset->swizzle, 0);
>>>> +      offset.SwizzleY = GET_SWZ(in_offset->swizzle, 1);
>>>> +      offset.SwizzleZ = GET_SWZ(in_offset->swizzle, 2);
>>>> +      offset.Padding = 0;
>>>> +      break;
>>>>     default:
>>>>        break;
>>>>     }
>>>> @@ -5350,6 +5375,7 @@ st_new_shader_program(struct gl_context *ctx, GLuint name)
>>>>  GLboolean
>>>>  st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
>>>>  {
>>>> +   struct pipe_screen *pscreen = ctx->st->pipe->screen;
>>>>     assert(prog->LinkStatus);
>>>>
>>>>     for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>>>> @@ -5388,7 +5414,8 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
>>>>           lower_packing_builtins(ir, lower_inst);
>>>>        }
>>>>
>>>> -      lower_offset_arrays(ir);
>>>> +      if (!pscreen->get_param(pscreen, PIPE_CAP_TEXTURE_GATHER_OFFSETS))
>>>> +         lower_offset_arrays(ir);
>>>>        do_mat_op_to_vec(ir);
>>>>        lower_instructions(ir,
>>>>                           MOD_TO_FRACT |
>>>>