[Mesa-dev] [PATCH 18/25] i965: Import surface lowering code.

Thu Jan 2 19:58:50 PST 2014

On 2 December 2013 11:39, Francisco Jerez <currojerez at riseup.net> wrote:

> diff --git a/src/mesa/drivers/dri/i965/brw_surface_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_surface_visitor.cpp
> new file mode 100644
> index 0000000..07511b5
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_surface_visitor.cpp
> @@ -0,0 +1,1208 @@
> +/*
> + * Copyright 2013 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Francisco Jerez <currojerez at riseup.net>
> + */
> +
> +#include "brw_surface_visitor.h"
> +#include "brw_context.h"
> +
> +brw_surface_visitor::brw_surface_visitor(backend_visitor *v) :
> +   v(v)
> +{
> +}
> +
> +void
> +brw_surface_visitor::visit_atomic_counter_intrinsic(ir_call *ir) const
> +{
> +   const char *callee = ir->callee->function_name();
> +   ir_dereference *deref = static_cast<ir_dereference *>(
> +      ir->actual_parameters.get_head());
> +   const backend_reg offset = v->visit_result(deref);
> +   const backend_reg surface =
> +      brw_imm_ud(v->stage_prog_data->binding_table.abo_start +
> +                 deref->variable_referenced()->atomic.buffer_index);
> +   backend_reg tmp;
> +
> +   if (!strcmp("__intrinsic_atomic_read", callee)) {
> +      tmp = emit_untyped_read(backend_reg(), surface, offset, 1, 1);
> +
> +   } else if (!strcmp("__intrinsic_atomic_increment", callee)) {
> +      tmp = emit_untyped_atomic(backend_reg(), surface, offset,
> +                                backend_reg(), backend_reg(),
> +                                1, BRW_AOP_INC);
> +
> +   } else if (!strcmp("__intrinsic_atomic_predecrement", callee)) {
> +      tmp = emit_untyped_atomic(backend_reg(), surface, offset,
> +                                backend_reg(), backend_reg(),
> +                                1, BRW_AOP_PREDEC);
> +   }
>

Are these the only calls to emit_untyped_atomic()?  If so, why do the flag,
src0, and src1 parameters to this function exist, if we always populate
them with backend_reg() (causing the logic pertaining to them to be
skipped)?

> +
> +   if (ir->return_deref) {
> +      backend_reg dst = v->visit_result(ir->return_deref);
> +      emit_assign_vector(dst, tmp, 1);
> +   }
> +}
> +
> +namespace {
> +   /**
> +    * Process the parameters passed to an image intrinsic call.
> +    */
>

This sounds like a description of a function, not a structure.  How about
just "The parameters passed to an image intrinsic call"?

> +   struct image_intrinsic_parameters {
> +      image_intrinsic_parameters(backend_visitor *v, ir_call *ir)
> +      {
> +         exec_list_iterator it = ir->actual_parameters.iterator();
> +
> +         image_var = static_cast<ir_dereference *>(it.get())->
> +            variable_referenced();
> +
> +         image = visit_next(v, it);
> +         addr = visit_next(v, it);
> +
> +         if (image_var->type->fields.image.dimension == GLSL_IMAGE_DIM_MS)
> +            sample = visit_next(v, it);
> +
> +         for (int i = 0; it.has_next(); ++i)
> +            src[i] = visit_next(v, it);
>

A bug in the front end could cause us to overflow the src array.  To catch
that, let's add:

assert(i < Elements(src));

to the body of the loop.

> +
> +         if (ir->return_deref)
> +            dst = v->visit_result(ir->return_deref);
> +      }
> +
> +      ir_variable *image_var;
> +
> +      backend_reg image;
> +      backend_reg addr;
> +      backend_reg sample;
> +      backend_reg src[2];
>

It looks like there are not too many possibilities for what will be
contained in the "src" array:

- For imageSize and imageLoad, src[0] and src[1] are unused.
- For imageAtomicCompSwap, src[0] is the "compare" parameter, and src[1] is
the "data" parameter.
- For all other functions, src[0] is the "data" parameter, and src[1] is
unused.

At the very least we should have a comment above the declaration of "src"
with this information.

However, it seems like all the other functions that have to deal with these
src values would be easier to read if we replaced the "src" array with a
pair of fields called "data" and "compare".  Then we could use those names
uniformly elsewhere (e.g. in emit_typed_atomic(), emit_image_atomic(), and
emit_image_store()).

> +      backend_reg dst;
> +
> +   private:
> +      backend_reg
> +      visit_next(backend_visitor *v, exec_list_iterator &it) const
> +      {
> +         ir_dereference *deref = static_cast<ir_dereference *>(it.get());
>

To help catch bugs, it would be nice to add:

assert(it.has_next());

> +         it.next();
> +         return v->visit_result(deref);
> +      }
> +   };
> +
> +   /**
> +    * Get the appropriate atomic op for an image atomic intrinsic.
> +    */
> +   unsigned
> +   get_image_atomic_op(const char *callee, ir_variable *image)
>

"callee" implies that there's a caller/callee relationship that's important
to consider.  How about renaming this to something like "func_name"?

> +   {
> +      const glsl_base_type base_type = image->type->fields.image.type;
> +
> +      if (!strcmp("__intrinsic_image_atomic_add", callee))
> +         return BRW_AOP_ADD;
> +
> +      else if (!strcmp("__intrinsic_image_atomic_min", callee))
> +         return (base_type == GLSL_TYPE_UINT ? BRW_AOP_UMIN :
> BRW_AOP_IMIN);
> +
> +      else if (!strcmp("__intrinsic_image_atomic_max", callee))
> +         return (base_type == GLSL_TYPE_UINT ? BRW_AOP_UMAX :
> BRW_AOP_IMAX);
> +
> +      else if (!strcmp("__intrinsic_image_atomic_and", callee))
> +         return BRW_AOP_AND;
> +
> +      else if (!strcmp("__intrinsic_image_atomic_or", callee))
> +         return BRW_AOP_OR;
> +
> +      else if (!strcmp("__intrinsic_image_atomic_xor", callee))
> +         return BRW_AOP_XOR;
> +
> +      else if (!strcmp("__intrinsic_image_atomic_exchange", callee))
> +         return BRW_AOP_MOV;
> +
> +      else if (!strcmp("__intrinsic_image_atomic_comp_swap", callee))
> +         return BRW_AOP_CMPWR;
> +
> +      else
> +         unreachable();
>

Change to an assert.

> +   }
> +}
> +
> +void
> +brw_surface_visitor::visit_image_intrinsic(ir_call *ir) const
> +{
> +   image_intrinsic_parameters p(v, ir);
> +   const char *callee = ir->callee->function_name();
> +   const unsigned dims = p.image_var->type->coordinate_components();
> +   const GLenum format = (p.image_var->image.write_only ? GL_NONE :
> +                          p.image_var->image.format);
>

Can you add a comment explaining that GL_NONE is a magic value meaning "no
format conversion needs to be applied, since the hadware natively supports
all formats for write-only access"?  Without that piece of context, it's
hard to guess what's going on here.

> +   backend_reg tmp;
> +
> +   if (!strcmp("__intrinsic_image_load", callee))
> +      tmp = emit_image_load(p.image, p.addr, format, dims);
> +
> +   else if (!strcmp("__intrinsic_image_store", callee))
> +      emit_image_store(p.image, p.addr, p.src[0], format, dims);
> +
> +   else
> +      tmp = emit_image_atomic(p.image, p.addr, p.src[0], p.src[1],
> +                              format, get_image_atomic_op(callee,
> p.image_var),
> +                              dims);
> +
> +   if (ir->return_deref) {
> +      const unsigned size = (ir->return_deref->variable_referenced()->
> +                             type->components());
> +      emit_assign_vector(p.dst, tmp, size);
> +   }
> +}
> +
> +void
> +brw_surface_visitor::visit_barrier_intrinsic(ir_call *ir) const
> +{
> +   emit_memory_fence();
> +}
> +
> +backend_reg
> +brw_surface_visitor::emit_image_load(backend_reg image, backend_reg addr,
> +                                     GLenum format, unsigned dims) const
>

I have several thoughts about this function.

a. How effectively is this function piglit tested?  There are a lot of
formats, and each one has special case behaviours for out-of-bound
accesses, so it seems like it would be good to have a piglit test that
verifies each format for both in-bound and out-of-bound accesses.

b. The bspec page "Graphics BSpec: 3D-Media-GPGPU Engine > Shared Functions
> Shared Functions – Data Port [Pre-SKL] > Messages > TypedUntyped Surface
ReadWrite and TypedUntyped Atomic Operation" includes this erratum: "The *Typed
Surface Read* message returns 0 in all channels for out-of-bounds
accesses."  This would seem to imply that the following formats need to do
emit_coordinate_check() on hsw: GL_RGBA16UI, GL_RGBA16I, GL_RGBA8UI, and
GL_RGBA8I.

c. Why do the formats GL_R16UI and GL_R8UI need to use
emit_pack_homogeneous() and emit_pad() for IVB but not for HSW?  Since the
hardware surface format matches the GLSL format for both IVB and HSW, it
seems like in both cases we ought to be able to use the HSW code path.  If
the difference is due to an IVB-specific erratum, that should be documented
in a comment in the code.

d. I'm troubled by the number of cases in this function and the fact that
they need to be kept carefully synchronized with the cases in patch 3's
get_image_format() function.  In order to verify that everything was
correct, I had to cross reference with get_image_format(), think about the
properties of each GL type, and then figure out how to do the appropriate
conversion.  It seems to me that we would be better off having
emit_image_load() generate the conversion algorithmically based on the
properties of the hardware surface format and the GLSL format.  The logic I
have in mind would look something like this:

1. Use the functions in src/mesa/main/formats.c to interrogate the GLSL
format.  Figure out how many channels and bits/channel it has, as well as
its datatype (signed/unsigned normalized, signed/unsigned int, or float),
and whether it has a homogeneous number of bits/channel.

2. Call get_image_format() to find out the hardware surface format, and
then figure out how many channels and bits/channel it has.  Consider RAW to
be 32 bits/channel and N/32 channels, where N is the total number of bits
in the GLSL format.

3. If the GLSL format and the hardware surface format match exactly, this
is a special case: all we have to do is call emit_typed_read() and retype
the result to the appropriate type.  (Note: depending on what you say about
item "c" above, we may need to disable this special case on IVB for
GL_R16UI and GL_R8UI).  Otherwise, go on to steps 4-10.

4. If necessary, call emit_coordinate_check().  This is necessary if the
hardware surface format is RAW, the GLSL format has 4 channels, or the
hardware surface format has 4 channels and the GLSL format doesn't.  (Note:
these criteria may be affected by what you say about item "c" above).

5. If the hardware surface format is RAW, call
emit_coordinate_address_calculation() and emit_untyped_read() to read the
data.  Otherwise, use emit_typed_read().

6. If the GLSL format is homogeneous, and it's a different number of
bits/channel than the hardware surface format, call
emit_unpack_homogeneous() or emit_pack_homogeneous() to convert it.

7. If the GLSL format is homogeneous, and it's the same number of
bits/channel as the hardware surface format, sign extend the data if
necessary by calling emit_unpack_homogeneous().

8. If the GLSL format isn't homogeneous, call emit_unpack_generic() to
unpack it.  There are only two possibilities: 10/10/10/2 and 11/11/10.

9. If the GLSL format is signed/unsigned normalized or float, convert the
data by calling emit_convert_from_float() or emit_convert_from_scaled().
Exception: for 32 bits/channel formats, convert_from_float() isn't
needed--just retype to BRW_REGISTER_TYPE_F.

10. If necessary, call emit_pad().  This is necessary if
emit_coordinate_check() was used or if the GLSL format has <4 channels.

I think this would be a lot less code than what you've written, and it
would be a lot easier to have confidence in, since most of the code paths
would be exercised fairly frequently, as opposed to having totally
independent code paths for every single format.  Also, it would have the
advantage of having a lot fewer Ivy Bridge vs Haswell differences.

+void
> +brw_surface_visitor::emit_image_store(backend_reg image, backend_reg addr,
> +                                      backend_reg src,
> +                                      GLenum format, unsigned dims) const
>

Again, several thoughts:

a. It would be nice to have a comment above this function reiterating the
special meaning of passing GL_NONE for the format.

b. Why is it necessary to call emit_coordinate_check() in the handlers for
GL_RGB10_A2UI, GL_RGBA8UI (IVB), GL_RGBA8I (IVB), GL_RGB10_A2, GL_RGBA8
(IVB), and GL_RGBA8_SNORM (IVB)?  It seems to me that for writes, it should
only be necessary to call emit_coordinate_check() when the hardware surface
format is RAW.

c. It would be nice to have some comments to help people understand when
it's necessary to call emit_convert_to_integer() and when it's not.  It
looks like from your code like the GL spec requires out-of-range values to
be clamped, and the hardware does the necessary clamping--is this correct?
If so, then it seems like the handler for GL_RGBA16I (HSW) needs to call
emit_pack_homogeneous(); otherwise, negative values will get clamped by the
hardware to 0xffff.  There's a similar situation for GL_RGBA8I (HSW),
GL_RG16I (HSW), GL_RG8I (HSW), GL_R16I, GL_R8I.

d. As with the previous function, I think it would be better to generate
the code algorithmically rather than case by case.  I think the algorithm
would look something like this:

1. As in emit_image_load().

2. As in emit_image_load().

3. If the GLSL format and the hardware surface format match exactly, this
is a special case: all we have to do is emit_typed_write().  Otherwise, go
on to steps 4-9.

4. If the GLSL format is signed/unsigned normalized or float, convert the
data by calling emit_convert_to_float() or emit_convert_to_scaled().
Exception: for 32 bits/channel formats, emit_convert_to_float() isn't
needed--just retype the register.

5. If the GLSL format is signed/unsigned integer, clamp the data by calling
emit_convert_to_integer().  Exception: not needed if the GLSL format is 32
bits/channel.

6. If the GLSL format is homogeneous, and it's a different number of
bits/channel than the hardware surface format, call
emit_unpack_homogeneous() or emit_pack_homogeneous() to convert it.

7. If the GLSL format is homogeneous, it's the same number of bits/channel
as the hardware surface format, and it's a signed integer format with <32
bits/channel, call emit_pack_homogeneous() to make sure that sign extended
bits don't cause the hardware to clamp (see "c" above).

8. If the GLSL format isn't homogeneous, call emit_pack_generic() to pack
it.  There are only two possibilities: 10/10/10/2 and 11/11/10.

9. If the hardware surface format is RAW, call emit_coordinate_check(),
emit_coordinate_address_calculation(), and emit_untyped_write() to write
the data.  Otherwise, use emit_typed_write().

I'll send more comments in a separate email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140102/e1d2fe6b/attachment-0001.html>