[Mesa-dev] [PATCH 18/25] i965: Import surface lowering code.
Paul Berry
stereotype441 at gmail.com
Fri Jan 3 11:26:55 PST 2014
On 2 December 2013 11:39, Francisco Jerez <currojerez at riseup.net> wrote:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_surface_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_surface_visitor.cpp
> new file mode 100644
> index 0000000..3528bbe
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_surface_visitor.cpp
> @@ -0,0 +1,846 @@
> +/*
> + * Copyright 2013 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Francisco Jerez <currojerez at riseup.net>
> + */
> +
> +#include "brw_vec4_surface_visitor.h"
> +
> +using namespace brw;
> +
> +namespace {
> + vec4_instruction &
> + exec_all(vec4_instruction &inst)
> + {
> + inst.force_writemask_all = true;
> + return inst;
> + }
> +
> + vec4_instruction &
> + exec_predicated(backend_reg flag, vec4_instruction &inst)
> + {
> + if (flag.file != BAD_FILE)
> + inst.predicate = BRW_PREDICATE_ALIGN16_ALL4H;
>
There should be a comment here explaining why you chose this predicate (or
at the very least referring the reader to emit_coordinate_check() for
context).
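Something along these lines would do (untested sketch; the rationale wording
is my guess, so adjust it to whatever the actual reason is):

      if (flag.file != BAD_FILE)
         /* The flag register is expected to hold the result of the
          * coordinate bounds check emitted by emit_coordinate_check(),
          * so predicate on ALL4H to execute the message only where
          * every component of the address passed the check.
          */
         inst.predicate = BRW_PREDICATE_ALIGN16_ALL16H;

(with BRW_PREDICATE_ALIGN16_ALL4H as in your code, of course; the point is
the comment, not the predicate.)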
> +
> + return inst;
> + }
> +}
> +
> +
> +/**
> + * Copy a SIMD4x2 vector to its transpose SIMD8x4 vector.
>
Usually SIMDNxM means "M separate execution flows are simultaneously
processed, with each execution flow processing a packed vector of N
elements at a time". (Note that SIMD8 is considered an abbreviation for
SIMD1x8 in this interpretation). So SIMD8x4 doesn't make any sense.
How about instead saying "Copy a SIMD4x2 vector to execution channels 0 and
4 of a SIMD8 vector"?
Also, there should be a comment clarifying why it's ok to leave channels
1-3 and 5-7 uninitialized.
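For reference, my understanding of the layout after the copy (taking
size == 4 as an example) is:

   dst+0 = { src.x[0], ?, ?, ?, src.x[1], ?, ?, ? }
   dst+1 = { src.y[0], ?, ?, ?, src.y[1], ?, ?, ? }
   dst+2 = { src.z[0], ?, ?, ?, src.z[1], ?, ?, ? }
   dst+3 = { src.w[0], ?, ?, ?, src.w[1], ?, ?, ? }

where [0] and [1] are the two SIMD4x2 execution flows and '?' marks the
uninitialized channels, which I assume is only safe because the 0x11
sample mask set up in emit_surface_header() disables them.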
> + */
> +void
> +brw_vec4_surface_visitor::emit_assign_to_transpose(
> + dst_reg dst, src_reg src, unsigned size) const
> +{
> + for (unsigned i = 0; i < size; ++i) {
> + emit(BRW_OPCODE_MOV,
> + writemask(offset(dst, i), WRITEMASK_X),
> + swizzle(src, BRW_SWIZZLE4(i, i, i, i)));
> + }
> +}
> +
> +/**
> + * Copy a SIMD4x2 vector from its transpose SIMD8x4 vector.
>
Similar clarification is needed here.
> + */
> +void
> +brw_vec4_surface_visitor::emit_assign_from_transpose(
> + dst_reg dst, src_reg src, unsigned size) const
> +{
> + for (unsigned i = 0; i < size; ++i) {
> + emit(BRW_OPCODE_MOV,
> + writemask(dst, 1 << i),
> + swizzle(offset(src, i), BRW_SWIZZLE_XXXX));
> + }
> +}
> +
> +/**
> + * Initialize the header present in some surface access messages.
> + */
> +void
> +brw_vec4_surface_visitor::emit_surface_header(struct dst_reg dst) const
> +{
> + assert(dst.file == MRF);
> + exec_all(emit(BRW_OPCODE_MOV, dst, 0));
> +
> + if (!v->brw->is_haswell) {
> + /* The sample mask is used on IVB for the SIMD8 messages that
> + * have no SIMD4x2 counterpart. We only use the two X channels
> + * in that case, mask everything else out.
> + */
> +      exec_all(emit(BRW_OPCODE_MOV,
> +                    brw_writemask(brw_uvec_mrf(4, dst.reg, 4),
> +                                  WRITEMASK_W),
> +                    0x11));
> + }
> +}
>
As with the corresponding fs function, it would be helpful for the comment
to say which messages this function is expected to be used for, or to
include a bspec reference that shows the header format for the messages you
care about.
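E.g. something like this (just a sketch; I'm going from the 0x11 write
above and my reading of brw_uvec_mrf(), so double-check the dword offset
against the bspec before committing to it):

   /* Initialize the message header used by the untyped and typed
    * surface read/write and atomic messages (bspec reference here).
    * Dword 7 of the header holds the sample mask; on IVB we set it to
    * 0x11 so that only channels 0 and 4 of the SIMD8 messages are
    * enabled, matching the two execution flows of our transposed
    * SIMD4x2 payloads.
    */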
> +backend_reg
> +brw_vec4_surface_visitor::emit_coordinate_address_calculation(
> + backend_reg image, backend_reg addr, unsigned dims) const
>
Most of my comments on
brw_fs_surface_visitor::emit_coordinate_address_calculation() apply here as
well.
> +{
> + const unsigned mask = (1 << dims) - 1;
> + src_reg off = offset(image, BRW_IMAGE_PARAM_OFFSET_OFFSET / 4);
> + src_reg stride = offset(image, BRW_IMAGE_PARAM_STRIDE_OFFSET / 4);
> + src_reg tile = offset(image, BRW_IMAGE_PARAM_TILING_OFFSET / 4);
> + src_reg swz = offset(image, BRW_IMAGE_PARAM_SWIZZLING_OFFSET / 4);
> + src_reg dst = make_grf(BRW_REGISTER_TYPE_UD, 1);
> + src_reg tmp = make_grf(BRW_REGISTER_TYPE_UD, 4);
> +
> + /* Shift the coordinates by the fixed surface offset. */
> + emit(BRW_OPCODE_ADD, writemask(addr, WRITEMASK_XY & mask),
> + addr, off);
> +
> + if (dims > 2) {
> + /* Decompose z into a major (tmp.w) and a minor (tmp.z)
> + * index.
> + */
> + emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_Z),
> + addr, negate(tile));
> +
> + emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_Z),
> + tmp, negate(tile));
>
The use of negate() with a shift is sufficiently clever that it could use a
comment explaining what's going on.  In particular, that the SHL and SHR are
effectively preserving the lowest-order tile.z bits of addr.z and
discarding the rest, except in the case where tile.z == 0, in which case
they are keeping all the bits.
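E.g. (wording is just a suggestion):

      /* Keep only the low tile.z bits of addr.z, i.e.
       *   tmp.z = (addr.z << (32 - tile.z)) >> (32 - tile.z)
       * The negate() works because the shifter only honours the bottom
       * five bits of the count, so -tile.z behaves as 32 - tile.z, and
       * tile.z == 0 degenerates to a shift by zero that keeps all the
       * bits.
       */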
> +
> + emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_W),
> + swizzle(addr, BRW_SWIZZLE_ZZZZ),
> + swizzle(tile, BRW_SWIZZLE_ZZZZ));
> +
> + /* Calculate the horizontal (tmp.z) and vertical (tmp.w) slice
> + * offset.
> + */
> + emit(BRW_OPCODE_MUL, writemask(tmp, WRITEMASK_ZW),
> + stride, tmp);
> + emit(BRW_OPCODE_ADD, writemask(addr, WRITEMASK_XY),
> + addr, swizzle(tmp, BRW_SWIZZLE_ZWZW));
>
There should be a comment clarifying that in the case where tile.z == 0,
since tmp.z == addr.z, we are relying on the fact that stride.z is 0 in
order to avoid corrupting the contents of addr.x. Additionally, in patch
04/25, a comment should be added to
brw_miptree_get_horizontal_slice_pitch() to clarify that in the
non-GL_TEXTURE_3D case,
brw_vec4_surface_visitor::emit_coordinate_address_calculation() relies on
the return value being 0.
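Something like this next to the ADD would cover the first part (suggested
wording only):

      /* If tile.z == 0 then tmp.z == addr.z, and we rely on stride.z
       * (the horizontal slice pitch) being zero for non-3D surfaces so
       * that the addition leaves addr.x unchanged.
       */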
> + }
> +
> + if (dims > 1) {
> + /* Calculate the minor x (tmp.x) and y (tmp.y) indices. */
> + emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_XY),
> + addr, negate(tile));
> +
> + emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_XY),
> + tmp, negate(tile));
> +
> + /* Calculate the major x (tmp.z) and y (tmp.w) indices. */
> + emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_ZW),
> + swizzle(addr, BRW_SWIZZLE_XYXY),
> + swizzle(tile, BRW_SWIZZLE_XYXY));
> +
> + /* Multiply the minor indices and the major x index (tmp.x,
> + * tmp.y and tmp.z) by the Bpp, and the major y index (tmp.w) by
> + * the vertical stride.
> + */
> + emit(BRW_OPCODE_MUL, writemask(tmp, WRITEMASK_XYZW),
> + swizzle(stride, BRW_SWIZZLE_XXXY), tmp);
> +
> + /* Multiply by the tile dimensions using two shift instructions.
> + * Equivalent to:
> + * minor.y = minor.y << tile.x
> + * major.x = major.x << tile.x << tile.y
> + * major.y = major.y << tile.y
> + */
> + emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_ZW),
> + swizzle(tmp, BRW_SWIZZLE_ZWZW),
> + swizzle(tile, BRW_SWIZZLE_YYYY));
> +
> + emit(BRW_OPCODE_SHL, writemask(tmp, WRITEMASK_YZ),
> + swizzle(tmp, BRW_SWIZZLE_YYZZ),
> + swizzle(tile, BRW_SWIZZLE_XXXX));
> +
> + /* Add everything up. */
> + emit(BRW_OPCODE_ADD, writemask(tmp, WRITEMASK_XY),
> + swizzle(tmp, BRW_SWIZZLE_XYXY),
> + swizzle(tmp, BRW_SWIZZLE_ZWZW));
> +
> + emit(BRW_OPCODE_ADD, writemask(dst, WRITEMASK_X),
> + swizzle(tmp, BRW_SWIZZLE_XXXX),
> + swizzle(tmp, BRW_SWIZZLE_YYYY));
>
I believe this code calculates the wrong value when tile.x == tile.y == 0,
because all the shifts shift by zero, so just before the line "/* Add
everything up. */", tmp has the values:

   tmp.x = addr.x * stride.x
   tmp.y = addr.y * stride.x
   tmp.z = addr.x * stride.x
   tmp.w = addr.y * stride.y

And then when you add it all up you get:

   dst.x = addr.x * (2 * stride.x) + addr.y * (stride.x + stride.y)

which isn't correct.  What you want is:

   dst.x = addr.x * stride.x + addr.y * stride.y
> +
> + if (v->brw->has_swizzling) {
> + /* Take into account the two dynamically specified shifts. */
> + emit(BRW_OPCODE_SHR, writemask(tmp, WRITEMASK_XY),
> + swizzle(dst, BRW_SWIZZLE_XXXX), swz);
> +
> + /* XOR tmp.x and tmp.y with bit 6 of the memory address. */
> + emit(BRW_OPCODE_XOR, writemask(tmp, WRITEMASK_X),
> + swizzle(tmp, BRW_SWIZZLE_XXXX),
> + swizzle(tmp, BRW_SWIZZLE_YYYY));
> +
> + emit(BRW_OPCODE_AND, writemask(tmp, WRITEMASK_X),
> + tmp, 1 << 6);
> +
> + emit(BRW_OPCODE_XOR, writemask(dst, WRITEMASK_X),
> + dst, tmp);
> + }
> +
> + } else {
> + /* Multiply by the Bpp value. */
> + emit(BRW_OPCODE_MUL, writemask(dst, WRITEMASK_X),
> + addr, stride);
> + }
> +
> + return dst;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_untyped_read(
> + backend_reg flag, backend_reg surface, backend_reg addr,
> + unsigned dims, unsigned size) const
>
It would be nice to have a comment in this function saying that for both
IVB and HSW, we use SIMD4x2 messages for untyped surface read.
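E.g. (suggested wording only):

   /* Both IVB and HSW provide a SIMD4x2 variant of the untyped surface
    * read message, so unlike the write and atomic cases no SIMD8
    * fall-back or payload transposition is needed here.
    */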
> +
> +void
> +brw_vec4_surface_visitor::emit_untyped_write(
> + backend_reg flag, backend_reg surface, backend_reg addr,
> + backend_reg src, unsigned dims, unsigned size) const
> +{
> + const unsigned mask = (v->brw->is_haswell ? (1 << size) - 1 : 1);
> + unsigned mlen = 0;
>
It would be nice to move the comment about IVB using a SIMD8 message up
here, since without that context, the code to emit the surface write
address and the source value don't make sense. A similar comment applies
to emit_untyped_atomic(), emit_typed_read(), emit_typed_write(), and
emit_typed_atomic().
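E.g. by hoisting a variant of the comment that currently sits next to the
send to the top of each function (sketch, adjust per message type):

   /* IVB lacks a SIMD4x2 variant of this message, so on IVB we fall
    * back to the SIMD8 counterpart and transpose the address and
    * source payloads accordingly.
    */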
> +
> + /* Set the surface write address. */
> + if (v->brw->is_haswell) {
> + emit_assign_with_pad(make_mrf(mlen), addr, dims);
> + mlen++;
> + } else {
> + emit_assign_to_transpose(make_mrf(mlen), addr, dims);
> + mlen += dims;
> + }
> +
> + /* Set the source value. */
> + if (v->brw->is_haswell) {
> + emit_assign_with_pad(make_mrf(mlen), src, size);
> + mlen++;
> + } else {
> + emit_assign_to_transpose(make_mrf(mlen), src, size);
> + mlen += size;
> + }
> +
> + /* Emit the instruction. Note that this is translated into the
> + * SIMD8 untyped surface write message on IVB because the
> + * hardware lacks a SIMD4x2 counterpart.
> + */
> + vec4_instruction &inst = exec_predicated(
> + flag, emit(SHADER_OPCODE_UNTYPED_SURFACE_WRITE,
> + brw_writemask(brw_null_reg(), mask),
> + surface, size));
> + inst.base_mrf = 0;
> + inst.mlen = mlen;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_untyped_atomic(
> + backend_reg flag, backend_reg surface, backend_reg addr,
> + backend_reg src0, backend_reg src1,
> + unsigned dims, unsigned op) const
> +{
> + src_reg dst = make_grf(BRW_REGISTER_TYPE_UD, 1);
> + unsigned mlen = 0;
> +
> + /* Set the atomic operation address. */
> + if (v->brw->is_haswell) {
> + emit_assign_with_pad(make_mrf(mlen), addr, dims);
> + mlen++;
> + } else {
> + emit_assign_to_transpose(make_mrf(mlen), addr, dims);
> + mlen += dims;
> + }
> +
> + /* Set the source arguments. */
> + if (v->brw->is_haswell) {
> + if (src0.file != BAD_FILE)
> + emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
> + src0);
> +
> + if (src1.file != BAD_FILE)
> + emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_Y),
> + swizzle(src1, BRW_SWIZZLE_XXXX));
> +
> + mlen++;
>
According to Graphics BSpec: 3D-Media-GPGPU Engine > Shared Functions >
Shared Functions – Data Port [Pre-SKL] > Messages > Typed/Untyped Surface
Read/Write and Typed/Untyped Atomic Operation [IVB+] > Message Payload >
SIMD4x2 Source Payload (Atomic Operation message only), "The following
atomic operations require no sources, thus the source payload is not
delivered: AOP_INC, AOP_DEC, AOP_PREDEC".
I think this means that if both src0.file and src1.file are BAD_FILE, we
shouldn't increment mlen.
A similar comment applies to emit_typed_atomic().
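I.e. something like this for the HSW path (untested sketch):

      if (src0.file != BAD_FILE)
         emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
              src0);

      if (src1.file != BAD_FILE)
         emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_Y),
              swizzle(src1, BRW_SWIZZLE_XXXX));

      /* No source payload is delivered for AOP_INC, AOP_DEC and
       * AOP_PREDEC, so only account for the MRF when a source was
       * actually written.
       */
      if (src0.file != BAD_FILE || src1.file != BAD_FILE)
         mlen++;

(the IVB path below already gets this right, since it only bumps mlen
inside the BAD_FILE checks.)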
> +
> + } else {
> + if (src0.file != BAD_FILE) {
> + emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
> + src0);
> + mlen++;
> + }
> +
> + if (src1.file != BAD_FILE) {
> + emit(BRW_OPCODE_MOV, writemask(make_mrf(mlen), WRITEMASK_X),
> + src1);
> + mlen++;
> + }
> + }
> +
> + /* Emit the instruction. Note that this is translated into the
> + * SIMD8 untyped atomic message on IVB because the hardware lacks a
> + * SIMD4x2 counterpart.
> + */
> + vec4_instruction &inst = exec_predicated(
> + flag, emit(SHADER_OPCODE_UNTYPED_ATOMIC,
> + writemask(dst, WRITEMASK_X),
> + surface, op));
> + inst.base_mrf = 0;
> + inst.mlen = mlen;
> +
> + return dst;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_typed_read(
> + backend_reg flag, backend_reg surface, backend_reg addr,
> + unsigned dims, unsigned size) const
> +{
> + const unsigned rlen = size * (v->brw->is_haswell ? 1 : 8);
>
I think this is 2x larger than necessary for Ivy Bridge.
brw_vec4_surface_visitor::make_grf() will divide rlen by 4 to determine how
many registers to allocate. So, for example, if size == 4, it will
allocate 8 consecutive registers to hold the result. But the reply from
the untyped read message will only fill up 4 of them, and the call to
emit_assign_from_transpose() will only cause 4 of them to be read.
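I.e., if I'm reading make_grf() right, presumably this should be:

   const unsigned rlen = size * (v->brw->is_haswell ? 1 : 4);

so that the IVB SIMD8 reply still gets one register per component without
over-allocating.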
> +backend_reg
> +brw_vec4_surface_visitor::emit_pack_generic(
> + backend_reg src,
> + unsigned shift_r, unsigned width_r,
> + unsigned shift_g, unsigned width_g,
> + unsigned shift_b, unsigned width_b,
> + unsigned shift_a, unsigned width_a) const
> +{
> + const unsigned mask = (!!width_r << 0 | !!width_g << 1 |
> + !!width_b << 2 | !!width_a << 3);
> + const bool homogeneous = ((!width_g || width_r == width_g) &&
> + (!width_b || width_g == width_b) &&
> + (!width_a || width_b == width_a));
> + const unsigned bits = width_r + width_g + width_b + width_a;
> + src_reg shift = make_grf(BRW_REGISTER_TYPE_UD, 4);
> +
> + /* Shift left to discard the most significant bits. */
> + emit(BRW_OPCODE_MOV, writemask(shift, mask),
> + (homogeneous ? brw_imm_ud(32 - width_r) :
> + brw_imm_vf4(32 - width_r, 32 - width_g,
> + 32 - width_b, 32 - width_a)));
>
It seems a little silly to go to extra effort to switch between
brw_imm_ud() and brw_imm_vf4(), given that SHL and SHR only pay attention
to the bottom 5 bits of their shift argument, and brw_imm_vf4() is capable
of representing values up to 31.  How about instead:

   emit(BRW_OPCODE_MOV, writemask(shift, mask),
        brw_imm_vf4((32 - width_r) % 32, (32 - width_g) % 32,
                    (32 - width_b) % 32, (32 - width_a) % 32));
A similar comment applies to emit_unpack_generic().
> +
> + emit(BRW_OPCODE_SHL, writemask(src, mask), src, shift);
>
As in my comments on brw_fs_surface_visitor.cpp, I don't like writing to
src. It seems like there's too big a risk that a user of this function
will assume that src isn't written to (and can thus be used again). I'd
prefer to allocate a new temporary and then let register coalescing take
care of getting rid of it.
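E.g. (sketch, reusing the make_grf() pattern from elsewhere in this file):

   src_reg tmp = make_grf(src.type, 4);
   emit(BRW_OPCODE_SHL, writemask(tmp, mask), src, shift);

and use tmp for the subsequent shifts and ORs; if the caller really is done
with src, register coalescing should be able to merge the two back
together.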
> +
> + /* Shift right to the final bit field positions. */
> + emit(BRW_OPCODE_MOV, writemask(shift, mask),
> + brw_imm_vf4(32 - shift_r % 32 - width_r,
> + 32 - shift_g % 32 - width_g,
> + 32 - shift_b % 32 - width_b,
> + 32 - shift_a % 32 - width_a));
>
Won't this have trouble with unused channels? For example, if shift_a ==
width_a == 0 (i.e. because channel a doesn't exist in the format we're
packing), then we'll pass 32 to brw_imm_vf4(), which isn't allowed.
I think we need to do:

   emit(BRW_OPCODE_MOV, writemask(shift, mask),
        brw_imm_vf4((32 - shift_r % 32 - width_r) % 32,
                    (32 - shift_g % 32 - width_g) % 32,
                    (32 - shift_b % 32 - width_b) % 32,
                    (32 - shift_a % 32 - width_a) % 32));
A similar comment applies to emit_unpack_generic().
> +
> + emit(BRW_OPCODE_SHR, writemask(src, mask), src, shift);
> +
> + /* Add everything up. */
>
Since we're not adding, let's change the comment to "OR everything
together."
> + if (mask >> 2)
> + emit(BRW_OPCODE_OR,
> + writemask(src, WRITEMASK_XY),
> + swizzle(src, BRW_SWIZZLE_XZXZ),
> + swizzle(src, (mask >> 3 ? BRW_SWIZZLE_YWYW :
> + BRW_SWIZZLE_YZYZ)));
> +
> + if (mask >> 1 && bits <= 32)
> + emit(BRW_OPCODE_OR,
> + writemask(src, WRITEMASK_X),
> + swizzle(src, BRW_SWIZZLE_XXXX),
> + swizzle(src, BRW_SWIZZLE_YYYY));
>
This code only works if a number of assumptions about the shift and width
values are correct.  Can we document (and check) those assumptions with
assertions?  E.g. something like:

   assert(shift_r + width_r <= 32);

   if (bits <= 32) {
      // Pack all channels into result.x
      assert(!width_g || shift_g + width_g <= 32);
      assert(!width_b || shift_b + width_b <= 32);
      assert(!width_a || shift_a + width_a <= 32);
   } else if (mask >> 2) {
      // Pack rg into result.x and ba into result.y
      assert(shift_g + width_g <= 32);
      assert(shift_b >= 32 && shift_b + width_b <= 64);
      assert(!width_a || (shift_a >= 32 && shift_a + width_a <= 64));
   } else {
      // Pack r into result.x and g into result.y
      assert(shift_g >= 32 && shift_g + width_g <= 64);
   }
>
> +
> + return src;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_unpack_generic(
> + backend_reg src,
> + unsigned shift_r, unsigned width_r,
> + unsigned shift_g, unsigned width_g,
> + unsigned shift_b, unsigned width_b,
> + unsigned shift_a, unsigned width_a) const
> +{
> + const unsigned mask = (!!width_r << 0 | !!width_g << 1 |
> + !!width_b << 2 | !!width_a << 3);
> + const bool homogeneous = ((!width_g || width_r == width_g) &&
> + (!width_b || width_g == width_b) &&
> + (!width_a || width_b == width_a));
> + src_reg shift = make_grf(BRW_REGISTER_TYPE_UD, 4);
> + src_reg dst = make_grf(src.type, 4);
> +
> + /* Shift left to discard the most significant bits. */
> + emit(BRW_OPCODE_MOV, writemask(shift, mask),
> + brw_imm_vf4(32 - shift_r % 32 - width_r,
> + 32 - shift_g % 32 - width_g,
> + 32 - shift_b % 32 - width_b,
> + 32 - shift_a % 32 - width_a));
> +
> + emit(BRW_OPCODE_SHL, writemask(dst, mask),
> + swizzle(src, BRW_SWIZZLE4(shift_r / 32, shift_g / 32,
> + shift_b / 32, shift_a / 32)),
> + shift);
> +
> + /* Shift back to the least significant bits using an arithmetic
> + * shift to get sign extension on signed types.
> + */
> + emit(BRW_OPCODE_MOV, writemask(shift, mask),
> + (homogeneous ? brw_imm_ud(32 - width_r) :
> + brw_imm_vf4(32 - width_r, 32 - width_g,
> + 32 - width_b, 32 - width_a)));
> +
> + emit(BRW_OPCODE_ASR, writemask(dst, mask), dst, shift);
> +
> + return dst;
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_pack_homogeneous(
> + backend_reg src,
> + unsigned shift_r, unsigned width_r,
> + unsigned shift_g, unsigned width_g,
> + unsigned shift_b, unsigned width_b,
> + unsigned shift_a, unsigned width_a) const
> +{
> + /* We could do the same with fewer instructions if we had some way
> + * to use Align1 addressing in the VEC4 visitor. Just use the
> + * general path for now...
> + */
> + return emit_pack_generic(src, shift_r, width_r, shift_g, width_g,
> + shift_b, width_b, shift_a, width_a);
> +}
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_unpack_homogeneous(
> + backend_reg src,
> + unsigned shift_r, unsigned width_r,
> + unsigned shift_g, unsigned width_g,
> + unsigned shift_b, unsigned width_b,
> + unsigned shift_a, unsigned width_a) const
> +{
> + /* We could do the same with fewer instructions if we had some way
> + * to use Align1 addressing in the VEC4 visitor. Just use the
> + * general path for now...
> + */
> + return emit_unpack_generic(src, shift_r, width_r, shift_g, width_g,
> + shift_b, width_b, shift_a, width_a);
> +}
>
It seems a little odd that for fs, the generic and homogeneous packing
functions are completely separate; but for vec4, they wind up calling the
same function, which then dynamically determines whether or not we're doing
homogeneous packing/unpacking.
How about if instead we simplify the interface so that there's just an
emit_pack() function and an emit_unpack() function. In the vec4
implementation, we can just drop the emit_{pack,unpack}_homogeneous()
functions and rename emit_{pack,unpack}_generic() to emit_{pack,unpack}.
In the fs implementation, we can do:
   brw_fs_surface_visitor::emit_{pack,unpack}(...)
   {
      const bool homogeneous = ((!width_g || width_r == width_g) &&
                                (!width_b || width_g == width_b) &&
                                (!width_a || width_b == width_a));

      if (homogeneous) {
         ...homogeneous logic...
      } else {
         ...generic logic...
      }
   }
> +
> +backend_reg
> +brw_vec4_surface_visitor::emit_convert_to_float(
> + backend_reg src,
> + unsigned mask0, unsigned width0,
> + unsigned mask1, unsigned width1) const
>
My comments about the fs implementation of emit_convert_to_float() apply
here as well.
Final thoughts about this patch, now that I've gotten through it:
1. The patch is very large, and that made reviewing it difficult.  Please
break it out into separate patches.  One possible way of breaking it up
would be:

   a. Add the brw_surface_visitor base class.
   b. Add the brw_fs_surface_visitor class.
   c. Add the brw_vec4_surface_visitor class.

That would have the added advantage that it would lead people to review the
base class before reviewing the derived classes, which I think would have
been an easier order to review the patch in.
2. The object splicing concerns I brought up in "[PATCH 06/23] i965: Define
common register base class shared between both back-ends." apply to this
patch as well. I will try to talk to some folks at Intel next week about
my ideas to mitigate this.