[Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.
Jason Ekstrand
jason at jlekstrand.net
Fri Aug 7 15:11:44 PDT 2015
On Fri, Jul 31, 2015 at 9:58 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
>
>> On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Jason Ekstrand <jason at jlekstrand.net> writes:
>>>
>>>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>> Jason Ekstrand <jason at jlekstrand.net> writes:
>>>>>
>>>>>> This all looks correct as far as I can tell. However, I'm very
>>>>>> concerned about the number of checks such as
>>>>>> has_matching_typed_format() that are built-in to the compiler (via
>>>>>> surface_builder) where we then go on to do something that is highly
>>>>>> dependant on state setup doing the exact same check (not via
>>>>>> surface_builder) and doing the right thing. Another example would be
>>>>>> the interplay with has_split_bit_layout. How do we know these won't
>>>>>> get out of sync?
>>>>>>
>>>>> They cannot get out of sync because the policy which surface format to
>>>>> use for a given GLSL format is shared between the compiler and the
>>>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>>>>> surface state set-up for shader images.".
>>>>
>>>> I came back to this today and looked at it again. I think my main
>>>> objection is that all of these metadata functions work in terms of a
>>>> devinfo and a format and then immediately call
>>>> brw_lower_mesa_image_format. Why don't they work instead in terms
>>>> comparing the lowered format to the nominal format? Then it would be
>>>> a lot more obvious that you're doing a transformation from one format
>>>> to another. As is, it's a bunch of magic "what do I have to do next"
>>>> queries
>>>
>>> I don't think any of these functions implies what to do next, they all
>>> encode clearly defined device-specific properties of the formats:
>>>
>>> - Does the device support any format with the same bit layout?
>>> (has_supported_bit_layout())
>>>
>>> - Does the device support any format with the same bit layout *and*
>>> encoding? (is_conversion_trivial())
>>>
>>> - Does the device support a typed format of the same size?
>>> (has_matching_typed_format())
>>>
>>> - Does the device split the pixel data into a number of discontiguous
>>> segments? (has_split_bit_layout())
>>>
>>> - Does the device return garbage in the unused bits when reading from
>>> the given format? (has_undefined_high_bits())
>>>
>>> - Do all fields of the format have the same size? (This is not even
>>> device-specific)
>>>
>>> - Is the format represented as a signed integer in memory? (Ditto)
>>>
>>> You may have noticed that except for the last two the natural subject
>>> and object of all questions are "device" and "format" respectively --
>>> I'm not sure I see how rephrasing them to involve the lowered format
>>> (that in fact a number of them don't care about) would make any of these
>>> questions easier to understand.
>>
>> I *understand* the questions being asked. My claim is this is the
>> *wrong* set of questions. By the time you get to this stage, someone
>> higher up has made the decision that they're not going to give you
>> format A but will rather give you format B which they know will also
>> work. (This decision is made by code that understands the hardware
>> limitations and you're guaranteed that B is a format you can read and
>> transform into A.) The fact that that decision is passed down through
>> brw_lower_mesa_image_format is secondary. At this stage, you have two
>> formats, A and B and need to figure out how to get from one to the
>> other.
>>
>> For example, instead of "Does the device support a typed format of the
>> same size?" the better question would be "Are these two formats the
>> same size?".
>
> Those two questions are not equivalent, lowered formats are always the
> same size as the original, whether a format is supported for typed
> surface access or not is inherently devinfo-specific, the devinfo
> argument couldn't be replaced with a lower_format argument. The same
> goes for "Does the device return garbage in the unused bits when reading
> From the given format?".
>
>> Instead of asking "Does the device support any format with the same
>> bit layout *and* encoding?" you should ask "Do these two formats have
>> the same bit layout *and* encoding?"
>>
> *Shrug*, I think they're only wrong questions if you start from the
> assumption that the policy that computes the lowered format from a given
> image format is broken -- E.g. that it gives you a suboptimal format of
> different encoding even though the hardware supports an identically
> encoded one. You can quickly rule out that assumption by looking up the
> definition of is_conversion_trivial() and seeing that it does little
> more than call brw_lower_mesa_image_format().
>
> I'm afraid that I was slightly misleading earlier:
> is_conversion_trivial() is really about the encoding and only has
> indirect bit layout implications. The current implementation is based
> on the assumption that the lowered format we're converting from or to is
> just the canonical one -- I.e. it cannot be readily extended to check
> whether the conversion between any two arbitrary formats is trivial, so
> even though replacing the devinfo argument with a lower_format would be
> technically possible it would lead to an IMO a more disturbing problem.
> is_conversion_trivial(mesa_format, mesa_format) would be a partial
> function with ill-defined results for certain combinations of its
> arguments, and it wouldn't have any way to assert() that the pair is
> sensible without an additional devinfo pointer. To make the matter
> worse the C++ type system cannot distinguish between a lowered and an
> unlowered format while it can distinguish between a devinfo and a
> format. The current image_format_info queries are total functions of
> their arguments and they are all consistent about taking the original
> format *only*. I'd like to preserve those properties.
>
>> Does the difference I'm making make sense? In the current code, it
>> looks like you're doing device queries and making decisions about how
>> to work around device limitations.
>
> emit_image_load and store don't make any decisions themselves, and I'm
> not sure I fully understand what made you think they were -- The
> code-paths taken through them are driven by the image_format_info
> queries so I understand that you may have thought that *those* were the
> ones making decisions, or not, either way it's not an obstacle to
> convince oneself of the correctness of emit_image_load/store if you
> temporarily assume that the results returned by image_format_info are
> sensible and consistent with the canonical format lowering policy --
> Assumption you can quickly verify afterwards because their semantics are
> clearly defined and their result is usually a simple function of
> brw_lower_mesa_image_format().
>
> There's one case however in which I think it would be fine to replace
> the devinfo argument with a mesa_format one (although it would somewhat
> break the rule of only passing unlowered formats to the queries). It
> may also be the one that contributed the most to your impression that
> emit_image_load and store were making decisions: We could change
> "has_supported_bit_layout(devinfo, format)" to
> "has_same_bit_layout(format, lower_format)". has_same_bit_layout()
> would be total and commutative, so my objections above don't apply. For
> the other ones I could mention more explicitly in the doxygen comments
> that the functions don't make any decisions independently and the result
> is based on the usual lowering policy for the given device.
I can see that this is another spot where we're having trouble seeing
eye-to-eye. I am convinced that it is correct (and have was several
e-mails ago), but I still don't like the way it's structured. I may
send a "clean-up" patch or two after things have landed and we can
argue about the structure then. However, for the sake of getting
things merged,
Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com>
>> Hence my original comment about things getting out of sync. In
>> reality, all of those decisions have already been made by
>> brw_lower_mesa_image_format, this code is just doing what it needs to
>> do to go from format A to format B.
>>
>>>> which, while in the same namespace, are in a different file.
>>>
>>> I happen to have cleaned that up to be in the same file a short while
>>> ago -- They're no longer used by anything else outside of
>>> brw_fs_surface_builder.cpp:
>>> http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200
>>>
>>>> It would also be more efficient because you would only call
>>>> brw_lower_mesa_image_format once.
>>>
>>> A more elegant way to "fix" that would be to mark
>>> brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it
>>> would make any measurable difference in practice though.
>>>
>>>>
>>>> Does that make more sense?
>>>> --Jason
>>>>
>>>>> Thanks.
>>>>>
>>>>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>>>> v2: Drop VEC4 suport.
>>>>>>> v3: Rebase.
>>>>>>> ---
>>>>>>> .../drivers/dri/i965/brw_fs_surface_builder.cpp | 216 +++++++++++++++++++++
>>>>>>> src/mesa/drivers/dri/i965/brw_fs_surface_builder.h | 17 ++
>>>>>>> 2 files changed, 233 insertions(+)
>>>>>>>
>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>>> index ea1c4aa..46b449f 100644
>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>>> @@ -587,3 +587,219 @@ namespace {
>>>>>>> }
>>>>>>> }
>>>>>>> }
>>>>>>> +
>>>>>>> +namespace brw {
>>>>>>> + namespace image_access {
>>>>>>> + /**
>>>>>>> + * Load a vector from a surface of the given format and dimensionality
>>>>>>> + * at the given coordinates.
>>>>>>> + */
>>>>>>> + fs_reg
>>>>>>> + emit_image_load(const fs_builder &bld,
>>>>>>> + const fs_reg &image, const fs_reg &addr,
>>>>>>> + mesa_format format, unsigned dims)
>>>>>>> + {
>>>>>>> + using namespace image_format_info;
>>>>>>> + using namespace image_format_conversion;
>>>>>>> + using namespace image_validity;
>>>>>>> + using namespace image_coordinates;
>>>>>>> + using namespace surface_access;
>>>>>>> + const brw_device_info *devinfo = bld.shader->devinfo;
>>>>>>> + const mesa_format lower_format =
>>>>>>> + brw_lower_mesa_image_format(devinfo, format);
>>>>>>> + fs_reg tmp;
>>>>>>> +
>>>>>>> + if (has_matching_typed_format(devinfo, format)) {
>>>>>>> + /* Hopefully we get here most of the time... */
>>>>>>> + tmp = emit_typed_read(bld, image, addr, dims,
>>>>>>> + _mesa_format_num_components(lower_format));
>>>>>>> + } else {
>>>>>>> + /* Untyped surface reads return 32 bits of the surface per
>>>>>>> + * component, without any sort of unpacking or type conversion,
>>>>>>> + */
>>>>>>> + const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>>>> +
>>>>>>> + /* they don't properly handle out of bounds access, so we have to
>>>>>>> + * check manually if the coordinates are valid and predicate the
>>>>>>> + * surface read on the result,
>>>>>>> + */
>>>>>>> + const brw_predicate pred =
>>>>>>> + emit_bounds_check(bld, image, addr, dims);
>>>>>>> +
>>>>>>> + /* and they don't know about surface coordinates, we need to
>>>>>>> + * convert them to a raw memory offset.
>>>>>>> + */
>>>>>>> + const fs_reg laddr = emit_address_calculation(bld, image, addr, dims);
>>>>>>> +
>>>>>>> + tmp = emit_untyped_read(bld, image, laddr, 1, size, pred);
>>>>>>> +
>>>>>>> + /* An out of bounds surface access should give zero as result. */
>>>>>>> + for (unsigned c = 0; c < 4; ++c)
>>>>>>> + set_predicate(pred, bld.SEL(offset(tmp, bld, c),
>>>>>>> + offset(tmp, bld, c), fs_reg(0)));
>>>>>>> + }
>>>>>>> +
>>>>>>> + /* Set the register type to D instead of UD if the data type is
>>>>>>> + * represented as a signed integer in memory so that sign extension
>>>>>>> + * is handled correctly by unpack.
>>>>>>> + */
>>>>>>> + if (needs_sign_extension(format))
>>>>>>> + tmp = retype(tmp, BRW_REGISTER_TYPE_D);
>>>>>>> +
>>>>>>> + if (!has_supported_bit_layout(devinfo, format)) {
>>>>>>> + /* Unpack individual vector components from the bitfield if the
>>>>>>> + * hardware is unable to do it for us.
>>>>>>> + */
>>>>>>> + if (has_split_bit_layout(devinfo, format))
>>>>>>> + tmp = emit_pack(bld, tmp, get_bit_shifts(lower_format),
>>>>>>> + get_bit_widths(lower_format));
>>>>>>> + else
>>>>>>> + tmp = emit_unpack(bld, tmp, get_bit_shifts(format),
>>>>>>> + get_bit_widths(format));
>>>>>>> +
>>>>>>> + } else if ((needs_sign_extension(format) &&
>>>>>>> + !is_conversion_trivial(devinfo, format)) ||
>>>>>>> + has_undefined_high_bits(devinfo, format)) {
>>>>>>> + /* Perform a trivial unpack even though the bit layout matches in
>>>>>>> + * order to get the most significant bits of each component
>>>>>>> + * initialized properly.
>>>>>>> + */
>>>>>>> + tmp = emit_unpack(bld, tmp, color_u(0, 32, 64, 96),
>>>>>>> + get_bit_widths(format));
>>>>>>> + }
>>>>>>> +
>>>>>>> + if (!_mesa_is_format_integer(format)) {
>>>>>>> + if (is_conversion_trivial(devinfo, format)) {
>>>>>>> + /* Just need to cast the vector to the target type. */
>>>>>>> + tmp = retype(tmp, BRW_REGISTER_TYPE_F);
>>>>>>> + } else {
>>>>>>> + /* Do the right sort of type conversion to float. */
>>>>>>> + if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>>>> + tmp = emit_convert_from_float(
>>>>>>> + bld, tmp, get_bit_widths(format));
>>>>>>> + else
>>>>>>> + tmp = emit_convert_from_scaled(
>>>>>>> + bld, tmp, get_bit_widths(format),
>>>>>>> + _mesa_is_format_signed(format));
>>>>>>> + }
>>>>>>> + }
>>>>>>> +
>>>>>>> + /* Initialize missing components of the result. */
>>>>>>> + return emit_pad(bld, tmp, get_bit_widths(format));
>>>>>>> + }
>>>>>>> +
>>>>>>> + /**
>>>>>>> + * Store a vector in a surface of the given format and dimensionality at
>>>>>>> + * the given coordinates.
>>>>>>> + */
>>>>>>> + void
>>>>>>> + emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>>>> + const fs_reg &addr, const fs_reg &src,
>>>>>>> + mesa_format format, unsigned dims)
>>>>>>> + {
>>>>>>> + using namespace image_format_info;
>>>>>>> + using namespace image_format_conversion;
>>>>>>> + using namespace image_validity;
>>>>>>> + using namespace image_coordinates;
>>>>>>> + using namespace surface_access;
>>>>>>> + const brw_device_info *devinfo = bld.shader->devinfo;
>>>>>>> +
>>>>>>> + if (format == MESA_FORMAT_NONE) {
>>>>>>> + /* We don't know what the format is, but that's fine because it
>>>>>>> + * implies write-only access, and typed surface writes are always
>>>>>>> + * able to take care of type conversion and packing for us.
>>>>>>> + */
>>>>>>> + emit_typed_write(bld, image, addr, src, dims, 4);
>>>>>>> +
>>>>>>> + } else {
>>>>>>> + const mesa_format lower_format =
>>>>>>> + brw_lower_mesa_image_format(devinfo, format);
>>>>>>> + fs_reg tmp = src;
>>>>>>> +
>>>>>>> + if (!is_conversion_trivial(devinfo, format)) {
>>>>>>> + /* Do the right sort of type conversion. */
>>>>>>> + if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>>>> + tmp = emit_convert_to_float(bld, tmp, get_bit_widths(format));
>>>>>>> +
>>>>>>> + else if (_mesa_is_format_integer(format))
>>>>>>> + tmp = emit_convert_to_integer(bld, tmp, get_bit_widths(format),
>>>>>>> + _mesa_is_format_signed(format));
>>>>>>> +
>>>>>>> + else
>>>>>>> + tmp = emit_convert_to_scaled(bld, tmp, get_bit_widths(format),
>>>>>>> + _mesa_is_format_signed(format));
>>>>>>> + }
>>>>>>> +
>>>>>>> + /* We're down to bit manipulation at this point. */
>>>>>>> + tmp = retype(tmp, BRW_REGISTER_TYPE_UD);
>>>>>>> +
>>>>>>> + if (!has_supported_bit_layout(devinfo, format)) {
>>>>>>> + /* Pack the vector components into a bitfield if the hardware
>>>>>>> + * is unable to do it for us.
>>>>>>> + */
>>>>>>> + if (has_split_bit_layout(devinfo, format))
>>>>>>> + tmp = emit_unpack(bld, tmp, get_bit_shifts(lower_format),
>>>>>>> + get_bit_widths(lower_format));
>>>>>>> +
>>>>>>> + else
>>>>>>> + tmp = emit_pack(bld, tmp, get_bit_shifts(format),
>>>>>>> + get_bit_widths(format));
>>>>>>> + }
>>>>>>> +
>>>>>>> + if (has_matching_typed_format(devinfo, format)) {
>>>>>>> + /* Hopefully we get here most of the time... */
>>>>>>> + emit_typed_write(bld, image, addr, tmp, dims,
>>>>>>> + _mesa_format_num_components(lower_format));
>>>>>>> +
>>>>>>> + } else {
>>>>>>> + /* Untyped surface writes store 32 bits of the surface per
>>>>>>> + * component, without any sort of packing or type conversion,
>>>>>>> + */
>>>>>>> + const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>>>> +
>>>>>>> + /* they don't properly handle out of bounds access, so we have
>>>>>>> + * to check manually if the coordinates are valid and predicate
>>>>>>> + * the surface write on the result,
>>>>>>> + */
>>>>>>> + const brw_predicate pred =
>>>>>>> + emit_bounds_check(bld, image, addr, dims);
>>>>>>> +
>>>>>>> + /* and, phew, they don't know about surface coordinates, we
>>>>>>> + * need to convert them to a raw memory offset.
>>>>>>> + */
>>>>>>> + const fs_reg laddr = emit_address_calculation(bld, image, addr, dims);
>>>>>>> +
>>>>>>> + emit_untyped_write(bld, image, laddr, tmp, 1, size, pred);
>>>>>>> + }
>>>>>>> + }
>>>>>>> + }
>>>>>>> +
>>>>>>> + /**
>>>>>>> + * Perform an atomic read-modify-write operation in a surface of the
>>>>>>> + * given dimensionality at the given coordinates. Main building block
>>>>>>> + * of the imageAtomic GLSL built-ins.
>>>>>>> + */
>>>>>>> + fs_reg
>>>>>>> + emit_image_atomic(const fs_builder &bld,
>>>>>>> + const fs_reg &image, const fs_reg &addr,
>>>>>>> + const fs_reg &src0, const fs_reg &src1,
>>>>>>> + unsigned dims, unsigned rsize, unsigned op)
>>>>>>> + {
>>>>>>> + using namespace image_validity;
>>>>>>> + using namespace image_coordinates;
>>>>>>> + using namespace surface_access;
>>>>>>> + /* Avoid performing an atomic operation on an unbound surface. */
>>>>>>> + const brw_predicate pred = emit_surface_check(bld, image);
>>>>>>> +
>>>>>>> + /* Thankfully we can do without untyped atomics here. */
>>>>>>> + const fs_reg tmp = emit_typed_atomic(bld, image, addr, src0, src1,
>>>>>>> + dims, rsize, op, pred);
>>>>>>> +
>>>>>>> + /* An unbound surface access should give zero as result. */
>>>>>>> + if (rsize)
>>>>>>> + set_predicate(pred, bld.SEL(tmp, tmp, fs_reg(0)));
>>>>>>> +
>>>>>>> + return tmp;
>>>>>>> + }
>>>>>>> + }
>>>>>>> +}
>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>>> index 5424be6..cad8c27 100644
>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>>> @@ -213,5 +213,22 @@ namespace brw {
>>>>>>> _mesa_get_format_datatype(format) == GL_INT);
>>>>>>> }
>>>>>>> }
>>>>>>> +
>>>>>>> + namespace image_access {
>>>>>>> + fs_reg
>>>>>>> + emit_image_load(const fs_builder &bld,
>>>>>>> + const fs_reg &image, const fs_reg &addr,
>>>>>>> + mesa_format format, unsigned dims);
>>>>>>> +
>>>>>>> + void
>>>>>>> + emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>>>> + const fs_reg &addr, const fs_reg &src,
>>>>>>> + mesa_format format, unsigned dims);
>>>>>>> + fs_reg
>>>>>>> + emit_image_atomic(const fs_builder &bld,
>>>>>>> + const fs_reg &image, const fs_reg &addr,
>>>>>>> + const fs_reg &src0, const fs_reg &src1,
>>>>>>> + unsigned dims, unsigned rsize, unsigned op);
>>>>>>> + }
>>>>>>> }
>>>>>>> #endif
>>>>>>> --
>>>>>>> 2.4.3
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> mesa-dev mailing list
>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list