[Mesa-dev] [PATCH v2 11/30] i965/fs: add shuffle_32bit_load_result_to_64bit_data helper
Francisco Jerez
currojerez at riseup.net
Fri May 13 22:02:40 UTC 2016
Iago Toral <itoral at igalia.com> writes:
> On Thu, 2016-05-12 at 20:01 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>
>> > From: Iago Toral Quiroga <itoral at igalia.com>
>> >
>> > There will be a few places where we need to shuffle the result of a 32-bit
>> > load into valid 64-bit data, so extract this logic into a separate helper
>> > that we can reuse.
>> >
>> > The shuffling needs to operate with WE_all set because we are changing the
>> > layout of the data across the various channels. Otherwise we will run into
>> > problems in non-uniform control-flow scenarios.
>> >
>> I guess you could remove this paragraph because it no longer applies to
>> the last version of the patch.
>
> Ooops, yes!
>
>> > v2 (Curro):
>> > - Use subscript() instead of stride()
>> > - Assert on the input types rather than retyping.
>> > - Use offset() instead of horiz_offset(), drop the multiplier definition.
>> > - Do not use a temporary for the writes and drop force_writemask_all.
>>
>> Don't pretend you took my "don't use a temporary" suggestion into
>> account. :P
>
> Oh right, I did write the patch with that change included but then run
> into the issue we discussed when src == dst and forgot to remove this.
>
>> > - Mark component_i as const.
>>
>> Did you forget to git-add something?
>
> No, somehow I lost the change in the process... I'll put it back.
>
>> > - Make the function name lower case.
>> > ---
>> > src/mesa/drivers/dri/i965/brw_fs.cpp | 53 ++++++++++++++++++++++++++++++++++++
>> > src/mesa/drivers/dri/i965/brw_fs.h | 5 ++++
>> > 2 files changed, 58 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 15d5759..4827dea 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -212,6 +212,59 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>> > }
>> >
>> > /**
>> > + * This helper takes the result of a load operation that reads 32-bit elements
>> > + * in this format:
>> > + *
>> > + * x x x x x x x x
>> > + * y y y y y y y y
>> > + * z z z z z z z z
>> > + * w w w w w w w w
>> > + *
>> > + * and shuffles the data to get this:
>> > + *
>> > + * x y x y x y x y
>> > + * x y x y x y x y
>> > + * z w z w z w z w
>> > + * z w z w z w z w
>> > + *
>> > + * Which is exactly what we want if the load is reading 64-bit components
>> > + * like doubles, where x represents the low 32-bit of the x double component
>> > + * and y represents the high 32-bit of the x double component (likewise with
>> > + * z and w for double component y). The parameter @components represents
>> > + * the number of 64-bit components present in @src. This would typically be
>> > + * 2 at most, since we can only fit 2 double elements in the result of a
>> > + * vec4 load.
>> > + *
>> > + * Notice that @dst and @src can be the same register.
>> > + */
>> > +void
>> > +fs_visitor::shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld,
>>
>> There was a second reason I had in mind when I suggested it would
>> improve encapsulation to take this out of fs_visitor: The function has
>> absolutely nothing to do with visiting, it uses no internal or external
>> fs_visitor data structures or interfaces, it doesn't even use the "this"
>> pointer. Defining a function that could perfectly be stand-alone inside
>> an object it has no need to be in (in this case it doesn't even have any
>> logical relation with) actually *decreases* encapsulation because it
>> exposes the object's internals to the function unnecessarily.
>>
>> Either way the back-end code is already plagued by this anti-pattern so
>> I'm not going to complain if you keep the code as-is -- You could argue
>> you're just being consistent with the existing practice. ;)
>
> I agree with all your reasoning, my only disagreement is with the fact
> that brw_fs_nir.cpp is a better place for it considering that this has
> nothing to do with NIR, but since we have to put this somewhere let's
> put it there for now.
>
Oh, I don't really mind where you put the function, the only reason I
suggested that was that I assumed you were only using it from the NIR
front-end, but brw_fs.cpp does seem like a better fit if you're using it
elsewhere.
>> > + const fs_reg dst,
>> > + const fs_reg src,
>>
>> Pass by reference.
>
> Ok.
>
>> > + uint32_t components)
>> > +{
>> > + assert(type_sz(src.type) == 4);
>> > + assert(type_sz(dst.type) == 8);
>> > +
>> > + /* A temporary that we will use to shuffle the 32-bit data of each
>> > + * component in the vector into valid 64-bit data. We can't write directly
>> > + * to dst because dst can be (and would usually be) the same as src
>> > + * and in that case the first MOV in the loop below would overwrite the
>> > + * data read in the second MOV.
>> > + */
>> > + fs_reg tmp = bld.vgrf(dst.type);
>> > +
>> > + for (unsigned i = 0; i < components; i++) {
>> > + fs_reg component_i = offset(src, bld, 2 * i);
>> > +
>> > + bld.MOV(subscript(tmp, src.type, 0), component_i);
>> > + bld.MOV(subscript(tmp, src.type, 1), offset(component_i, bld, 1));
>> > +
>> > + bld.MOV(offset(dst, bld, i), tmp);
>>
>> Ah, yes, this looks much better,
>
> Yes, I agree :)
>
>> Reviewed-by: Francisco Jerez <currojerez at riseup.net>
>>
>> > + }
>> > +}
>> > +
>> > +/**
>> > * A helper for MOV generation for fixing up broken hardware SEND dependency
>> > * handling.
>> > */
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>> > index b906f3d..1970ad0 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> > @@ -105,6 +105,11 @@ public:
>> > uint32_t const_offset);
>> > void DEP_RESOLVE_MOV(const brw::fs_builder &bld, int grf);
>> >
>> > + void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
>> > + const fs_reg dst,
>> > + const fs_reg src,
>> > + uint32_t components);
>> > +
>> > bool run_fs(bool do_rep_send);
>> > bool run_vs(gl_clip_plane *clip_planes);
>> > bool run_tcs_single_patch();
>> > --
>> > 2.5.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160513/7627e390/attachment-0001.sig>
More information about the mesa-dev
mailing list