[Mesa-dev] [PATCH v2 06/20] i965/fs: fix dst stride in IVB/BYT type conversions
Francisco Jerez
currojerez at riseup.net
Wed Jan 18 20:44:19 UTC 2017
Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> On Tue, 2017-01-17 at 13:26 -0800, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>
>> > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>> >
>> > When converting a DF to F, we set dst stride to 2, to fulfil
>> > alignment
>> > restrictions.
>> >
>> > But in IVB/BYT, this is not necessary, as each DF conversion
>> > already
>> > writes 2 F, the first one the real value, and the second one a 0.
>> > That
>> > is, IVB/BYT already set stride = 2 implicitly, so we must set it to
>> > 1
>> > explicitly to avoid ending up with stride = 4.
>> >
>> > v2:
>> > - Fix typo (Matt)
>> > ---
>> > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++++++
>> > 1 file changed, 10 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > index 45881e3ec95..487f2e90224 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > @@ -1629,6 +1629,16 @@ fs_generator::generate_code(const cfg_t
>> > *cfg, int dispatch_width)
>> > inst->src[i].type != BRW_REGISTER_TYPE_UD ||
>> > !inst->src[i].negate);
>> > }
>> > + /* When converting from DF->F, we set destination's stride
>> > as 2 as an
>> > + * alignment requirement. But in IVB/BYT, each DF implicitly
>> > writes 2 F,
>> > + * being the first one the converted value. So we don't need
>> > to
>> > + * explicitly set stride 2, but 1.
>> > + */
>> > + if (devinfo->gen == 7 && !devinfo->is_haswell &&
>> > + type_sz(inst->src[0].type) > type_sz(inst->dst.type)) {
>>
>> This should be exec_type_size(inst) rather than
>> type_sz(inst->src[0].type).
>>
>
> Actually it is going to be the same as this is going to catch only DF
> -> 32-bit data type conversions. But yeah, you are right.
>
Are you sure? AFAICT the few lines of this patch were going to be
executed for *all* instructions, you didn't even have a check for the
execution type of the instruction being FP64...
>> > + assert(inst->dst.stride == 2 || inst->dst.stride == 1);
>> > + inst->dst.stride = 1;
>> > + }
>>
>> This is modifying the IR, please don't.
>>
>
> Right, I am going to do the change in brw_eu_emit.c inside the
> brw_MOV() function that Matt added in other patch and also when
> emitting the MOV indirect.
>
>> Also I don't think the above has the same semantics as a destination
>> region with stride 2... AFAIUI IVB will just write garbage into the
>> odd
>> channels when the destination type is narrower than a DF which is
>> really
>> not what a strided move is supposed to do. If that's the case it
>> would
>> probably be safer to add a new F64TO32 virtual opcode for type
>> conversions and assert(inst->dst.stride == 1) here...
>>
>
> I think adding a virtual opcode for this change is likely too much. I
> will keep the aforementioned changes. However, I don't have a strong
> opinion against it: if you prefer the virtual opcode, we can add it now
> or even later as a follow-up patch.
>
TBH I'm not particularly fond of the idea of adding a virtual conversion
opcode for IVB either, but if you don't, I hope you have some other plan
to deal with the subtle kind of breakage this could cause... E.g. some
future, seemingly unrelated and obviously correct (because it's the
codegen pass that will be breaking a contract with this patch) optimizer
improvement could rely on the (seemingly obvious) assumption that
strided destination regions don't corrupt any of the skipped components
(honestly I'm not fully convinced that we don't rely on this already),
which would probably work fine except on IVB for this infrequently used
feature, and even then the kind of failure mode is likely to be
non-deterministic and may not be caught by the simplest test-cases.
Some alternatives to adding a virtual opcode:
- Continue using regular moves with destination stride 2 as you're
doing here, but call the contents of the skipped components undefined
after the instruction is executed. Add some simple code to the FS
validator pass to warn us if we ever break this assumption -- The
most naive implementation of this could just make sure that the
destination VGRF of any strided narrowing conversion is only ever
written once on IVB (which could conceivably give false positives but
never false negatives).
- Same as above, but instead of the validation pass, write a
legalization pass to fix up invalid strided conversions to use a
temporary instead of the real destination which would then copied
into the real register (this may have benefits of its own because the
same logic could potentially by used to get rid of an amount of cruft
from the compiler back-end intended to address execution type
alignment restrictions of the destination and sources of various
instructions, which gets particularly annoying on CHV/BXT).
> Sam
>
>> > dst = brw_reg_from_fs_reg(compiler->devinfo, inst,
>> > &inst->dst, compressed);
>> >
>> > --
>> > 2.11.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170118/d628aeff/attachment.sig>
More information about the mesa-dev
mailing list