[Mesa-stable] [Mesa-dev] [PATCH] i965/gs: Avoid DW * DW mul
Ben Widawsky
ben at bwidawsk.net
Thu Dec 4 19:48:06 PST 2014
On Thu, Dec 04, 2014 at 05:08:21PM -0800, Matt Turner wrote:
> On Thu, Dec 4, 2014 at 3:37 PM, Ben Widawsky
> <benjamin.widawsky at intel.com> wrote:
> > The GS has an interesting use for mul. It's essentially used as a fancy mov (in
> > fact, I am not sure why a mov isn't used). The documentation in the function has
> > a very good explanation from Paul on the mechanics.
>
> What do you mean? The comment says
>
> * Therefore, we want to multiply DWORDs 0 and 4 of src0 (the x components
> * of the register for geometry shader invocations 0 and 1) by the
> * immediate value in src1, and store the result in DWORDs 3 and 4 of dst.
>
> I don't know why you'd think a MOV would do?
>
> I do see one of its uses is a MUL by 1u (the one in this patch), so
> that could be a move, but not in general. And a MOV and a MUL are
> equal in terms of execution speed, so there's no advantage to having a
> second opcode.
>
I'm pretty sure it's always 1u. In theory you would want a MUL for the operation
he is trying to perform, it's just unnecessary with the current code, and I
can't *really* see a reason to ever make it not 1u.
> > CHV has some quirks with regard to multiplication. While the documentation is
> > somewhat unclear, I've found that demoting the src1 operand in the GS mul solves
> > all the problems.
>
> > I'd ask that any potential reviewer ignore the other instances
> > of mul for now (I have more patches), and simply make sure that what this patch
> > does is correct.
>
> I'd cut this out of the commit message before you commit.
>
> > This fixes around 2000 piglit tests on BSW.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84777 (with many dupes)
> > Cc: "10.4" <mesa-stable at lists.freedesktop.org>
>
> Are we actually expecting usable BSW support to be in 10.4.x? (The
> commit message uses both CHV and BSW. Can we pick one?)
>
Why wouldn't we think it's usable? AFAIK, there is very little that wasn't
already in 10.4. With this patch, I believe in terms of stability we're on par
with BDW. I suppose we might want to add the depctrl patch (I have another patch
in my queue that limits it to CHV only, since I actually found the errata now).
As for the name, mixing the two wasn't intentionali, but, things generally
aren't great because the code calls it CHV, but the FDO bugs call in BSW. For
that reason I prefer BSW, but I believe it defies Ken's convention.
> > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> >
> > ---
> > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 5 ++++-
> > src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 3 ++-
> > 2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> > index b353539..4f60797 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> > @@ -534,8 +534,11 @@ vec4_generator::generate_gs_set_write_offset(struct brw_reg dst,
> > brw_push_insn_state(p);
> > brw_set_default_access_mode(p, BRW_ALIGN_1);
> > brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> > + assert(src1.file == BRW_IMMEDIATE_VALUE &&
> > + src1.type == BRW_REGISTER_TYPE_UD &&
> > + src1.dw1.ud <= SHRT_MAX);
> > brw_MUL(p, suboffset(stride(dst, 2, 2, 1), 3), stride(src0, 8, 2, 4),
> > - src1);
> > + retype(src1, BRW_REGISTER_TYPE_UW));
>
> I feel a little weird about this, but the alternative is setting the
> type in the visitor and that doesn't seem nice either. I guess this is
> probably the best thing.
>
> One thing, you're comparing with SHRT_MAX but then setting the type to
> unsigned-word. Shouldn't you compare with USHRT_MAX?
Yes, thanks.
>
> With the commit message updated and trimmed and s/SHRT_MAX/USHRT_MAX/:
>
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
More information about the mesa-stable
mailing list