[Mesa-dev] [PATCH 2/5] i965/fs: Lower 32x32 bit multiplication on BXT.

Tue Aug 11 08:56:09 PDT 2015

Ok, that makes sense, thanks for the detailed explanation. I think
you're right, given the extra restrictions avoiding the integer dword
multiplication seems like the simplest solution.

I can't say that I fully understand what the documentation is trying to
say here. Does it mean that the horizontal strides must be set such that
each element in the source is offset to the same qword within its
register as the corresponding element in the destination?

I tried changing the test case¹ so that one of the sources comes from a
uniform. That would make the register for the uniform source be a scalar
and thus all of its elements would come from the first qword, whereas
for the destination the elements would be written to different qwords.
That ends up with an instruction like this:

mul(16)         g120<1>D        g9<8,8,1>D      g6<0,1,0>D { align1
                                                            1H compacted };

However, the test still passes on my BXT so I've probably misunderstood
something. On the other hand this does look similar to the stride you
had in your example instruction that the simulator complained about.

Given that the documentation and simulator implies this shouldn't work
I'm happy to leave this patch and I'm not suggesting we do anything now.

Regards,
- Neil

1. https://github.com/bpeel/piglit/blob/test-integer-multiply-uniform/tests/general/mult32.c

Francisco Jerez <currojerez at riseup.net> writes:

> Neil Roberts <neil at linux.intel.com> writes:
>
>> Are you sure this patch is necessary? The documentation for the multiply
>> instruction on BDW+ says:
>>
>> SourceType : 	*D 	
>> DestinationType : 	*D 	
>> Project : 	EXCLUDE(CHV) 
>>
>> This to me implies that it should work on BXT because it doesn't say
>> EXCLUDE(BXT).
>>
>
> In fact both CHV and BXT *can* support 32x32 multiplication, that's not
> really the reason that motivated this patch -- The problem is that 32x32
> multiplication has QWORD execution type which has several restrictions
> on CHV and BXT, the annoying one is:
>
> "CHV, BXT
>
> When source or destination datatype is 64b or operation is integer DWord
> multiply, regioning in Align1 must follow these rules:
>
>     Source and Destination horizontal stride must be aligned to the same
>     qword.
>
>     Example:
> [...]
>     // mul (4) r10.0<2>:d r11.0<8;4,2>:d r12.0<8;4,2>:d // Source and Destination stride must be 2 since the execution type is Qword."
>
> So it sounds like we could use the native 32x32 multiply with some
> additional effort to pass the arguments with the correct stride, but
> it's not clear to me whether the solution would be any more better than
> splitting up the multiply into 32x16 multiplies, so doing the same as in
> CHV and pre-Gen8 seemed like the most KISS solution for now.
>
>> I made a little test case and ran it on my BXT and it seems to work even
>> without this patch. I looked at the assembler source and it is
>> definitely doing a MUL instruction with D types for the dst and two
>> sources.
>>
>> https://github.com/bpeel/piglit/blob/test-integer-multiply/tests/general/mult32.c
>>
> Hmm, I guess the docs could be wrong, but I'm not sure what the
> consequences would be of violating the alignment rule, I guess the
> failure may be non-deterministic.  What stepping of BXT did you test
> this on?
>
>> Regards,
>> - Neil
>>
>> Francisco Jerez <currojerez at riseup.net> writes:
>>
>>> AFAIK BXT has the same annoying alignment limitation as CHV on the
>>> source register regions of 32x32 bit MULs, give it the same treatment.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 244f299..fc9f007 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -3126,9 +3126,9 @@ fs_visitor::lower_integer_multiplication()
>>>     bool progress = false;
>>>  
>>>     /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit operation
>>> -    * directly, but Cherryview cannot.
>>> +    * directly, but CHV/BXT cannot.
>>>      */
>>> -   if (devinfo->gen >= 8 && !devinfo->is_cherryview)
>>> +   if (devinfo->gen >= 8 && !devinfo->is_cherryview && !devinfo->is_broxton)
>>>        return false;
>>>  
>>>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>> -- 
>>> 2.4.6
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev