[Mesa-dev] [PATCH] i965/fs: Use UW-typed immediate in multiply inst.

Wed Jun 3 10:35:16 PDT 2015

On Wed, Jun 3, 2015 at 5:09 AM, Neil Roberts <neil at linux.intel.com> wrote:
> Looks good to me. Thanks for fixing this. I guess I still have more to
> learn about the ISA.
>
> However, should we not also fix the vec4 version? With that,
>
> Reviewed-by: Neil Roberts <neil at linux.intel.com>

Yes, I'll do that too. I started to... and then noticed that the code
coming out of the vec4 backend was entirely broken -- the send
instruction had a null src0 -- and forgot. After some investigation, I
realized that's what your other patch is fixing! (i965/vec4: Fix the
source register for indexed samplers).

>
> If we wanted to play safe and avoid the MUL, we could change it to this
> and still avoid having a temporary:
>
>       /* addr = (((sampler << 8) | sampler) +
>        *         base_binding_table_index) & 0xfff
>        */
>       brw_SHL(p, addr, sampler_reg, brw_imm_ud(8u));
>       brw_OR(p, addr, addr, sampler_reg);
>
> I just thought the MUL trick was more fun. I'm not sure which would be
> faster. The SHL+OR might take up fewer clock cycles but it also
> introduces a bubble because it is writing to a register and then
> immediately reading from it in the next instruction.
>
> On the other hand it's probably really not worth worrying about
> optimising this bit of code because literally nothing is using it in
> shader-db. We can just go with whatever works.

Nah, the multiply's fine. Most instructions have the same latency
(time before results are ready) and issue time (time before the EU can
issue another independent instruction). MUL/SHL/OR all are equal, from
what I've measured.