[Mesa-dev] [PATCH 18/21] i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
Francisco Jerez
currojerez at riseup.net
Wed May 25 00:53:17 UTC 2016
Kenneth Graunke <kenneth at whitecape.org> writes:
> On Tuesday, May 24, 2016 5:27:59 PM PDT Francisco Jerez wrote:
>> Jason Ekstrand <jason at jlekstrand.net> writes:
>>
>> > On Tue, May 24, 2016 at 12:18 AM, Francisco Jerez <currojerez at riseup.net>
>> > wrote:
>> >
>> >> Due to a Gen7-specific hardware bug, native 32-wide instructions get
>> >> the lower 16 bits of the execution mask applied incorrectly to both
>> >> halves of the instruction, so the MOV trick we currently use wouldn't
>> >> work. Instead emit multiple 16-wide MOV instructions in 32-wide mode
>> >> in order to cover the whole execution mask.
>> >> ---
>> >> src/mesa/drivers/dri/i965/brw_eu_emit.c | 25 +++++++++++++++++--------
>> >> 1 file changed, 17 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> index af7caed..d36877c 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> @@ -3330,6 +3330,7 @@ void
>> >> brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
>> >> {
>> >> const struct brw_device_info *devinfo = p->devinfo;
>> >> + const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
>> >> brw_inst *inst;
>> >>
>> >> assert(devinfo->gen >= 7);
>> >> @@ -3359,15 +3360,23 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
>> >>
>> >> brw_MOV(p, flag, brw_imm_ud(0));
>> >>
>> >> - /* Run a 16-wide instruction returning zero with execution masking
>> >> - * and a conditional modifier enabled in order to get the current
>> >> - * execution mask in f1.0.
>> >> + /* Run enough instructions returning zero with execution masking and
>> >> + * a conditional modifier enabled in order to get the full execution
>> >> + * mask in f1.0. We could use a single 32-wide move here if it
>> >> + * weren't because of the hardware bug that causes channel enables to
>> >> + * be applied incorrectly to the second half of 32-wide instructions
>> >> + * on Gen7.
>> >> */
>> >> - inst = brw_MOV(p, brw_null_reg(), brw_imm_ud(0));
>> >> - brw_inst_set_exec_size(devinfo, inst, BRW_EXECUTE_16);
>> >> - brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE);
>> >> - brw_inst_set_cond_modifier(devinfo, inst, BRW_CONDITIONAL_Z);
>> >> - brw_inst_set_flag_reg_nr(devinfo, inst, 1);
>> >> + const unsigned lower_size = MIN2(16, exec_size);
>> >> + for (unsigned i = 0; i < exec_size / lower_size; i++) {
>> >> + inst = brw_MOV(p, retype(brw_null_reg(), BRW_REGISTER_TYPE_UW),
>> >> + brw_imm_uw(0));
>> >>
>> >
>> > Is there a reason this is changing from D to UW?
>> >
>>
>> It's likely to have lower execution latency than an instruction with
>> 32-bit integer execution type. It shouldn't have any practical
>> implications other than that; the result of the instruction is only used
>> to set bits of the flag register.
>
> I've never heard anything about them having different latencies.
> That doesn't mean that you're wrong, though. :)
>
AFAIUI the FPU pipeline is 4-wide (i.e. it can process four elements per
clock at a given stage of the pipeline) when the execution type is
F/D/UD, 8-wide when it is HF/W/UW, and 2-wide when it is DF/Q/UQ (this
is not accounting for hybrid issue and such). Beyond that, if the
execution type is D, the instructions would have to be compressed when
the execution size of the FIND_LIVE_CHANNEL instruction is 16 or 32,
since 16 channels of a 32-bit type span two 32-byte GRFs, whereas 16
channels of UW fit in a single one.
> --Ken