<div dir="ltr">On 21 October 2013 18:34, Eric Anholt <span dir="ltr"><<a href="mailto:eric@anholt.net" target="_blank">eric@anholt.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Paul Berry <<a href="mailto:stereotype441@gmail.com" target="_blank">stereotype441@gmail.com</a>> writes:<br>
<br>
> On 18 October 2013 17:04, Eric Anholt <<a href="mailto:eric@anholt.net" target="_blank">eric@anholt.net</a>> wrote:<br>
</div><div><div>>> Putting these fixups for a couple of weird cases in just MOV and ADD<br>
>> feels wrong to me, but maybe when I understand better what's going on<br>
>> it'll seem more natural.<br>
>><br>
><br>
> Another possibility I'd be equally happy with would be to put the fixup at<br>
> the top of vec4_generator::generate_vec4_instruction(), before the switch<br>
> statement. It would look something like this:<br>
><br>
> if (dst.width == BRW_WIDTH_4) {<br>
> /* This happens in attribute fixups for "dual instanced" geometry<br>
> * shaders, since they use attributes that are vec4's. Since the exec<br>
> * width is only 4, it's essential that the caller set<br>
> * force_writemask_all in order to make sure the instruction is<br>
> executed<br>
> * regardless of which channels are enabled.<br>
> */<br>
> assert(inst->force_writemask_all);<br>
><br>
> /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to<br>
> satisfy<br>
> * the following register region restrictions (from Graphics BSpec:<br>
> * 3D-Media-GPGPU Engine > EU Overview > Registers and Register<br>
> Regions<br>
> * > Register Region Restrictions)<br>
> *<br>
> * 1. ExecSize must be greater than or equal to Width.<br>
> *<br>
> * 2. If ExecSize = Width and HorzStride != 0, VertStride must be<br>
> set<br>
> * to Width * HorzStride."<br>
> */<br>
> for (int i = 0; i < 3; i++) {<br>
> if (src[i].file == BRW_GENERAL_REGISTER_FILE)<br>
> src[i] = stride(src[i], 4, 4, 1);<br>
> }<br>
> }<br>
><br>
> Does that seem better to you? I actually think I like it slightly better<br>
> because by making the assertion more general, I caught another case where I<br>
> think I should be setting force_writemask_all to be on the safe side (the<br>
> "clear r0.2" instruction in the gs prolog).<br>
<br>
</div></div>I like this better -- it makes more sense to me for the fixup to be<br>
non-opcode-specific. Any I think I get the problem now -- our registers<br>
would make a ton of sense as <4;4,1> in general (that's how I think of<br>
our GRFs in align16, at least!), except that we can't because then we'd<br>
guess an execsize of 4.</blockquote><div><br></div><div>Agreed. And it's not just how you think of our GRF's in align16. <4;4,1> is what we actually emit to the hardware, once the fixups at the bottom of brw_set_src{0,1}() take effect.<br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> I'm in favor of the kill-guess-execsize plan,<br>
even if we leave it in place for gen4/5 SF/CLIP threads (which bounce<br>
execsize all over the place iirc and would be a pain to convert) and<br>
only convert the new backends. <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Another option: How about instead of that assert in brw_eu_emit.c, we<br>
just smash the vstride to be width * hstride? We know the vstride<br>
doesn't matter, because you're only using execsize components, so let's<br>
just not bother our brw_eu.c callers with this little problem.<br>
</blockquote></div><br></div><div class="gmail_extra">I'm not really a fan of this--it feels like it's perpetuating the "<8;8,1> instead of <4;4,1>" lie by introducing even more code in brw_eu_emit.c to fix things up when we use a bogus <8;8,1> region. Also, if we do this, there's no convenient place to put the assert(force_writemask_all), which I think is really valuable.<br>
<br>I would rather stick with the fixup at the top of generate_vec4_instruction() for now, and revisit the decision when we get rid of guess_execution_size().<br></div></div>