<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 20, 2015 at 2:43 PM, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
<br>
> One more comment here... This particularly regards your plan of separating<br>
> it into "things that match the destination" and "other things" and not copy<br>
> prop uniforms or immediates into the "other things". There is another case<br>
> we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages<br>
> are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0,<br>
> r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16<br>
> load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On<br>
> those platforms, there is a special type of MOV-to-MRF that can move from<br>
> (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right<br>
> now. However, it doesn't follow the standard rules.<br>
> --Jason<br>
><br>
</span>I don't see why it couldn't be handled in roughly the same way you do it<br>
now? Recognize when src[i + 4] is the same 8-wide register as src[i]<br>
shifted by 8 and emit a COMPR4 copy in that case?<br>
</blockquote><div><br></div><div>Sure. I haven't thought about it hard enough to prove it can't be done. Just wanted to make sure you were aware of that particular monkey-wrench.<br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">
> On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
> wrote:<br>
><br>
>><br>
>><br>
>> On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez <<a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>><br>
>> wrote:<br>
>><br>
>>> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>>><br>
>>> > On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez <<a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a><br>
>>> ><br>
>>> > wrote:<br>
>>> ><br>
>>> >> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>>> >><br>
>>> >> > I'm still a little pensive. But<br>
>>> >> ><br>
>>> >> > Reviewed-by: Jason Ekstrand <<a href="mailto:jason.ekstrand@intel.com">jason.ekstrand@intel.com</a>><br>
>>> >> ><br>
>>> >> Thanks.<br>
>>> >><br>
>>> >> > Now for a little aside. I have come to the conclusion that I made a<br>
>>> >> grave<br>
>>> >> > mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should<br>
>>> have<br>
>>> >> > just subclassed fs_inst for load_payload. The problem is that we<br>
>>> need to<br>
>>> >> > snag a bunch of information for the sources when we create the<br>
>>> >> > load_payload. In particular, we need to know the width of the<br>
>>> source so<br>
>>> >> > that we know how much space it consumes in the payload and we need to<br>
>>> >> know<br>
>>> >> > the information required to properly re-create the mov such as<br>
>>> >> > force_sechalf and force_writemask_all. Really, in order to do things<br>
>>> >> > properly, we need to gather this information *before* we do any<br>
>>> >> > optimizations. The nasty pile of code that you're editing together<br>
>>> with<br>
>>> >> > the "effective_width" parameter is a lame attempt to<br>
>>> capture/reconstruct<br>
>>> >> > this information. Really, we should just subclass, capture the<br>
>>> >> information<br>
>>> >> > up-front, and do it properly.<br>
>>> >> ><br>
>>> >> Yes, absolutely, this lowering pass is a real mess. There are four<br>
>>> more<br>
>>> >> unreviewed patches in the mailing list fixing bugs of the metadata<br>
>>> >> guessing of lower_load_payload() [1][2][3][4], you may want to take a<br>
>>> >> look at them. There are more bugs I'm aware of but it didn't seem<br>
>>> >> practical to fix them.<br>
>>> >><br>
>>> ><br>
>>> > Yeah, Matt pointed me at those. I'll give them a look later today.<br>
>>> ><br>
>>> ><br>
>>> >> That said, I don't think it would be worth subclassing fs_inst.<br>
>>> >><br>
>>> >> My suggestion would have been to keep it simple and lower LOAD_PAYLOAD<br>
>>> >> into a series of MOVs with force_writemask_all enabled in all cases,<br>
>>> >> simply rely on the optimizer to eliminate those where possible. Then<br>
>>> >> get rid of the metadata and effective_width guessing. Instead agree on<br>
>>> >> immediates and uniforms being exec_size-wide by convention<br>
>>> >> (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's),<br>
>>> >> then prevent constant propagation from propagating an immediate load<br>
>>> >> into a LOAD_PAYLOAD if it would lead to a change in the semantics of<br>
>>> the<br>
>>> >> program, and maybe just run constant propagation after lowering so we<br>
>>> >> can be sure those cases are cleaned up properly where register coalesce<br>
>>> >> isn't enough.<br>
>>> >><br>
>>> ><br>
>>> > First off, simply setting force_writemask_all isn't an option. Consider<br>
>>> > the following scenario:<br>
>>> ><br>
>>> > a = foo;<br>
>>> > if (bar) {<br>
>>> > b = 7;<br>
>>> > } else {<br>
>>> > use(a);<br>
>>> > }<br>
>>> > use(b);<br>
>>> ><br>
>>> > If "use(a)" is the last use of the variable a, then the live ranges of a<br>
>>> > and b don't actually over-lap and we can assign a and b to the same<br>
>>> > register. However, if force_writemask_all is set on b, it will destroy<br>
>>> the<br>
>>> > value in a before its last use. Right now, we don't actually do this<br>
>>> > because our liveness analysis pass flattens everything so it will think<br>
>>> > that b and a over-lap even though they don't. However, if we ever<br>
>>> decide<br>
>>> > to make the liveness information more accurate, having a pile of<br>
>>> > force_writemask_all instructions will be a major problem. So, while it<br>
>>> is<br>
>>> > *technically* safe for now, it's a really bad idea in the long-term.<br>
>>> ><br>
>>> The thing is we *will* have to deal with that scenario. Building a<br>
>>> message payload inherently involves crossing channel boundaries (because<br>
>>> of headers, unsupported SIMD modes by some shared functions, etc.). I'd<br>
>>> say it's a bug in the liveness analysis pass if it wouldn't consider the<br>
>>> liveness interval of a and b to overlap in your example if the<br>
>>> assignment of b had force_writemask_all set.<br>
>>><br>
>><br>
>> Yes, I'm aware of that. However, the part of that register that has to<br>
>> squash everything is usually only a couple of registers while the entire<br>
>> payload may be up to 13 (if I remmeber correctly). We'd rather not have to<br>
>> squash everything if we can. All that being said, our liveness/register<br>
>> allocation can't handle this and making register allocation handle parts of<br>
>> registers interfering but not other bits is going to be a real pain. Maybe<br>
>> not even worth it. If you'd rather force_writemask_all everything, I'll<br>
>> sign off on that for now. I just wanted to point out that it may not be a<br>
>> good permanent solution.<br>
>><br>
>><br>
>>> A reasonable compromise might be to leave it up to the caller whether to<br>
>>> set the force_writemask_all and force_sechalf flags or not. I.e. just<br>
>>> copy the same flags set on the LOAD_PAYLOAD instruction to the MOV<br>
>>> instructions. That would still allow reducing the liveness intervals in<br>
>>> cases where we know that the payload respects channel boundaries.<br>
>>><br>
>>> Tracking the flag information per-register in cases where the same<br>
>>> payload has well- and ill-behaved values seems rather premature and<br>
>>> rather useless to me because the optimizer is likely to be able to get<br>
>>> rid of redundant copies with force_writemask_all -- register coalesce<br>
>>> is doing this already AFAIK, maybe by accident.<br>
>>><br>
>><br>
>> Sure. I'm not worried about our ability to optimize. I'm primarily<br>
>> worried about register pressure. Like I said, it's an OK solution in the<br>
>> temporary. I think we'll want to give it more thought in the long-run but<br>
>> that's going to interact a lot with how we do register allocation etc.<br>
>> --Jason<br>
>><br>
>><br>
>>> > Regarding the other suggestion of just requiring width == exec_size for<br>
>>> > immediates and uniforms, that seems pretty reasonable to me. I'd like<br>
>>> to<br>
>>> > know what it will do to optimizations, but it seems ok initially. I'm<br>
>>> > still a bigger fan of just subclassing and stashing everything we need<br>
>>> to<br>
>>> > know up-front. If we do it right, the only things that will actually<br>
>>> need<br>
>>> > to know about the subclass are the function for creating a LOAD_PAYLOAD<br>
>>> and<br>
>>> > lower_load_payloads.<br>
>>> ><br>
>>> > --Jason<br>
>>> ><br>
>>> ><br>
>>> >><br>
>>> >> [1]<br>
>>> >><br>
>>> <a href="http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html" target="_blank">http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html</a><br>
>>> >> [2]<br>
>>> >><br>
>>> <a href="http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html" target="_blank">http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html</a><br>
>>> >> [3]<br>
>>> >><br>
>>> <a href="http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html" target="_blank">http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html</a><br>
>>> >> [4]<br>
>>> >><br>
>>> <a href="http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html" target="_blank">http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html</a><br>
>>> >><br>
>>> >><br>
>>> >> > --Jason<br>
>>> >> ><br>
>>> >> > On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand <<br>
>>> <a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br>
>>> >> > wrote:<br>
>>> >> ><br>
>>> >> >><br>
>>> >> >><br>
>>> >> >> On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez <<br>
>>> <a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a><br>
>>> >> ><br>
>>> >> >> wrote:<br>
>>> >> >><br>
>>> >> >>> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>>> >> >>><br>
>>> >> >>> > On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez <<br>
>>> >> >>> <a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>><br>
>>> >> >>> > wrote:<br>
>>> >> >>> ><br>
>>> >> >>> >> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>>> >> >>> >><br>
>>> >> >>> >> > On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez <<br>
>>> >> >>> <a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>><br>
>>> >> >>> >> > wrote:<br>
>>> >> >>> >> ><br>
>>> >> >>> >> >> Hey Matt,<br>
>>> >> >>> >> >><br>
>>> >> >>> >> >> Matt Turner <<a href="mailto:mattst88@gmail.com">mattst88@gmail.com</a>> writes:<br>
>>> >> >>> >> >><br>
>>> >> >>> >> >> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <<br>
>>> >> >>> >> <a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>><br>
>>> >> >>> >> >> wrote:<br>
>>> >> >>> >> >> >> MRFs cannot be read from anyway so they cannot possibly<br>
>>> be a<br>
>>> >> >>> valid<br>
>>> >> >>> >> >> >> source of LOAD_PAYLOAD.<br>
>>> >> >>> >> >> >> ---<br>
>>> >> >>> >> >> ><br>
>>> >> >>> >> >> > The function only seems to test inst->dst.file == MRF. I<br>
>>> don't<br>
>>> >> >>> see any<br>
>>> >> >>> >> >> > code for handling MRF sources. What am I missing?<br>
>>> >> >>> >> >><br>
>>> >> >>> >> >> That test is for "handling" MRF sources -- More precisely,<br>
>>> it's<br>
>>> >> >>> >> >> collecting the writemask and half flags for MRF writes,<br>
>>> which can<br>
>>> >> >>> only<br>
>>> >> >>> >> >> possibly be useful if we're going to use them later on to<br>
>>> read<br>
>>> >> >>> something<br>
>>> >> >>> >> >> out of an MRF into a payload, which we shouldn't be doing in<br>
>>> the<br>
>>> >> >>> first<br>
>>> >> >>> >> >> place.<br>
>>> >> >>> >> >><br>
>>> >> >>> >> >> Aside from simplifying the function somewhat, that allows us<br>
>>> to<br>
>>> >> >>> drop the<br>
>>> >> >>> >> >> 16 register gap reserved for MRFs at register offset zero,<br>
>>> what<br>
>>> >> will<br>
>>> >> >>> >> >> allow us to drop the vgrf_to_reg[] offset calculation<br>
>>> completely<br>
>>> >> >>> (also<br>
>>> >> >>> >> >> in split_virtual_grfs()) in a future patch (not sent for<br>
>>> review<br>
>>> >> >>> yet).<br>
>>> >> >>> >> >><br>
>>> >> >>> >> ><br>
>>> >> >>> >> > No, we do read from MRF's sort-of... Send messages have an<br>
>>> >> implicit<br>
>>> >> >>> >> "read"<br>
>>> >> >>> >> > from an MRF.<br>
>>> >> >>> >><br>
>>> >> >>> >> Heh, and that's pretty much the only way you "read" from it.<br>
>>> >> >>> >><br>
>>> >> >>> >> > This was written precicely so that we could use LOAD_PAYLOAD<br>
>>> >> >>> >> > to build MRF payloads. We do on pre-GEN6.<br>
>>> >> >>> >> ><br>
>>> >> >>> >> I'm aware, but you don't need any of this meta-data to<br>
>>> LOAD_PAYLOAD<br>
>>> >> >>> >> *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be<br>
>>> >> illegal<br>
>>> >> >>> >> anyway.<br>
>>> >> >>> >><br>
>>> >> >>> ><br>
>>> >> >>> > And no one is using it that way. All of the metadata checks you<br>
>>> are<br>
>>> >> >>> > deleting are checks on the *destination*.<br>
>>> >> >>> ><br>
>>> >> >>><br>
>>> >> >>> Didn't you write this code yourself? The only use for the<br>
>>> collected<br>
>>> >> >>> metadata is initializing the instruction flags of the MOVs<br>
>>> subsequent<br>
>>> >> >>> LOAD_PAYLOAD instructions are lowered to, based on the metadata<br>
>>> already<br>
>>> >> >>> collected for its source registers, which can never be MRFs, so the<br>
>>> >> >>> metadata you collect from MRF writes is never actually used.<br>
>>> >> >>><br>
>>> >> >><br>
>>> >> >> Right... I misred something initially. Yes, we should never be<br>
>>> tracking<br>
>>> >> >> MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look a<br>
>>> bit<br>
>>> >> >> later, but it looks better.<br>
>>> >> >><br>
>>> >><br>
>>><br>
>><br>
>><br>
</div></div></blockquote></div><br></div></div>