[Mesa-dev] [HACK] i965/fs: Fix ordering of src0 alpha and oMask in the framebuffer write payload.

Jason Ekstrand jason at jlekstrand.net
Fri Jul 10 09:41:03 PDT 2015


On Fri, Jul 10, 2015 at 5:25 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
>
>> On Jul 9, 2015 7:57 AM, "Francisco Jerez" <currojerez at riseup.net> wrote:
>>>
>>> We were passing src0 alpha and oMask in reverse order.  There seems to
>>> be no good way to pass them in the correct order to the new-style
>>> LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while
>>> oMask is not.  Just split src0 alpha in fixed-width registers and pass
>>> them to LOAD_PAYLOAD as if they were part of the header as work-around
>>> for now.
>>
>> Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it
>> was only theoretical.  I wasn't very familiar with what omask actually did
>> and, since piglit didn't hit it, I wasn't sure if it was a real problem or
>> not.  I probably should have done more digging and written a piglit test at
>> the time. My bad.
>>
>> One solution that I proposed at the time was to turn header_size into
>> header_mask in the obvious way. We can still use 8 bits because we should
>> never have a header source higher than 8.
>>
>
> So your idea is to have one bit per source indicating whether it's
> header-like or per-channel?  I don't think that extends to instructions
> other than LOAD_PAYLOAD (e.g. FB_WRITE) where the same source is at the
> same time header and payload.

You're right, it doesn't.  We really shouldn't be conflating them.  We
should have header_mask and header_present be different fields.  Maybe
use a union to save space, but they should have different semantic
meaning and different names.  We should probably also have a
compr4_mask and get rid of the hackery there.

> One bit per 32B register would extend
> easily but it would be rather ugly to deal with if you want to keep your
> code SIMD width-invariant.
>
> I think if you go with the per-source flag you'll want it to be in its
> own subclass of fs_inst.  With its own subclass you could even have an
> array of per-source sizes determining the number of registers read for
> each source, which would be rather nice for the visitor (no need to
> split vectors into components while passing them to LOAD_PAYLOAD).
>
> Still I think the most elegant solution would be to simply get rid of
> the header/payload distinction by using force_writemask_all and, if it
> proves to be necessary, fix the optimizer to get rid of redundant
> force_writemask_all flags where it doesn't do it already.

I really don't think that's a good long-term or short-term solution.

How badly are you blocking on this?   I don't really have a lot of
extra time to work on this at the moment but can carve some out if
needed.
--jason

>> Thoughts?
>> --Jason
>>
>>> I've written a piglit test that demonstrates the problem by using
>>> gl_SampleMask from a fragment shader with multiple color outputs [1].
>>>
>>> [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26
>> +++++++++++++++++---------
>>>  1 file changed, 17 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> index 94d6a58..304ae74 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder
>> &bld,
>>>        length++;
>>>     }
>>>
>>> +   if (src0_alpha.file != BAD_FILE && color0.file != BAD_FILE) {
>>> +      /* Neat, we need to chop the src0 alpha component and pass it as
>> part of
>>> +       * the header even though it has per-channel semantics, because
>> the next
>>> +       * optional field is header-like and LOAD_PAYLOAD requires all such
>>> +       * fields to form a contiguous segment at the beginning of the
>> message.
>>> +       */
>>> +      for (unsigned i = 0; i < exec_size / 8; i++) {
>>> +         setup_color_payload(&sources[length], src0_alpha, 1, 8,
>>> +                             use_2nd_half || i == 1);
>>> +         length++;
>>> +      }
>>> +   }
>>> +
>>>     prog_data->uses_omask =
>>>        prog->OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK);
>>>     if (prog_data->uses_omask) {
>>> @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder
>> &bld,
>>>                               offset(this->outputs[0], bld, 3),
>>>                               1, exec_size, false);
>>>        length += 4;
>>> -   } else if (color1.file == BAD_FILE) {
>>> -      if (src0_alpha.file != BAD_FILE) {
>>> -         setup_color_payload(&sources[length], src0_alpha, 1, exec_size,
>> false);
>>> -         length++;
>>> -      }
>>> -
>>> -      setup_color_payload(&sources[length], color0, components,
>>> -                          exec_size, use_2nd_half);
>>> -      length += 4;
>>>     } else {
>>>        setup_color_payload(&sources[length], color0, components,
>>>                            exec_size, use_2nd_half);
>>>        length += 4;
>>> +
>>> +   }
>>> +
>>> +   if (color1.file != BAD_FILE) {
>>>        setup_color_payload(&sources[length], color1, components,
>>>                            exec_size, use_2nd_half);
>>>        length += 4;
>>> --
>>> 2.4.3
>>>


More information about the mesa-dev mailing list