[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value

Ilia Mirkin imirkin at alum.mit.edu
Thu Jan 5 16:59:55 UTC 2017


On Thu, Jan 5, 2017 at 11:30 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 05.01.2017 17:02, Ilia Mirkin wrote:
>>
>> On Thu, Jan 5, 2017 at 10:48 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>> wrote:
>>>
>>> On 02.01.2017 21:41, Marek Olšák wrote:
>>>>
>>>>
>>>> On Mon, Jan 2, 2017 at 7:01 AM, Ilia Mirkin <imirkin at alum.mit.edu>
>>>> wrote:
>>>>>
>>>>>
>>>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>>>> ---
>>>>>  src/gallium/auxiliary/tgsi/tgsi_info.c     |  2 +-
>>>>>  src/gallium/docs/source/tgsi.rst           | 11 +++++++++++
>>>>>  src/gallium/include/pipe/p_shader_tokens.h |  2 +-
>>>>>  3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>> index 37549aa..e34b8c7 100644
>>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>> @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info
>>>>> opcode_info[TGSI_OPCODE_LAST] =
>>>>>     { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
>>>>>     { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
>>>>>     { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
>>>>> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 },      /* removed */
>>>>> +   { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
>>>>>     { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
>>>>>     { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
>>>>>     { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
>>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>>> b/src/gallium/docs/source/tgsi.rst
>>>>> index d2d30b4..accbe1d 100644
>>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>>> @@ -2561,6 +2561,17 @@ Resource Access Opcodes
>>>>>    image, while .w will contain the number of samples for multi-sampled
>>>>>    images.
>>>>>
>>>>> +.. opcode:: FBFETCH - Load data from framebuffer
>>>>> +
>>>>> +  Syntax: ``FBFETCH dst, output``
>>>>> +
>>>>> +  Example: ``FBFETCH TEMP[0], OUT[0]``
>>>>> +
>>>>> +  Returns the color of the current position in the framebuffer from
>>>>> +  before this fragment shader invocation. Always returns the same
>>>>> +  value from multiple calls for a particular output within a single
>>>>> +  invocation.
>>>
>>>
>>>
>>> I'm not a fan of this last sentence. It's true that we could somehow bend
>>> things in the compiler to make the sentence true, but
>>>
>>> (a) The statement is clearly false with a straight-forward implementation
>>> of
>>> the instruction: multiple fragment shaders can be simultaneously
>>> in-flight
>>> on the same pixel/sample. A second FBFETCH could happen after an earlier
>>> invocation on the same pixel finished and get the new framebuffer value.
>>>
>>> (b) I'm not aware of an API that actually requires this guarantee.
>>>
>>>
>>>> If the value is always the same, can it be declared as a system value
>>>> instead?
>>>
>>>
>>>
>>> I don't know. I'd remove the statement about the value always being the
>>> same
>>> to begin with. And with an eye to how this actually ends up being
>>> implemented, and possible interactions with
>>> ARB_fragment_shader_interlock,
>>> I'd say it makes sense (and our lives easier) for the TGSI to define
>>> _when_
>>> the framebuffer value is supposed to be read, and for that it makes sense
>>> to
>>> have an instruction for it.
>>
>>
>> In case it's not obvious, this is primarily for
>> KHR_blend_equation_advanced. It's illegal to use it with overdraw
>> without a barrier first.
>
>
> Well, you get an undefined value :)
>
> My point is that there's a plausible implementation of FBFETCH as an
> instruction in which "the same value will be returned within a single
> invocation" is only guaranteed if the application follows the rules
> involving BlendBarrier.
>
>
>> There's also
>> KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch,
>> which do allow overdraw. I'm not sure how the ordering is specified
>> (or, how it's specified for regular blending). The gl_LastFragData
>> stuff are a non-writable space, and as such, it makes sense that
>> they'd be computed once on invocation and then kept constant
>> throughout the shader run.
>
>
> It still leaves open the question of _when_ they should be computed,
> especially in the future when guarantees about the order of fragment shaders
> are required (i.e. KHR_blend_equation_advanced_coherent): If you load the
> data too early, you may lose some potential for parallelism because you have
> to spend more time waiting for earlier invocations to finish. This obviously
> depends on the details of the hardware, but ARB_fragment_shader_interlock
> suggests that it is (or will be?) quite common to have synchronization that
> is rather fine-grained.

This is a related to the implementation issues I had on nouveau with
the sysval - basically the code looks like

if (advanced blend is enabled) {
MOV TEMP[0], SV
IF
  use(sv)
ELSE
  use(sv)
use(sv some more)
}

And then the st_glsl_to_tgsi copy propagation logic pushes those SV's
down to the uses. This is further complicated by the fact that the
xyzw values are logically independent at the TGSI level. And now I'm
stuck either

(a) getting the value once at the top of the shader, even if advanced
blend might be disabled entirely
(b) fetching the texture once per bb
(c) figuring out some code motion heuristics which presently don't
exist in nouveau

So... the operation makes the most sense to me. How about "once you
call FBFETCH, all further invocations should return identical values"?
This could still present difficulties in a situation like

if (a) {
  fbfetch();
}
do stuff
if (b) {
  fbfetch();
}

But I don't have any concrete problems with it I can identify, just
fear of the unknown.

Cheers,

  -ilia


More information about the mesa-dev mailing list