[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value

Nicolai Hähnle nhaehnle at gmail.com
Thu Jan 5 16:30:41 UTC 2017


On 05.01.2017 17:02, Ilia Mirkin wrote:
> On Thu, Jan 5, 2017 at 10:48 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 02.01.2017 21:41, Marek Olšák wrote:
>>>
>>> On Mon, Jan 2, 2017 at 7:01 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>
>>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>>> ---
>>>>  src/gallium/auxiliary/tgsi/tgsi_info.c     |  2 +-
>>>>  src/gallium/docs/source/tgsi.rst           | 11 +++++++++++
>>>>  src/gallium/include/pipe/p_shader_tokens.h |  2 +-
>>>>  3 files changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> index 37549aa..e34b8c7 100644
>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>> @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info
>>>> opcode_info[TGSI_OPCODE_LAST] =
>>>>     { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
>>>>     { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
>>>>     { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
>>>> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 },      /* removed */
>>>> +   { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
>>>>     { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
>>>>     { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
>>>>     { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>> b/src/gallium/docs/source/tgsi.rst
>>>> index d2d30b4..accbe1d 100644
>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>> @@ -2561,6 +2561,17 @@ Resource Access Opcodes
>>>>    image, while .w will contain the number of samples for multi-sampled
>>>>    images.
>>>>
>>>> +.. opcode:: FBFETCH - Load data from framebuffer
>>>> +
>>>> +  Syntax: ``FBFETCH dst, output``
>>>> +
>>>> +  Example: ``FBFETCH TEMP[0], OUT[0]``
>>>> +
>>>> +  Returns the color of the current position in the framebuffer from
>>>> +  before this fragment shader invocation. Always returns the same
>>>> +  value from multiple calls for a particular output within a single
>>>> +  invocation.
>>
>>
>> I'm not a fan of this last sentence. It's true that we could somehow bend
>> things in the compiler to make the sentence true, but
>>
>> (a) The statement is clearly false with a straight-forward implementation of
>> the instruction: multiple fragment shaders can be simultaneously in-flight
>> on the same pixel/sample. A second FBFETCH could happen after an earlier
>> invocation on the same pixel finished and get the new framebuffer value.
>>
>> (b) I'm not aware of an API that actually requires this guarantee.
>>
>>
>>> If the value is always the same, can it be declared as a system value
>>> instead?
>>
>>
>> I don't know. I'd remove the statement about the value always being the same
>> to begin with. And with an eye to how this actually ends up being
>> implemented, and possible interactions with ARB_fragment_shader_interlock,
>> I'd say it makes sense (and our lives easier) for the TGSI to define _when_
>> the framebuffer value is supposed to be read, and for that it makes sense to
>> have an instruction for it.
>
> In case it's not obvious, this is primarily for
> KHR_blend_equation_advanced. It's illegal to use it with overdraw
> without a barrier first.

Well, you get an undefined value :)

My point is that there's a plausible implementation of FBFETCH as an 
instruction in which "the same value will be returned within a single 
invocation" is only guaranteed if the application follows the rules 
involving BlendBarrier.


> There's also
> KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch,
> which do allow overdraw. I'm not sure how the ordering is specified
> (or, how it's specified for regular blending). The gl_LastFragData
> stuff are a non-writable space, and as such, it makes sense that
> they'd be computed once on invocation and then kept constant
> throughout the shader run.

It still leaves open the question of _when_ they should be computed, 
especially in the future when guarantees about the order of fragment 
shaders are required (i.e. KHR_blend_equation_advanced_coherent): If you 
load the data too early, you may lose some potential for parallelism 
because you have to spend more time waiting for earlier invocations to 
finish. This obviously depends on the details of the hardware, but 
ARB_fragment_shader_interlock suggests that it is (or will be?) quite 
common to have synchronization that is rather fine-grained.

Nicolai


> For nouveau, the sysval route actually introduces a number of
> annoyances which are nouveau-private. I wasn't going to complain about
> it, as nouveau impl details shouldn't dictate the TGSI, but
> implementation-wise, the op is definitely simpler for me.
>
> Happy to go with whatever route you guys think is best. The reason I'm
> looking for KHR_blend_equation_advanced is that my understanding is
> that Android requires it to expose ES 3.1 for some odd reason.
>
> Cheers,
>
>   -ilia
>


More information about the mesa-dev mailing list