[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value

Nicolai Hähnle nhaehnle at gmail.com
Fri Jan 6 11:05:26 UTC 2017


On 05.01.2017 17:59, Ilia Mirkin wrote:
> On Thu, Jan 5, 2017 at 11:30 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 05.01.2017 17:02, Ilia Mirkin wrote:
>>>
>>> On Thu, Jan 5, 2017 at 10:48 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>>> wrote:
>>>>
>>>> On 02.01.2017 21:41, Marek Olšák wrote:
>>>>>
>>>>>
>>>>> On Mon, Jan 2, 2017 at 7:01 AM, Ilia Mirkin <imirkin at alum.mit.edu>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>>>>> ---
>>>>>>  src/gallium/auxiliary/tgsi/tgsi_info.c     |  2 +-
>>>>>>  src/gallium/docs/source/tgsi.rst           | 11 +++++++++++
>>>>>>  src/gallium/include/pipe/p_shader_tokens.h |  2 +-
>>>>>>  3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>> index 37549aa..e34b8c7 100644
>>>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>> @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info
>>>>>> opcode_info[TGSI_OPCODE_LAST] =
>>>>>>     { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
>>>>>>     { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
>>>>>>     { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
>>>>>> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 },      /* removed */
>>>>>> +   { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
>>>>>>     { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
>>>>>>     { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
>>>>>>     { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
>>>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>>>> b/src/gallium/docs/source/tgsi.rst
>>>>>> index d2d30b4..accbe1d 100644
>>>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>>>> @@ -2561,6 +2561,17 @@ Resource Access Opcodes
>>>>>>    image, while .w will contain the number of samples for multi-sampled
>>>>>>    images.
>>>>>>
>>>>>> +.. opcode:: FBFETCH - Load data from framebuffer
>>>>>> +
>>>>>> +  Syntax: ``FBFETCH dst, output``
>>>>>> +
>>>>>> +  Example: ``FBFETCH TEMP[0], OUT[0]``
>>>>>> +
>>>>>> +  Returns the color of the current position in the framebuffer from
>>>>>> +  before this fragment shader invocation. Always returns the same
>>>>>> +  value from multiple calls for a particular output within a single
>>>>>> +  invocation.
>>>>
>>>>
>>>>
>>>> I'm not a fan of this last sentence. It's true that we could somehow bend
>>>> things in the compiler to make the sentence true, but
>>>>
>>>> (a) The statement is clearly false with a straight-forward implementation
>>>> of
>>>> the instruction: multiple fragment shaders can be simultaneously
>>>> in-flight
>>>> on the same pixel/sample. A second FBFETCH could happen after an earlier
>>>> invocation on the same pixel finished and get the new framebuffer value.
>>>>
>>>> (b) I'm not aware of an API that actually requires this guarantee.
>>>>
>>>>
>>>>> If the value is always the same, can it be declared as a system value
>>>>> instead?
>>>>
>>>>
>>>>
>>>> I don't know. I'd remove the statement about the value always being the
>>>> same
>>>> to begin with. And with an eye to how this actually ends up being
>>>> implemented, and possible interactions with
>>>> ARB_fragment_shader_interlock,
>>>> I'd say it makes sense (and our lives easier) for the TGSI to define
>>>> _when_
>>>> the framebuffer value is supposed to be read, and for that it makes sense
>>>> to
>>>> have an instruction for it.
>>>
>>>
>>> In case it's not obvious, this is primarily for
>>> KHR_blend_equation_advanced. It's illegal to use it with overdraw
>>> without a barrier first.
>>
>>
>> Well, you get an undefined value :)
>>
>> My point is that there's a plausible implementation of FBFETCH as an
>> instruction in which "the same value will be returned within a single
>> invocation" is only guaranteed if the application follows the rules
>> involving BlendBarrier.
>>
>>
>>> There's also
>>> KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch,
>>> which do allow overdraw. I'm not sure how the ordering is specified
>>> (or, how it's specified for regular blending). The gl_LastFragData
>>> stuff are a non-writable space, and as such, it makes sense that
>>> they'd be computed once on invocation and then kept constant
>>> throughout the shader run.
>>
>>
>> It still leaves open the question of _when_ they should be computed,
>> especially in the future when guarantees about the order of fragment shaders
>> are required (i.e. KHR_blend_equation_advanced_coherent): If you load the
>> data too early, you may lose some potential for parallelism because you have
>> to spend more time waiting for earlier invocations to finish. This obviously
>> depends on the details of the hardware, but ARB_fragment_shader_interlock
>> suggests that it is (or will be?) quite common to have synchronization that
>> is rather fine-grained.
>
> This is a related to the implementation issues I had on nouveau with
> the sysval - basically the code looks like
>
> if (advanced blend is enabled) {
> MOV TEMP[0], SV
> IF
>   use(sv)
> ELSE
>   use(sv)
> use(sv some more)
> }
>
> And then the st_glsl_to_tgsi copy propagation logic pushes those SV's
> down to the uses. This is further complicated by the fact that the
> xyzw values are logically independent at the TGSI level. And now I'm
> stuck either
>
> (a) getting the value once at the top of the shader, even if advanced
> blend might be disabled entirely
> (b) fetching the texture once per bb
> (c) figuring out some code motion heuristics which presently don't
> exist in nouveau
>
> So... the operation makes the most sense to me. How about "once you
> call FBFETCH, all further invocations should return identical values"?

I don't know about "should". What about "may"? This would allow the 
driver to "cache" the fbfetch result or fetch it only once at the start 
of the shader, but doesn't force it to do that. So things are 
straightforward for the driver both when a simple texture-fetch-based 
implementation should be used _and_ when there really is fixed function 
hardware. How about this:

 > Example: ``FBFETCH TEMP[0], OUT[0]``
 >
 > Returns the current color of the current position in the framebuffer.
 > The result is undefined if the current position is rendered to twice
 > without a blend barrier in between.

I think that's pretty much the condition at the API level, isn't it?

Nicolai

> This could still present difficulties in a situation like
>
> if (a) {
>   fbfetch();
> }
> do stuff
> if (b) {
>   fbfetch();
> }
>
> But I don't have any concrete problems with it I can identify, just
> fear of the unknown.
>
> Cheers,
>
>   -ilia
>


More information about the mesa-dev mailing list