[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value
Ilia Mirkin
imirkin at alum.mit.edu
Wed Jan 11 17:45:25 UTC 2017
On Fri, Jan 6, 2017 at 6:05 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 05.01.2017 17:59, Ilia Mirkin wrote:
>>
>> On Thu, Jan 5, 2017 at 11:30 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>> wrote:
>>>
>>> On 05.01.2017 17:02, Ilia Mirkin wrote:
>>>>
>>>>
>>>> On Thu, Jan 5, 2017 at 10:48 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 02.01.2017 21:41, Marek Olšák wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 2, 2017 at 7:01 AM, Ilia Mirkin <imirkin at alum.mit.edu>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>>>>>> ---
>>>>>>> src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +-
>>>>>>> src/gallium/docs/source/tgsi.rst | 11 +++++++++++
>>>>>>> src/gallium/include/pipe/p_shader_tokens.h | 2 +-
>>>>>>> 3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>> index 37549aa..e34b8c7 100644
>>>>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>> @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info
>>>>>>> opcode_info[TGSI_OPCODE_LAST] =
>>>>>>> { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
>>>>>>> { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
>>>>>>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
>>>>>>> - { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 }, /* removed */
>>>>>>> + { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
>>>>>>> { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
>>>>>>> { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
>>>>>>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
>>>>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>>>>> b/src/gallium/docs/source/tgsi.rst
>>>>>>> index d2d30b4..accbe1d 100644
>>>>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>>>>> @@ -2561,6 +2561,17 @@ Resource Access Opcodes
>>>>>>> image, while .w will contain the number of samples for
>>>>>>> multi-sampled
>>>>>>> images.
>>>>>>>
>>>>>>> +.. opcode:: FBFETCH - Load data from framebuffer
>>>>>>> +
>>>>>>> + Syntax: ``FBFETCH dst, output``
>>>>>>> +
>>>>>>> + Example: ``FBFETCH TEMP[0], OUT[0]``
>>>>>>> +
>>>>>>> + Returns the color of the current position in the framebuffer from
>>>>>>> + before this fragment shader invocation. Always returns the same
>>>>>>> + value from multiple calls for a particular output within a single
>>>>>>> + invocation.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I'm not a fan of this last sentence. It's true that we could somehow
>>>>> bend
>>>>> things in the compiler to make the sentence true, but
>>>>>
>>>>> (a) The statement is clearly false with a straight-forward
>>>>> implementation
>>>>> of
>>>>> the instruction: multiple fragment shaders can be simultaneously
>>>>> in-flight
>>>>> on the same pixel/sample. A second FBFETCH could happen after an
>>>>> earlier
>>>>> invocation on the same pixel finished and get the new framebuffer
>>>>> value.
>>>>>
>>>>> (b) I'm not aware of an API that actually requires this guarantee.
>>>>>
>>>>>
>>>>>> If the value is always the same, can it be declared as a system value
>>>>>> instead?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I don't know. I'd remove the statement about the value always being the
>>>>> same
>>>>> to begin with. And with an eye to how this actually ends up being
>>>>> implemented, and possible interactions with
>>>>> ARB_fragment_shader_interlock,
>>>>> I'd say it makes sense (and our lives easier) for the TGSI to define
>>>>> _when_
>>>>> the framebuffer value is supposed to be read, and for that it makes
>>>>> sense
>>>>> to
>>>>> have an instruction for it.
>>>>
>>>>
>>>>
>>>> In case it's not obvious, this is primarily for
>>>> KHR_blend_equation_advanced. It's illegal to use it with overdraw
>>>> without a barrier first.
>>>
>>>
>>>
>>> Well, you get an undefined value :)
>>>
>>> My point is that there's a plausible implementation of FBFETCH as an
>>> instruction in which "the same value will be returned within a single
>>> invocation" is only guaranteed if the application follows the rules
>>> involving BlendBarrier.
>>>
>>>
>>>> There's also
>>>> KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch,
>>>> which do allow overdraw. I'm not sure how the ordering is specified
>>>> (or, how it's specified for regular blending). The gl_LastFragData
>>>> stuff are a non-writable space, and as such, it makes sense that
>>>> they'd be computed once on invocation and then kept constant
>>>> throughout the shader run.
>>>
>>>
>>>
>>> It still leaves open the question of _when_ they should be computed,
>>> especially in the future when guarantees about the order of fragment
>>> shaders
>>> are required (i.e. KHR_blend_equation_advanced_coherent): If you load the
>>> data too early, you may lose some potential for parallelism because you
>>> have
>>> to spend more time waiting for earlier invocations to finish. This
>>> obviously
>>> depends on the details of the hardware, but ARB_fragment_shader_interlock
>>> suggests that it is (or will be?) quite common to have synchronization
>>> that
>>> is rather fine-grained.
>>
>>
>> This is a related to the implementation issues I had on nouveau with
>> the sysval - basically the code looks like
>>
>> if (advanced blend is enabled) {
>> MOV TEMP[0], SV
>> IF
>> use(sv)
>> ELSE
>> use(sv)
>> use(sv some more)
>> }
>>
>> And then the st_glsl_to_tgsi copy propagation logic pushes those SV's
>> down to the uses. This is further complicated by the fact that the
>> xyzw values are logically independent at the TGSI level. And now I'm
>> stuck either
>>
>> (a) getting the value once at the top of the shader, even if advanced
>> blend might be disabled entirely
>> (b) fetching the texture once per bb
>> (c) figuring out some code motion heuristics which presently don't
>> exist in nouveau
>>
>> So... the operation makes the most sense to me. How about "once you
>> call FBFETCH, all further invocations should return identical values"?
>
>
> I don't know about "should". What about "may"? This would allow the driver
> to "cache" the fbfetch result or fetch it only once at the start of the
> shader, but doesn't force it to do that. So things are straightforward for
> the driver both when a simple texture-fetch-based implementation should be
> used _and_ when there really is fixed function hardware. How about this:
I like may.
>
>> Example: ``FBFETCH TEMP[0], OUT[0]``
>>
>> Returns the current color of the current position in the framebuffer.
>> The result is undefined if the current position is rendered to twice
>> without a blend barrier in between.
>
> I think that's pretty much the condition at the API level, isn't it?
Yes. I just want to point out that this opcode is tied to the API operation.
Marek, are you good with this non-SV approach? If so, I'll respin my
original series to take Nicolai's suggestions into account.
Cheers,
-ilia
More information about the mesa-dev
mailing list