[Mesa-dev] [PATCH 3/7] gallium: add FBFETCH opcode to retrieve the current sample value
Marek Olšák
maraeo at gmail.com
Wed Jan 11 18:28:05 UTC 2017
On Wed, Jan 11, 2017 at 6:45 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Fri, Jan 6, 2017 at 6:05 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 05.01.2017 17:59, Ilia Mirkin wrote:
>>>
>>> On Thu, Jan 5, 2017 at 11:30 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>>> wrote:
>>>>
>>>> On 05.01.2017 17:02, Ilia Mirkin wrote:
>>>>>
>>>>>
>>>>> On Thu, Jan 5, 2017 at 10:48 AM, Nicolai Hähnle <nhaehnle at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 02.01.2017 21:41, Marek Olšák wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 2, 2017 at 7:01 AM, Ilia Mirkin <imirkin at alum.mit.edu>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>>>>>>> ---
>>>>>>>> src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +-
>>>>>>>> src/gallium/docs/source/tgsi.rst | 11 +++++++++++
>>>>>>>> src/gallium/include/pipe/p_shader_tokens.h | 2 +-
>>>>>>>> 3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>>> index 37549aa..e34b8c7 100644
>>>>>>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>>>>>>> @@ -106,7 +106,7 @@ static const struct tgsi_opcode_info
>>>>>>>> opcode_info[TGSI_OPCODE_LAST] =
>>>>>>>> { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
>>>>>>>> { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
>>>>>>>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
>>>>>>>> - { 0, 1, 0, 0, 0, 0, 1, NONE, "", 69 }, /* removed */
>>>>>>>> + { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
>>>>>>>> { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
>>>>>>>> { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
>>>>>>>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
>>>>>>>> diff --git a/src/gallium/docs/source/tgsi.rst
>>>>>>>> b/src/gallium/docs/source/tgsi.rst
>>>>>>>> index d2d30b4..accbe1d 100644
>>>>>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>>>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>>>>>> @@ -2561,6 +2561,17 @@ Resource Access Opcodes
>>>>>>>> image, while .w will contain the number of samples for
>>>>>>>> multi-sampled
>>>>>>>> images.
>>>>>>>>
>>>>>>>> +.. opcode:: FBFETCH - Load data from framebuffer
>>>>>>>> +
>>>>>>>> + Syntax: ``FBFETCH dst, output``
>>>>>>>> +
>>>>>>>> + Example: ``FBFETCH TEMP[0], OUT[0]``
>>>>>>>> +
>>>>>>>> + Returns the color of the current position in the framebuffer from
>>>>>>>> + before this fragment shader invocation. Always returns the same
>>>>>>>> + value from multiple calls for a particular output within a single
>>>>>>>> + invocation.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm not a fan of this last sentence. It's true that we could somehow
>>>>>> bend
>>>>>> things in the compiler to make the sentence true, but
>>>>>>
>>>>>> (a) The statement is clearly false with a straight-forward
>>>>>> implementation
>>>>>> of
>>>>>> the instruction: multiple fragment shaders can be simultaneously
>>>>>> in-flight
>>>>>> on the same pixel/sample. A second FBFETCH could happen after an
>>>>>> earlier
>>>>>> invocation on the same pixel finished and get the new framebuffer
>>>>>> value.
>>>>>>
>>>>>> (b) I'm not aware of an API that actually requires this guarantee.
>>>>>>
>>>>>>
>>>>>>> If the value is always the same, can it be declared as a system value
>>>>>>> instead?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I don't know. I'd remove the statement about the value always being the
>>>>>> same
>>>>>> to begin with. And with an eye to how this actually ends up being
>>>>>> implemented, and possible interactions with
>>>>>> ARB_fragment_shader_interlock,
>>>>>> I'd say it makes sense (and our lives easier) for the TGSI to define
>>>>>> _when_
>>>>>> the framebuffer value is supposed to be read, and for that it makes
>>>>>> sense
>>>>>> to
>>>>>> have an instruction for it.
>>>>>
>>>>>
>>>>>
>>>>> In case it's not obvious, this is primarily for
>>>>> KHR_blend_equation_advanced. It's illegal to use it with overdraw
>>>>> without a barrier first.
>>>>
>>>>
>>>>
>>>> Well, you get an undefined value :)
>>>>
>>>> My point is that there's a plausible implementation of FBFETCH as an
>>>> instruction in which "the same value will be returned within a single
>>>> invocation" is only guaranteed if the application follows the rules
>>>> involving BlendBarrier.
>>>>
>>>>
>>>>> There's also
>>>>> KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch,
>>>>> which do allow overdraw. I'm not sure how the ordering is specified
>>>>> (or, how it's specified for regular blending). The gl_LastFragData
>>>>> stuff are a non-writable space, and as such, it makes sense that
>>>>> they'd be computed once on invocation and then kept constant
>>>>> throughout the shader run.
>>>>
>>>>
>>>>
>>>> It still leaves open the question of _when_ they should be computed,
>>>> especially in the future when guarantees about the order of fragment
>>>> shaders
>>>> are required (i.e. KHR_blend_equation_advanced_coherent): If you load the
>>>> data too early, you may lose some potential for parallelism because you
>>>> have
>>>> to spend more time waiting for earlier invocations to finish. This
>>>> obviously
>>>> depends on the details of the hardware, but ARB_fragment_shader_interlock
>>>> suggests that it is (or will be?) quite common to have synchronization
>>>> that
>>>> is rather fine-grained.
>>>
>>>
>>> This is a related to the implementation issues I had on nouveau with
>>> the sysval - basically the code looks like
>>>
>>> if (advanced blend is enabled) {
>>> MOV TEMP[0], SV
>>> IF
>>> use(sv)
>>> ELSE
>>> use(sv)
>>> use(sv some more)
>>> }
>>>
>>> And then the st_glsl_to_tgsi copy propagation logic pushes those SV's
>>> down to the uses. This is further complicated by the fact that the
>>> xyzw values are logically independent at the TGSI level. And now I'm
>>> stuck either
>>>
>>> (a) getting the value once at the top of the shader, even if advanced
>>> blend might be disabled entirely
>>> (b) fetching the texture once per bb
>>> (c) figuring out some code motion heuristics which presently don't
>>> exist in nouveau
>>>
>>> So... the operation makes the most sense to me. How about "once you
>>> call FBFETCH, all further invocations should return identical values"?
>>
>>
>> I don't know about "should". What about "may"? This would allow the driver
>> to "cache" the fbfetch result or fetch it only once at the start of the
>> shader, but doesn't force it to do that. So things are straightforward for
>> the driver both when a simple texture-fetch-based implementation should be
>> used _and_ when there really is fixed function hardware. How about this:
>
> I like may.
>
>>
>>> Example: ``FBFETCH TEMP[0], OUT[0]``
>>>
>>> Returns the current color of the current position in the framebuffer.
>>> The result is undefined if the current position is rendered to twice
>>> without a blend barrier in between.
>>
>> I think that's pretty much the condition at the API level, isn't it?
>
> Yes. I just want to point out that this opcode is tied to the API operation.
>
> Marek, are you good with this non-SV approach? If so, I'll respin my
> original series to take Nicolai's suggestions into account.
Yeah, it's OK.
Marek
More information about the mesa-dev
mailing list