[Mesa-dev] [PATCH 1/7] gallium: add pipe_blend_state::srgb_enable and the CAP

Wed Jun 14 20:07:31 UTC 2017

On Wed, Jun 14, 2017 at 9:45 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
> On 14/06/17 17:12, Marek Olšák wrote:
>>
>> On Tue, Jun 13, 2017 at 3:43 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>
>>> On Tue, Jun 13, 2017 at 1:40 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
>>>>
>>>> On 12/06/17 22:56, Marek Olšák wrote:
>>>>>
>>>>> On Mon, Jun 12, 2017 at 10:43 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
>>>>>>
>>>>>> On 12/06/17 21:25, Marek Olšák wrote:
>>>>>>>
>>>>>>> On Mon, Jun 12, 2017 at 9:51 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
>>>>>>>>
>>>>>>>> How does this help exactly?
>>>>>>>>
>>>>>>>> Are applications actually rendering to the same FBO w/ and w/o SRGB
>>>>>>>> decoding?
>>>>>>>>
>>>>>>>> Or is the problem here GL_SRGB_WRITE state getting spuriously dirtied
>>>>>>>> by the application?
>>>>>>>>
>>>>>>>> And even if they do, why is toggling surface views in framebuffer
>>>>>>>> state so expensive?
>>>>>>>>
>>>>>>>> I don't object per se, but it looks like an unusual thing to optimize
>>>>>>>> for.
>>>>>>>>
>>>>>>>
>>>>>>> set_framebuffer_state is basically a memory barrier. We have separate
>>>>>>> caches for the FB and for textures, and we have to flush them when a
>>>>>>> texture is unbound from the framebuffer and set as a sampler view. To
>>>>>>> keep things simple, set_framebuffer_state is the barrier. When we
>>>>>>> change the blend state instead, the barrier is avoided. Note that the
>>>>>>> barrier makes set_framebuffer_state a function that is always GPU-bound.
>>>>>>
>>>>>> I see.
>>>>>>
>>>>>> And you're sure that the incoming set_framebuffer_state calls are not
>>>>>> spurious?
>>>>>>
>>>>>> I know cso_context always eliminates redundant
>>>>>> pipe_context::set_framebuffer_state calls, but is it perhaps possible
>>>>>> that the Mesa state tracker is resetting the framebuffer state with
>>>>>> different surface views that are, in practice, exactly the same as the
>>>>>> previous ones?
>>>>>>
>>>>>> Like I said, it seems odd that apps are doing this: it doesn't make
>>>>>> much sense to me to change the colorspace of the fragments between
>>>>>> draws. (Unless some of the assets are already in sRGB and the app is
>>>>>> trying to be too smart for its own good to avoid the sRGB->RGB->sRGB
>>>>>> conversion.) It seems much more likely that these framebuffer state
>>>>>> changes are self-inflicted somewhere in our stack than something truly
>>>>>> demanded by the app.
>>>>>>
>>>>>> And if that's the case and we can fix it, then it would be a better
>>>>>> solution all around.
>>>>>
>>>>> Yeah, the funny part, and the reason, is that we have a microbenchmark
>>>>> in piglit (drawoverhead) that changes this state between draw calls. :)
>>>>>
>>>>> Marek
>>>>>
>>>>
>>>> I couldn't find that piglit microbenchmark. mesademos has
>>>> src/perf/drawoverhead.c, but it doesn't set GL_SRGB_WRITE. So if the FBO
>>>> is changing internally, then it's a perf bug in the Mesa state tracker.
>>>>
>>>> Unless it's mimicking something that real apps do, it's probably better
>>>> to fix the microbenchmark to use more realistic tests.
>>>
>>> If you build piglit, it's in bin/drawoverhead.
>>>
>>> You're right that this subtest (switching GL_FRAMEBUFFER_SRGB) is
>>> rather artificial and fairly unlikely to occur with real apps.
>>
>> FYI, I'm dropping this series and I don't have it in my repo anymore.
>> piglit/drawoverhead will be updated not to test this state change.
>>
>> Marek
>
> Great.
>
> BTW, I'm not sure what's a good state to change in such a microbenchmark.
>
> There is, of course, a myriad of states to pick from, but they are not all
> the same: performance can vary wildly depending on the choice. I'm not sure
> what a good representative state change would be in such circumstances.
> Perhaps toggling between two texture objects? Or some sampler state?

If you've ever run the microbenchmark, you know it tests plenty of state
changes. I think there are about 15 kinds of state changes tested across
roughly 60 subtests at the moment. I'm adding more tests to it; locally I
currently have 100 subtests. The subtests still missing are mostly shader
resources: immutable textures (mutable textures, i.e. not TexStorage-based,
are already tested), TBOs, images, image buffers, and maybe SSBOs and atomic
counters. The methodology is one state change followed by one draw call in a
loop, measuring the number of draw calls per second for that case and
comparing it with the baseline draw rate (measured without the state change).

Marek