[Mesa-dev] [Mesa3d-dev] ARB draw buffers + texenv program

Thu Apr 15 08:29:00 PDT 2010

On Thu, Apr 15, 2010 at 11:17 AM, Roland Scheidegger <sroland at vmware.com> wrote:
> On 15.04.2010 02:03, Alex Deucher wrote:
>> On Wed, Apr 14, 2010 at 5:05 PM, Brian Paul <brianp at vmware.com> wrote:
>>> Moving this to the new mesa-dev list...
>>>
>>> Roland Scheidegger wrote:
>>>> On 14.04.2010 00:38, Dave Airlie wrote:
>>>>> On Wed, Apr 14, 2010 at 8:33 AM, Roland Scheidegger <sroland at vmware.com>
>>>>> wrote:
>>>>>> On 13.04.2010 20:28, Alex Deucher wrote:
>>>>>>> On Tue, Apr 13, 2010 at 2:21 PM, Corbin Simpson
>>>>>>> <mostawesomedude at gmail.com> wrote:
>>>>>>>> On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger
>>>>>>>> <sroland at vmware.com> wrote:
>>>>>>>>> On 13.04.2010 02:52, Dave Airlie wrote:
>>>>>>>>>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <brianp at vmware.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Dave Airlie wrote:
>>>>>>>>>>>> Just going down the r300g piglit failures and noticed
>>>>>>>>>>>> fbo-drawbuffers
>>>>>>>>>>>> failed, I've no idea
>>>>>>>>>>>> if this passes on Intel hw, but it appears the texenvprogram
>>>>>>>>>>>> really
>>>>>>>>>>>> needs to understand the
>>>>>>>>>>>> draw buffers. The attached patch fixes it here for me on r300g
>>>>>>>>>>>> anyone
>>>>>>>>>>>> want to test this on Intel
>>>>>>>>>>>> with the piglit test before/after?
>>>>>>>>>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>>>>>>>>>
>>>>>>>>>>> It fails with gallium/softpipe both with and w/out your patch.
>>>>>>>>>>>
>>>>>>>>>>> I think that your patch is on the right track.  But multiple render
>>>>>>>>>>> targets
>>>>>>>>>>> are still a bit of an untested area in the st/mesa code.
>>>>>>>>>>>
>>>>>>>>>>> One thing: the patch introduces a dependency on buffer state in the
>>>>>>>>>>> texenvprogram code so in state.c we should check for the
>>>>>>>>>>> _NEW_BUFFERS flag.
>>>>>>>>>>>
>>>>>>>>>>> Otherwise, I'd like to debug the softpipe failure a bit further to
>>>>>>>>>>> see
>>>>>>>>>>> what's going on.  Perhaps you could hold off on committing this for
>>>>>>>>>>> a bit...
>>>>>>>>>> Well Eric pointed out to me the fun line in the spec
>>>>>>>>>>
>>>>>>>>>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>>>>>>>>>
>>>>>>>>>>      RESOLUTION: No.  A shader should write either gl_FragColor, or
>>>>>>>>>>      gl_FragData[n], but not both.
>>>>>>>>>>
>>>>>>>>>>      Writing to gl_FragColor will write to all draw buffers
>>>>>>>>>> specified
>>>>>>>>>>      with DrawBuffersARB.
>>>>>>>>>>
>>>>>>>>>> So I was really just masking the issue with this. From what I can
>>>>>>>>>> see
>>>>>>>>>> softpipe messes up and I'm not sure where we should be fixing this.
>>>>>>>>>> swrast does okay, its just whether we should be doing something in
>>>>>>>>>> gallium
>>>>>>>>>> or in the drivers is open.
>>>>>>>>> Hmm yes looks like that's not really well defined. I guess there are
>>>>>>>>> several options here:
>>>>>>>>> 1) don't do anything at the state tracker level, and assume that if a
>>>>>>>>> fragment shader only writes to color 0 but has several color buffers
>>>>>>>>> bound the color is meant to go to all outputs. Looks like that's what
>>>>>>>>> nv50 is doing today. If a shader writes to FragData[0] but not
>>>>>>>>> others,
>>>>>>>>> in gallium that would mean that output still gets replicated to all
>>>>>>>>> outputs, but since the spec says unwritten outputs are undefined that
>>>>>>>>> would be just fine (for OpenGL - not sure about other APIs).
>>>>>>>>> 2) Use some explicit means to distinguish FragData[] from FragColor
>>>>>>>>> in
>>>>>>>>> gallium. For instance, could use different semantic name (like
>>>>>>>>> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
>>>>>>>>> outputs). Or could have a flag somewhere (not quite sure where)
>>>>>>>>> saying
>>>>>>>>> if color output is to be replicated to all buffers.
>>>>>>>>> 3) Translate away the single color output in state tracker to
>>>>>>>>> multiple
>>>>>>>>> outputs.
>>>>>>>>>
>>>>>>>>> I don't like option 3) though. Means we need to recompile if the
>>>>>>>>> attached buffers change. Moreover, it seems both new nvidia and AMD
>>>>>>>>> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
>>>>>>>>> I don't like option 1) neither, that kind of implicit behavior might
>>>>>>>>> be
>>>>>>>>> ok but this kind of guesswork isn't very nice imho.
>>>>>>>> Whatever's easiest, just document it. I'd be cool with:
>>>>>>>>
>>>>>>>> DECL IN[0], COLOR, PERSPECTIVE
>>>>>>>> DECL OUT[0], COLOR
>>>>>>>> MOV OUT[0], IN[0]
>>>>>>>> END
>>>>>>>>
>>>>>>>> Effectively being a write to all color buffers, however, this one from
>>>>>>>> progs/tests/drawbuffers:
>>>>>>>>
>>>>>>>> DCL IN[0], COLOR, LINEAR
>>>>>>>> DCL OUT[0], COLOR
>>>>>>>> DCL OUT[1], COLOR[1]
>>>>>>>> IMM FLT32 {     1.0000,     0.0000,     0.0000,     0.0000 }
>>>>>>>>  0: MOV OUT[0], IN[0]
>>>>>>>>  1: SUB OUT[1], IMM[0].xxxx, IN[0]
>>>>>>>>  2: END
>>>>>>>>
>>>>>>>> Would then double-write the second color buffer. Unpleasant. Language
>>>>>>>> like this would work, I suppose?
>>>>>>>>
>>>>>>>> """
>>>>>>>> If only one color output is declared, writes to the color output shall
>>>>>>>> be redirected to all bound color buffers. Otherwise, color outputs
>>>>>>>> shall be bound to their specific color buffer.
>>>>>>>> """
>>>>>>> Also, keep in mind that writing to multiple color buffers uses
>>>>>>> additional memory bandwidth, so for performance, we should only do so
>>>>>>> when required.
>>>>>> Do apps really have several color buffers bound but only write to one,
>>>>>> leaving the state of the others undefined in the process? Sounds like a
>>>>>> poor app to begin with to me.
>>>>>> Actually, I would restrict that language above further, so only color
>>>>>> output 0 will get redirected to all buffers if it's the only one
>>>>>> written. As said though I'd think some explicit bits somewhere are
>>>>>> cleaner. I'm not yet sure that the above would really work for all APIs,
>>>>>> it is possible some say other buffers not written to are left as is
>>>>>> instead of undefined.
>>>>> Who knows, the GL API allows for it, I don't see how we can
>>>>> arbitrarily decide to restrict it.
>>>>>
>>>>> I could write an app that uses multiple fragment programs, and
>>>>> switches between them, with two outputs buffers bound, though I'm
>>>>> possibly constructing something very arbitary.
>>>> I fail to see the problem. If you have two color buffers bound but only
>>>> write to one of them then the implementation is allowed to do anything
>>>> it wants with the other one as far as I can tell.
>>>>
>>>>
>>>>> The ARB_draw_buffers explicitly states that Data0 != Color.
>>>> Yes. I wonder though are there other differences somewhere (I couldn't
>>>> find any) that one gets replicated the other not?
>>>>
>>>> Anyway, it looks like noone likes that implicit option.
>>>> Hence let's make it explicit in gallium.
>>>> Not quite sure how yet - this seems to be some sort of shader state. We
>>>> could use new semantic for that special replicated output, or redefine
>>>> the existing ones (use generic ones for data outputs and color only for
>>>> the replicated one). Or maybe we should just make that a tgsi shader
>>>> property like those for pixel centers?
>>>
>>> Here's a key question: on NV and AMD with the "MULTIWRITE_ENABLE" flag, does
>>> that option perform faster than using N shader instructions to write to the
>>> N buffers?
>>>
>>
>> I can double check, but I suspect it uses the same amount of memory
>> bandwidth either way.  You still have to get the data to multiple
>> buffers.
> But you'll have more shader instructions for writing to all these
> outputs right? I think that could still make a difference, though it
> might be more theoretical rather than in practice.
>

Right, I was talking specifically about memory bandwidth. But it would
take a few extra instructions as well.  I think the major impact will
be the additional memory bandwidth rather than the added instructions.

Alex

>
>>
>> Alex
>>
>>> If so, that would point to a new shader semantic.
>>>
>>> Otherwise, if there's no gain, I think I'd rather solve the problem for all
>>> drivers in the state tracker.  We'd only have to generate a new fragment
>>> shader variant when the _number_ of attached color buffers changes, not when
>>> the buffer pointers change.  I think this is do-able.
>
> Hmm I'm not sure I really agree with that, though you're right it would
> only be dependent on the number of color buffers. Even hardware which
> can't do multiple render targets that way might benefit a bit. For
> example, i965 needs multiple render target write messages - but if you
> expand that in the state tracker you'll also have more shader
> instructions, even if it's only MOVs (which probably won't be easy or
> impossible to optimize away as these are outputs) it'll also consume
> register space.
> So most hw drivers we care about benefit from it, at least in theory,
> and for all these drivers it's trivial to handle. Granted, it might
> actually not be used that often (who uses MRT with old-style color
> output?), and it might be a bit more complex to handle in some of the
> other drivers.
> But if you think it's not worth it exposing it I guess I could live with
> that too.
>
> Roland
>
>