[Mesa-dev] [Mesa3d-dev] ARB draw buffers + texenv program

Thu Apr 15 08:17:03 PDT 2010

On 15.04.2010 02:03, Alex Deucher wrote:
> On Wed, Apr 14, 2010 at 5:05 PM, Brian Paul <brianp at vmware.com> wrote:
>> Moving this to the new mesa-dev list...
>>
>> Roland Scheidegger wrote:
>>> On 14.04.2010 00:38, Dave Airlie wrote:
>>>> On Wed, Apr 14, 2010 at 8:33 AM, Roland Scheidegger <sroland at vmware.com>
>>>> wrote:
>>>>> On 13.04.2010 20:28, Alex Deucher wrote:
>>>>>> On Tue, Apr 13, 2010 at 2:21 PM, Corbin Simpson
>>>>>> <mostawesomedude at gmail.com> wrote:
>>>>>>> On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger
>>>>>>> <sroland at vmware.com> wrote:
>>>>>>>> On 13.04.2010 02:52, Dave Airlie wrote:
>>>>>>>>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <brianp at vmware.com>
>>>>>>>>> wrote:
>>>>>>>>>> Dave Airlie wrote:
>>>>>>>>>>> Just going down the r300g piglit failures and noticed
>>>>>>>>>>> fbo-drawbuffers
>>>>>>>>>>> failed, I've no idea
>>>>>>>>>>> if this passes on Intel hw, but it appears the texenvprogram
>>>>>>>>>>> really
>>>>>>>>>>> needs to understand the
>>>>>>>>>>> draw buffers. The attached patch fixes it here for me on r300g
>>>>>>>>>>> anyone
>>>>>>>>>>> want to test this on Intel
>>>>>>>>>>> with the piglit test before/after?
>>>>>>>>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>>>>>>>>
>>>>>>>>>> It fails with gallium/softpipe both with and w/out your patch.
>>>>>>>>>>
>>>>>>>>>> I think that your patch is on the right track.  But multiple render
>>>>>>>>>> targets
>>>>>>>>>> are still a bit of an untested area in the st/mesa code.
>>>>>>>>>>
>>>>>>>>>> One thing: the patch introduces a dependency on buffer state in the
>>>>>>>>>> texenvprogram code so in state.c we should check for the
>>>>>>>>>> _NEW_BUFFERS flag.
>>>>>>>>>>
>>>>>>>>>> Otherwise, I'd like to debug the softpipe failure a bit further to
>>>>>>>>>> see
>>>>>>>>>> what's going on.  Perhaps you could hold off on committing this for
>>>>>>>>>> a bit...
>>>>>>>>> Well Eric pointed out to me the fun line in the spec
>>>>>>>>>
>>>>>>>>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>>>>>>>>
>>>>>>>>>      RESOLUTION: No.  A shader should write either gl_FragColor, or
>>>>>>>>>      gl_FragData[n], but not both.
>>>>>>>>>
>>>>>>>>>      Writing to gl_FragColor will write to all draw buffers
>>>>>>>>> specified
>>>>>>>>>      with DrawBuffersARB.
>>>>>>>>>
>>>>>>>>> So I was really just masking the issue with this. From what I can
>>>>>>>>> see
>>>>>>>>> softpipe messes up and I'm not sure where we should be fixing this.
>>>>>>>>> swrast does okay; the open question is just whether we should be
>>>>>>>>> doing something in gallium or in the drivers.
>>>>>>>> Hmm yes looks like that's not really well defined. I guess there are
>>>>>>>> several options here:
>>>>>>>> 1) don't do anything at the state tracker level, and assume that if a
>>>>>>>> fragment shader only writes to color 0 but has several color buffers
>>>>>>>> bound the color is meant to go to all outputs. Looks like that's what
>>>>>>>> nv50 is doing today. If a shader writes to FragData[0] but not
>>>>>>>> others,
>>>>>>>> in gallium that would mean that output still gets replicated to all
>>>>>>>> outputs, but since the spec says unwritten outputs are undefined that
>>>>>>>> would be just fine (for OpenGL - not sure about other APIs).
>>>>>>>> 2) Use some explicit means to distinguish FragData[] from FragColor
>>>>>>>> in
>>>>>>>> gallium. For instance, could use different semantic name (like
>>>>>>>> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
>>>>>>>> outputs). Or could have a flag somewhere (not quite sure where)
>>>>>>>> saying
>>>>>>>> if color output is to be replicated to all buffers.
>>>>>>>> 3) Translate away the single color output in state tracker to
>>>>>>>> multiple
>>>>>>>> outputs.
>>>>>>>>
>>>>>>>> I don't like option 3) though. Means we need to recompile if the
>>>>>>>> attached buffers change. Moreover, it seems both new nvidia and AMD
>>>>>>>> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
>>>>>>>> I don't like option 1) either. That kind of implicit behavior might
>>>>>>>> be ok, but this kind of guesswork isn't very nice imho.
>>>>>>> Whatever's easiest, just document it. I'd be cool with:
>>>>>>>
>>>>>>> DCL IN[0], COLOR, PERSPECTIVE
>>>>>>> DCL OUT[0], COLOR
>>>>>>> MOV OUT[0], IN[0]
>>>>>>> END
>>>>>>>
>>>>>>> Effectively being a write to all color buffers, however, this one from
>>>>>>> progs/tests/drawbuffers:
>>>>>>>
>>>>>>> DCL IN[0], COLOR, LINEAR
>>>>>>> DCL OUT[0], COLOR
>>>>>>> DCL OUT[1], COLOR[1]
>>>>>>> IMM FLT32 {     1.0000,     0.0000,     0.0000,     0.0000 }
>>>>>>>  0: MOV OUT[0], IN[0]
>>>>>>>  1: SUB OUT[1], IMM[0].xxxx, IN[0]
>>>>>>>  2: END
>>>>>>>
>>>>>>> Would then double-write the second color buffer. Unpleasant. Language
>>>>>>> like this would work, I suppose?
>>>>>>>
>>>>>>> """
>>>>>>> If only one color output is declared, writes to the color output shall
>>>>>>> be redirected to all bound color buffers. Otherwise, color outputs
>>>>>>> shall be bound to their specific color buffer.
>>>>>>> """
>>>>>> Also, keep in mind that writing to multiple color buffers uses
>>>>>> additional memory bandwidth, so for performance, we should only do so
>>>>>> when required.
>>>>> Do apps really have several color buffers bound but only write to one,
>>>>> leaving the state of the others undefined in the process? Sounds like a
>>>>> poor app to begin with to me.
>>>>> Actually, I would restrict that language above further, so only color
>>>>> output 0 will get redirected to all buffers if it's the only one
>>>>> written. As said though I'd think some explicit bits somewhere are
>>>>> cleaner. I'm not yet sure that the above would really work for all APIs,
>>>>> it is possible some say other buffers not written to are left as is
>>>>> instead of undefined.
>>>> Who knows, the GL API allows for it, I don't see how we can
>>>> arbitrarily decide to restrict it.
>>>>
>>>> I could write an app that uses multiple fragment programs, and
>>>> switches between them, with two output buffers bound, though I'm
>>>> possibly constructing something very arbitrary.
>>> I fail to see the problem. If you have two color buffers bound but only
>>> write to one of them then the implementation is allowed to do anything
>>> it wants with the other one as far as I can tell.
>>>
>>>
>>>> The ARB_draw_buffers explicitly states that Data0 != Color.
>>> Yes. I wonder, though, whether there are other differences somewhere (I
>>> couldn't find any) beyond one getting replicated and the other not?
>>>
>>> Anyway, it looks like no one likes that implicit option.
>>> Hence let's make it explicit in gallium.
>>> Not quite sure how yet - this seems to be some sort of shader state. We
>>> could use new semantic for that special replicated output, or redefine
>>> the existing ones (use generic ones for data outputs and color only for
>>> the replicated one). Or maybe we should just make that a tgsi shader
>>> property like those for pixel centers?
>>
>> Here's a key question: on NV and AMD with the "MULTIWRITE_ENABLE" flag, does
>> that option perform faster than using N shader instructions to write to the
>> N buffers?
>>
> 
> I can double check, but I suspect it uses the same amount of memory
> bandwidth either way.  You still have to get the data to multiple
> buffers.
But you'll have more shader instructions for writing to all these
outputs, right? I think that could still make a difference, though it
might be more theoretical than practical.

> 
> Alex
> 
>> If so, that would point to a new shader semantic.
>>
>> Otherwise, if there's no gain, I think I'd rather solve the problem for all
>> drivers in the state tracker.  We'd only have to generate a new fragment
>> shader variant when the _number_ of attached color buffers changes, not when
>> the buffer pointers change.  I think this is do-able.

Hmm, I'm not sure I really agree with that, though you're right it would
only be dependent on the number of color buffers. Even hardware which
can't do multiple render targets that way might benefit a bit. For
example, i965 needs multiple render target write messages, but if you
expand that in the state tracker you'll also have more shader
instructions. Even if they're only MOVs (which will probably be hard or
impossible to optimize away, as these are outputs), they'll also consume
register space.
So most hw drivers we care about benefit from it, at least in theory,
and for all these drivers it's trivial to handle. Granted, it might
actually not be used that often (who uses MRT with old-style color
output?), and it might be a bit more complex to handle in some of the
other drivers.
But if you think it's not worth exposing it, I guess I could live with
that too.
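
To make the trade-off concrete, here is a rough sketch of what the
state-tracker expansion would amount to. It is only an illustration: the
names fs_variant_key and expand_color_writes are hypothetical and not
existing Mesa or gallium functions, the limit of 8 color buffers is an
assumption, and the output is TGSI-like text rather than real tokens.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_COLOR_BUFS 8   /* assumed limit, for this sketch only */

/* Hypothetical variant key: a new shader variant is needed only when the
 * number of attached color buffers changes, not when the buffers change. */
struct fs_variant_key {
   unsigned num_color_bufs;
};

static int
fs_variant_key_equal(const struct fs_variant_key *a,
                     const struct fs_variant_key *b)
{
   return a->num_color_bufs == b->num_color_bufs;
}

/* Replicate a single color result (held in TEMP[0] here) to every bound
 * color buffer by emitting one MOV per buffer.
 * Returns the number of extra instructions, or -1 if buf is too small. */
static int
expand_color_writes(unsigned num_color_bufs, char *buf, size_t size)
{
   size_t used = 0;
   unsigned i;

   assert(num_color_bufs <= MAX_COLOR_BUFS);
   if (size > 0)
      buf[0] = '\0';

   for (i = 0; i < num_color_bufs; i++) {
      int n = snprintf(buf + used, size - used,
                       "MOV OUT[%u], TEMP[0]\n", i);
      if (n < 0 || (size_t) n >= size - used)
         return -1;
      used += (size_t) n;
   }
   return (int) num_color_bufs;
}
```

The instruction count grows linearly with the number of bound buffers,
which is the cost I mean above. A replicate flag handled by the driver
would keep the shader at a single output write.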

Roland