[Nouveau] [RFC] mesa/st: Avoid passing a NULL buffer to the drivers

Sun Jan 11 17:17:33 PST 2015

On Sun, Jan 11, 2015 at 8:11 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
>
>
> On 12.01.2015 01:57, Ilia Mirkin wrote:
>>
>> On Sun, Jan 11, 2015 at 7:43 PM, Tobias Klausmann
>> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>>
>>>
>>> On 11.01.2015 06:05, Ilia Mirkin wrote:
>>>>
>>>> Can you elaborate a bit as to why that's the right thing to do?
>>>>
>>>> On Wed, Jan 7, 2015 at 1:52 PM, Tobias Klausmann
>>>> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>>>>
>>>>> If we capture transform feedback from n stream in (n-1) buffers we face
>>>>> a
>>>>> NULL buffer, use the buffer (n-1) to capture the output of stream n.
>>>>>
>>>>> This fixes one piglit test with nvc0:
>>>>>      arb_gpu_shader5-xfb-streams-without-invocations
>>>>>
>>>>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
>>>>> ---
>>>>>    src/mesa/state_tracker/st_cb_xformfb.c | 5 +++++
>>>>>    1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/src/mesa/state_tracker/st_cb_xformfb.c
>>>>> b/src/mesa/state_tracker/st_cb_xformfb.c
>>>>> index 8f75eda..5a12da4 100644
>>>>> --- a/src/mesa/state_tracker/st_cb_xformfb.c
>>>>> +++ b/src/mesa/state_tracker/st_cb_xformfb.c
>>>>> @@ -123,6 +123,11 @@ st_begin_transform_feedback(struct gl_context
>>>>> *ctx,
>>>>> GLenum mode,
>>>>>          struct st_buffer_object *bo =
>>>>> st_buffer_object(sobj->base.Buffers[i]);
>>>>>
>>>>>          if (bo) {
>>>>> +         if (!bo->buffer)
>>>>> +            /* If we capture transform feedback from n streams into
>>>>> (n-1)
>>>>> +             * buffers we have to write to buffer (n-1) for stream n.
>>>>> +             */
>>>>> +            bo = st_buffer_object(sobj->base.Buffers[i-1]);
>>>>>             /* Check whether we need to recreate the target. */
>>>>>             if (!sobj->targets[i] ||
>>>>>                 sobj->targets[i] == sobj->draw_count ||
>>>>> --
>>>>> 2.2.1
>>>
>>> Quoted from Ilia Mirkin, to specify what shall be elaborated:
>>> "Can you explain (on-list) why using buffer n - 1 is the right thing to
>>> do to capture output of stream n? I would have thought that the output
>>> for that stream should be discarded or something.
>>>
>>> Like with a spec quotation or some other justification. i.e. why is
>>> the code you wrote correct? Why is it better than, say, bo =
>>> buffers[0], or some other thing entirely?"
>>>
>>> Yeah thats the most concerning point i see as well. The problem is that
>>> there is a interaction between arb_gpu_shader5 and
>>> arb_transform_feedback3,
>>> but after a bit of reading i think the patch is actually what we should
>>> do:
>>>
>>>  From the arb_transfrom_feedback3 spec:
>>> "
>>> (3) How might you use transform feedback with geometry shaders and
>>>          multiple vertex streams?
>>>
>>>        RESOLVED:  As a simple example, let's say you are processing
>>> triangles
>>>        and capture both processed triangle vertices and some values that
>>> are
>>>        computed per-primitive (e.g., facet normal).  The geometry shader
>>>        might declare its outputs like the following:
>>>
>>>          layout(stream = 0) out vec4 position;
>>>          layout(stream = 0) out vec4 texcoord;
>>>          layout(stream = 1) out vec4 normal;
>>>
>>>        "position" and "texcoord" would be per-vertex attributes written
>>> to
>>>        vertex stream 0; "normal" would be a per-triangle facet normal.
>>> The
>>>        geometry shader would emit three vertices to stream zero (the
>>> processed
>>>        input vertices) and a single vertex to stream one (the
>>> per-triangle
>>>        data).  The transform feedback API usage for this case would be
>>>        something like:
>>>
>>>          // Set up buffer objects 21 and 22 to capture data for
>>> per-vertex
>>> and
>>>          // per primitive values.
>>>          glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 21);
>>>          glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1, 22);
>>>
>>>          // Set up XFB to capture position and texcoord to buffer binding
>>>          // point 0 (buffer 21 bound), and normal to binding point 1
>>> (buffer
>>>          // 22 bound).
>>>          char *strings[] = { "position", "texcoord", "gl_NextBuffer",
>>>                              "normal" };
>>> "
>>>
>>> -> Especially the comments are enlightening as to where the outputs
>>> should
>>> go. Thats what happens with the
>>> "arb_gpu_shader5-xfb-streams-without-invocations" test, where two
>>> stream(outputs) are captured into one buffer.
>>>
>>> One might argue now if we have to count .Buffers[i-1] for all buffers
>>> after
>>> this...
>>>
>>> Comments and additional feedback is always appreciated!
>>
>> The thing you're quoting is talking about the case where everything's
>> supposed to work. I haven't investigated, but I'm guessing that the
>> test has a layout(stream=1) but no buffer is bound at index 1.
>
>
> Actually no, the layout reads like this:
>     layout(stream = 0) out float stream0_0_out;
>     layout(stream = 1) out vec2 stream1_0_out;
>     layout(stream = 2) out float stream2_0_out;
>     layout(stream = 2) out vec4 stream2_1_out;
> where no buffer is bound to stream2_1_out.
>
>>   Is that right? In that case, I would imagine that the TF output should
>> actually just get dropped on the floor. I would assume that this is in
>> the ARB_tf3 spec, but I don't have time to go digging right now.
>
>
> As the layout read more like the one from the specs, i'd still go with [n-1]
> and honestly (not proven correct) i assume that a stream without an output
> is just wrong and should not even get through the linker...

Well, linking has nothing to do with current bindings; it has to do
with connecting inter-stage inputs/outputs... Anyways, I'm moderately
sure that the thing you're currently doing has no justification. Try
to find what the proper way of dealing with this is (I assume i965
deals with it, see what they do), or ask around. If only no buffer is
bound to stream2_1_out, I assume that it's doing the "separate"
outputs? That probably complicates matters...

  -ilia