[Mesa-dev] [PATCH] drirc: add option to disable ARB_draw_indirect

Mon Dec 18 20:45:43 UTC 2017

On Fri, Dec 15, 2017 at 1:08 PM, Nicolai Hähnle <nicolai.haehnle at amd.com> wrote:
> On 15.12.2017 12:37, Rob Clark wrote:
>>
>> On Fri, Dec 15, 2017 at 4:41 AM, Nicolai Hähnle <nicolai.haehnle at amd.com>
>> wrote:
>>>
>>> On 15.12.2017 00:56, Rob Clark wrote:
>>>>
>>>>
>>>> On Wed, Dec 6, 2017 at 3:31 PM, Ian Romanick <idr at freedesktop.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 12/05/2017 08:25 AM, Ilia Mirkin wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 5, 2017 at 8:18 AM, Emil Velikov
>>>>>> <emil.l.velikov at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Rob,
>>>>>>>
>>>>>>> On 5 December 2017 at 12:54, Rob Clark <robdclark at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> This is a bit sad/annoying.  But with current GPU firmware (at least
>>>>>>>> on
>>>>>>>> a5xx) we can support both draw-indirect and base-instance.  But we
>>>>>>>> can't
>>>>>>>> support draw-indirect with a non-zero base-instance specified.  So
>>>>>>>> add
>>>>>>>> a
>>>>>>>> driconf option to hide the extension from games that are known to
>>>>>>>> use
>>>>>>>> both.
>>>>>>>>
>>>>>>>> Signed-off-by: Rob Clark <robdclark at gmail.com>
>>>>>>>> ---
>>>>>>>> Tbh, I'm also not really sure what to do when/if we got updated
>>>>>>>> firmware
>>>>>>>> which handled draw-indirect with base-instance, since we'd need to
>>>>>>>> make
>>>>>>>> this option conditional on fw version.  For STK that probably isn't
>>>>>>>> a
>>>>>>>> big deal since it doesn't use draw-indirect in a particularly useful
>>>>>>>> way
>>>>>>>> (the indirect buffer is generated on CPU).
>>>>>>>>
>>>>>>> Couldn't freedreno just return 0 for PIPE_CAP_DRAW_INDIRECT (aka
>>>>>>> disable the extension) as it detects buggy FW?
>>>>>>> This is what radeons have been doing as they encounter iffy firmware
>>>>>>> or
>>>>>>> LLVM.
>>>>>>>
>>>>>>> AFAICT freedreno doesn't do GL 4.0 or GLES 3.1 so one should be safe.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Rob is this -><- close to ES 3.1, so that's not a great option.
>>>>>
>>>>>
>>>>>
>>>>> And I don't suppose there's a way to get updated firmware?  i965 has
>>>>> similar sorts of cases where higher versions are disabled due to
>>>>> missing
>>>>> kernel features.
>>>>>
>>>>
>>>> so after r/e the instruction set for the CP microcontrollers and
>>>> writing a disassembler and assembler[1], and figuring out how the fw
>>>> handles CP_DRAW_INDIRECT and CP_DRAW_INDX_INDIRECT packets, I've come
>>>> to the conclusion that the issue isn't actually with draw-indirect vs
>>>> base-instance (at least not w/ the fw from my pixel2 which md5sum
>>>> claims is the same as what is in linux-firmware.. it is possible that
>>>> I was using an earlier version of the fw before when I came to this
>>>> conclusion).  On the plus side, the PFP/ME microcontrollers that parse
>>>> the cmdstream are pretty neat and I learned some useful stuff along
>>>> the way.
>>>>
>>>> But thinking a bit about how stk is using GL_MAP_PERSISTENT_BIT to map
>>>> and update the draw-indirect buffers, it seems to me there are plenty
>>>> of ways this can go wrong w/ tilers (and even more when you throw
>>>> re-ordering into the mix).  Possibly I should disable reordering when
>>>> the indirect buffer is mapped w/ PERSISTENT bit, although for games
>>>> like stk this is probably counter-productive vs just hiding the
>>>> draw-indirect extension.. for games that actually use the GPU to write
>>>> the draw-indirect buffer it shouldn't be a problem.  So I think a
>>>> driconf patch like this probably still ends up being useful in the
>>>> end.
>>>
>>>
>>>
>>> Can you detail a bit what you think could go wrong? I believe that the
>>> intention of the GL spec is that reordering in tilers should be possible
>>> at
>>> least for buffers that are mapped PERSISTENT but not COHERENT.
>>>
>>> You may only have to block reordering if the buffer is mapped both
>>> PERSISTENT *and* COHERENT -- and even then, reordering is probably
>>> possible.
>>>
>>> Granted, the spec is unclear as usual when it comes to these memory
>>> synchronization issues -- the description of the MAP_COHERENT_BIT in
>>> section
>>> 6.2 does not mention WAR hazards (in particular, Write by client after
>>> Read
>>> by server) -- but perhaps that can be fixed.
>>>
>>> To go into a bit more detail, what I suspect you're worried about is
>>> applications doing stuff like:
>>>
>>> 1. Write to indirect buffer (persistently & coherently mapped)
>>> 2. Draw*Indirect
>>> 3. Write to the same location in the indirect buffer
>>> 4. Draw*Indirect
>>>
>>> ... but this is bound to fail with "normal" GPUs (like ours) as well.
>>> Perhaps you have a different scenario in mind?
>>
>>
>> yeah, this was basically the scenario I had in mind.. although I'm
>> perhaps more aggressive in deferring rendering, to the point of
>> re-ordering draws if unnecessary fbo switches are made.  Normally I
>> track which buffers are read and written in a given batch (draw pass)
>> in order to preserve correctness (and in some cases shadowing or doing
>> a staging transfer to update buffers/textures to avoid splitting a
>> batch).  Perhaps it is only an issue w/ persistent+coherent, but w/
>> cpu updating buffer without driver knowing when is kind of
>> sub-optimal.
>>
>> I'm thinking I do need to keep track when there are outstanding
>> coherent+persistent transfers mapped and switch off some of the
>> cleverness.
>
>
> I'm not convinced that this is actually necessary. You probably only need to
> break for things like glFlushMappedBufferRange() and glFenceSync().
>
> The reasoning is basically this: unless one of those synchronizing commands
> is used, every desktop GPU will effectively re-order a sequence of:
>
> 1. Write #1 to persistent-mapped buffer
> 2. Draw #1
> 3. Write #2 to persistent-mapped buffer
> 4. Draw #2
>
> to:
>
> 1. Write #1 to persistent-mapped buffer
> 2. Write #2 to persistent-mapped buffer
> 3. Draw #1
> 4. Draw #2
>
> Correct me if I'm wrong, but the way I understand tiled GPUs, you might in
> addition do some multi-passing that ends up as:
>
> 1. Write #1 to persistent-mapped buffer
> 2. Write #2 to persistent-mapped buffer
> 3.0. Draw #1, tile 0
> 4.0. Draw #2, tile 0
> 3.1. Draw #1, tile 1
> 4.1. Draw #2, tile 1
> ... etc. ...
>

I guess if you throw in FBO switches and draw-reordering things can be
happening a bit more out of order than this.  Maybe not an issue w/ a
correctly written game/app..

Anyways, I'll have to look more closely at stk to understand how it is
expecting CPU writes to this indirect-params buffer to be
synchronized.  I'm not quite sure what is going on, but I think the
main diff (depending on whether draw-indirect + base-instance is
exposed or not) is using glDrawElementsBaseVertex() vs
glDrawElementsIndirect()/glMultiDrawElementsIndirect().  And somehow
it goes fabulously wrong.  I guess I should try mapping/unmapping each
time the buffer is updated to ensure it is synchronized w/ gpu..

BR,
-R

> So as far as interactions with persistent-mapped buffers are concerned,
> tiling just doesn't make a difference.
>
>
>> All the same, the way stk uses draw-indirect is not useful and it
>> would be better to shut it off, at least for me.  Although perhaps
>> instead of adding things like this to driconf extension by extension,
>> it would be more useful to have a way to provide a default
>> MESA_EXTENSION_OVERRIDE via driconf?
>
>
> Yeah, that makes sense.
>
> Cheers,
> Nicolai
>
>
>>
>> BR,
>> -R
>>
>