[PATCH 1/3] drm/amdgpu: fix a typo

Mon Jun 26 09:06:23 UTC 2017

On 23/06/17 07:49 PM, Marek Olšák wrote:
> On Fri, Jun 23, 2017 at 11:27 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> On 23.06.2017 at 11:08, zhoucm1 wrote:
>>> On 2017-06-23 17:01, zhoucm1 wrote:
>>>> On 2017-06-23 16:25, Christian König wrote:
>>>>> On 23.06.2017 at 09:09, zhoucm1 wrote:
>>>>>> On 2017-06-23 14:57, Christian König wrote:
>>>>>>>
>>>>>>> But giving the CS IOCTL an option for directly specifying the BOs
>>>>>>> instead of a BO list like Marek suggested would indeed save us some time
>>>>>>> here.
>>>>>>
>>>>>> Interesting. I always keep an eye on how to improve our CS ioctl, since
>>>>>> the UMD guys often complain that our command submission is slower than
>>>>>> on Windows. So how would we specify the BOs directly instead of using a
>>>>>> BO list? A BO handle array from the UMD? Could you describe this more
>>>>>> clearly? Is it doable?
>>>>>
>>>>>
>>>>> Making the BO list part of the CS IOCTL wouldn't help at all for the
>>>>> closed source UMDs. To be precise, we actually came up with the BO list
>>>>> approach because of their requirement.
>>>>>
>>>>> The biggest bunch of work during CS is reserving all the buffers,
>>>>> validating them and checking their VM status.
>>>>
>>>> Totally agree. Every time I read the code there, I want to
>>>> optimize it.
>>>>
>>>>> It doesn't matter if the BOs come from the BO list or directly in the CS
>>>>> IOCTL.
>>>>>
>>>>> The key point is that CS overhead is pretty much irrelevant for the open
>>>>> source stack, since Mesa does command submission from a separate thread
>>>>> anyway.
>>>>
>>>> If it is irrelevant for the open stack, then how does the open source stack handle
>>>> "The biggest bunch of work during CS is reserving all the buffers,
>>>> validating them and checking their VM status."?
>>
>>
>> Command submission on the open stack is outsourced to a separate user space
>> thread. E.g. when an application triggers a flush the IBs created so far are
>> just put on a queue and another thread pushes them down to the kernel.
>>
>> I mean reducing the overhead of the CS IOCTL is always nice, but you usually
>> won't see any fps increase as long as not all CPUs are completely bound to
>> some tasks.
>>
>>>> If the open stack has a better way, I think the closed stack can follow it;
>>>> I don't know the history.
>>>
>>> Do you not use bo list at all in mesa? radv as well?
>>
>>
>> I don't think so. Mesa just wants to send the list of used BOs down to the
>> kernel with every IOCTL.
> 
> The CS ioctl actually costs us some performance, but not as much as on
> closed source drivers.
> 
> MesaGL always executes all CS ioctls in a separate thread (in parallel
> with the UMD) except for the last IB that's submitted by SwapBuffers.

... or by an explicit glFinish or glFlush (at least when the current
draw buffer isn't a back buffer) call, right?

> For us, it's certainly useful to optimize the CS ioctl because of apps
> that submit only 1 IB per frame where multithreading has no effect or
> may even hurt performance.

Another possibility might be flushing earlier, e.g. when the GPU and/or
CS submission thread are idle. But optimizing the CS ioctl would still
help in that case.

Finding good heuristics which allow better utilization of the GPU / CS
submission thread and don't hurt performance in any scenario might be
tricky though.
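
To make the discussion concrete, here is a toy model of the scheme described
in this thread: flushed IBs are put on a queue, a separate thread pushes them
down to the kernel, and a partially built IB may be flushed early when that
thread is idle. This is only a sketch under stated assumptions, not Mesa's
actual code: SubmitThread, maybe_flush_early, the submit_ioctl callback and
the min_cmds threshold are all hypothetical names and values chosen for
illustration.

```python
import queue
import threading


class SubmitThread:
    """Toy model of a user-space CS submission thread (hypothetical)."""

    def __init__(self, submit_ioctl):
        # submit_ioctl is a stand-in for the real CS ioctl call.
        self._submit_ioctl = submit_ioctl
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._in_flight = 0  # IBs queued or being submitted
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            ib = self._queue.get()
            if ib is None:  # sentinel: shut the worker down
                self._queue.task_done()
                break
            try:
                self._submit_ioctl(ib)
            finally:
                # Update the counter before task_done() so that is_idle()
                # is already true once finish() returns.
                with self._lock:
                    self._in_flight -= 1
                self._queue.task_done()

    def flush(self, ib):
        """Queue an IB for submission and return immediately."""
        with self._lock:
            self._in_flight += 1
        self._queue.put(ib)

    def is_idle(self):
        """True if no IB is queued or currently being submitted."""
        with self._lock:
            return self._in_flight == 0

    def finish(self):
        """Block until every queued IB has been submitted."""
        self._queue.join()

    def close(self):
        """Stop the worker after it has drained the queue."""
        self._queue.put(None)
        self._worker.join()


def maybe_flush_early(submitter, pending_cmds, min_cmds=4):
    """Flush a partially built IB if the submission thread is idle.

    min_cmds is an arbitrary threshold, not a tuned value.
    """
    if len(pending_cmds) >= min_cmds and submitter.is_idle():
        submitter.flush(list(pending_cmds))
        pending_cmds.clear()
        return True
    return False
```

An application that builds only one IB per frame gets nothing from the queue
on its own, since the single flush() is followed right away by finish().
Calling maybe_flush_early() while commands are still being recorded is one
way to give the submission thread work earlier, at the cost of more ioctls
per frame.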

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer