[Mesa-dev] [PATCH] gallium/util: don't let children of fork & exec inherit our thread affinity

Tue Oct 30 23:13:32 UTC 2018

On 30.10.18 at 23:55, Marek Olšák wrote:
> On Tue, Oct 30, 2018 at 6:32 PM Gustaw Smolarczyk <wielkiegie at gmail.com
> <mailto:wielkiegie at gmail.com>> wrote:
> 
>     Tue, 30 Oct 2018, 23:01 Marek Olšák <maraeo at gmail.com
>     <mailto:maraeo at gmail.com>>:
> 
>         On Mon, Oct 29, 2018 at 12:43 PM Michel Dänzer
>         <michel at daenzer.net <mailto:michel at daenzer.net>> wrote:
> 
>             On 2018-10-28 11:27 a.m., Gustaw Smolarczyk wrote:
>             > Mon, 17 Sep 2018 at 18:24 Michel Dänzer
>             <michel at daenzer.net <mailto:michel at daenzer.net>> wrote:
>             >>
>             >> On 2018-09-15 3:04 a.m., Marek Olšák wrote:
>             >>> On Fri, Sep 14, 2018 at 4:53 AM, Michel Dänzer
>             <michel at daenzer.net <mailto:michel at daenzer.net>> wrote:
>             >>>>
>             >>>> Last but not least, this doesn't solve the issue of
>             apps such as
>             >>>> blender, which spawn their own worker threads after
>             initializing OpenGL
>             >>>> (possibly not themselves directly, but via the toolkit
>             or another
>             >>>> library; e.g. GTK+4 uses OpenGL by default), inheriting
>             the thread affinity.
>             >>>>
>             >>>>
>             >>>> Due to these issues, setting the thread affinity needs
>             to be disabled by
>             >>>> default, and only white-listed for applications where
>             it's known safe
>             >>>> and beneficial. This sucks, but I'm afraid that's the
>             reality until
>             >>>> there's better API available which allows solving these
>             issues.
>             >>>
>             >>> We don't have the bandwidth to maintain whitelists. This
>             will either
>             >>> have to be always on or always off.
>             >>>
>             >>> On the positive side, only Ryzens with multiple CCXs get
>             all the
>             >>> benefits and disadvantages.
>             >>
>             >> In other words, only people who spent relatively large
>             amounts of money
>             >> for relatively high-end CPUs will be affected (I'm sure
>             they'll be glad
>             >> to know that "common people" aren't affected. ;).
>             Affected applications
>             >> will see their performance decreased by a factor of 2-8
>             (the number of
>             >> CCXs in the CPU).
>             >>
>             >> OTOH, only a relatively small number of games will get a
>             significant
>             >> benefit from the thread affinity, and the benefit will be
>             smaller than a
>             >> factor of 2. This cannot justify risking a performance
>             drop of up to a
>             >> factor of 8, no matter how small the risk.
>             >>
>             >> Therefore, the appropriate mechanism is a whitelist.
>             >
>             > Hi,
>             >
>             > What was the conclusion of this discussion? I don't see any
>             > whitelist/blacklist for this feature.
>             >
>             > I have just tested blender and it still renders on only a
>             single CCX
>             > on mesa from git master. Also, there is a bug report that
>             suggests
>             > this regressed performance in at least one game [1].
> 
>             I hooked up that bug report to the 18.3 blocker bug.
> 
> 
>             > If you think enabling it by default is the way to go, we
>             should also
>             > implement a blacklist so that it can be turned off in such
>             cases.
> 
>             I stand by my opinion that a white-list is appropriate, not a
>             black-list. It's pretty much the same as mesa_glthread.
> 
> 
>         So you are saying that gallium multithreading should be slower
>         than singlethreading by default.
> 
>         Marek
> 
> 
>     Hi Marek,
> 
>     The Ryzen optimization helps a lot of applications (mostly games)
>     and improves their performance, mostly because of the reduced cost
>     of communication between application's GL API thread and driver's
>     pipe/winsys threads.
> 
>     However, not all of the applications respond in the same way. The
>     thread affinity management is hacky, by which I mean that this
>     mechanism was not meant to mess with application threads from within
>     library's threads. As an example, blender's threads, which use
>     OpenGL "by accident", are forced to use the same CCX as the main
>     gallium/winsys thread, even if there are many of them and they want
>     to run on as many CCXs as possible. The thread that starts using GL spawns
>     many more threads that don't use GL at all, and the current atfork
>     mechanism doesn't help.
> 
>     The current mechanism of tweaking thread affinities doesn't work
>     universally with all Linux applications. We need a mechanism of
>     tweaking this behavior, either through a whitelist or through a
>     blacklist. As any application using OpenGL can be affected, I would
>     opt towards disabling this by default and providing a whitelist for
>     applications we know it would help.
> 
> 
> Multithreading is slower than singlethreading. You are pretty much
> saying that Mesa shouldn't use multithreading by default. Just think
> about that. NVIDIA would destroy us at all price points. I'm not
> insane enough to disable the thread pinning by default.
> 
> The thread affinity API is very bad for this, but it's the only thing we
> have. Linux lacks a proper thread management API for the Zen
> architecture and the Linux scheduler does the worst thing for Ryzen (it
> puts threads on different CCXs), so the scheduler always works against
> us. It makes Ryzen as slow as possible.
> 
> Only Blender is affected negatively and there is a patch for it. Blender
> is open source, so it can just reset the thread affinity for new threads
> by itself, or set a thread affinity that works best with Ryzen. Mesa can
> contain the workaround out of courtesy.
> 
I think apps having to (re)set thread affinity explicitly to get some
kind of expected "default" behavior is not quite what we'd really want?
I don't have any solution though...

Roland
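
For reference, the application-side workaround discussed above (a new
thread resetting its own affinity) might look roughly like the sketch
below. This is only an illustration under assumptions: the function name
reset_thread_affinity_to_all_cpus is made up here, and it is not taken
from the Mesa patch or from Blender. It relies on the glibc extension
pthread_setaffinity_np (hence _GNU_SOURCE) and simply widens the calling
thread's mask to every configured CPU, leaving the kernel to drop CPUs
the process is not allowed to use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper (not from Mesa or Blender): widen the calling
 * thread's affinity mask to all configured CPUs, undoing a narrower
 * mask inherited from the thread that created it.
 * Returns 0 on success, -1 if the CPU count is unavailable, or the
 * error number returned by pthread_setaffinity_np. */
static int reset_thread_affinity_to_all_cpus(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_CONF);
    if (ncpus < 1)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    /* A fixed-size cpu_set_t covers at most CPU_SETSIZE CPUs. */
    for (long i = 0; i < ncpus && i < CPU_SETSIZE; i++)
        CPU_SET((int)i, &set);

    /* The loop ran at least once because ncpus >= 1. */
    assert(CPU_COUNT(&set) > 0);

    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

A worker thread would call this once at the top of its start routine,
before doing any work. As noted above, this puts the burden on every
application that creates threads after initializing OpenGL, which is
exactly the concern.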