[Mesa-dev] Question: st/mesa and context-shareable shaders

Mon Sep 28 07:39:16 PDT 2015

Am 28.09.2015 um 15:51 schrieb Marek Olšák:
> On Mon, Sep 28, 2015 at 2:55 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
>> On 27/09/15 19:14, Marek Olšák wrote:
>>>
>>> Hi,
>>>
>>> For some reason, st/mesa assumes that shaders can't be shared by
>>> multiple contexts and therefore has a context pointer in shader keys.
>>> I'd like to get rid of this dependency, because there is no reason why
>>> shaders would need to be tied to a context. In fact, shaders don't
>>> even need a device, they just need a compiler.
>>
>>>
>>>
>>> This is becoming a bigger issue with latest games that might prefer
>>> compiling shaders in another thread using a different GL context.
>>>
>>> As far as I know, shaders should be context-shareable on all hardware
>>> drivers.
>>
>>
>> The thing to keep in mind is that, although in theory gallium shaders could
>> be compiled once and reused by many pipe contexts, in practice many drivers
>> need to produce shader variants that mix in state from other context
>> specific state objects, at draw time.
>>
>> llvmpipe, svga pipe driver are examples of such drivers that I know. Maybe
>> amd/nvidia gallium drivers are immune from that problem, but my
>> understanding was that very few hardware has internal state objects that
>> line up perfectly with gallium state objects, hence that this problem was
>> widespread.
>>
>>
>> And because until now shaders were per-context, this means that pipe drivers
>> can attach the variants to the shader object without any worries about
>> multi-threading, as it's guaranteed that no other thread is operating on
>> them.
> 
> I'm well aware of that issue. You can use a mutex to prevent 2 threads
> from adding 2 shader variants simultaneously into a pipe shader. This
> should prevent corrupting the linked list of variants. Looking up the
> required shader variant based on states doesn't need locks if you add
> an atomic counter containing how many shaders are in the list,
> preventing the lookup function from having to read the ending pointer
> of the list where new shaders can be added by other threads.
> 
> Drivers also shouldn't attach state pointers to shaders, but instead
> they should add their copies there. All in all, this is quite easily
> fixable and therefore not a big issue. It certainly doesn't prevent
> shaders from becoming screen objects.
> 
> 
>>
>>
>> So, if you want to make pipe shaders global, you need to ensure that all
>> shaders objects manipulations are thread-safe.  In particular, if you still
>> want to attach variants to the shaders, you need to ensure access to mutable
>> members are protected with mutex.  Which means you might be forced to lock
>> mutexes on hot-paths, severely impacting performance (single- and multi-
>> thread apps alike).
> 
> The mutex would only have to be locked when a new shader variant is
> added, which is a time-consuming operation by itself (compilation
> etc.), so there is no performance concern from adding a mutex. Other
> cases don't need the lock.
> 
>>
>>
>> One could avoid the threading primitives by attaching the variants to the
>> context themselves -- but then that's no different from what we do today --
>> ie, you'd be compiling as often as today.
> 
> Like I said above, it's easy to add shader variants to screen-based
> shaders with a mutex that has no impact on performance.
> 
>>
>>
>>
>> I don't feel too strongly either way: maybe the benefits of having shaders
>> shared by many pipe context are indeed worth the trouble.  But let it be no
>> doubt: that implies making all operations on shader objects thread safe.
>> And that is not a trivial matter.  It's certainly not as easy as just coping
>> with a null pipe_context argument.
>>
>>
>>> I see only 2 options out of this:
>>>
>>> 1) Removing the context pointer from the shader keys. If drivers need
>>> this, they should handle it by themselves. This will simplify st/mesa,
>>> because TCS, TES, GS won't need the variant-tracking machinery.
>>>
>>> 2) Adding PIPE_CAP_SHAREABLE_SHADERS and if the cap is 1, set the
>>> context pointer to NULL in all keys.
>>>
>>> What do you think?
>>>
>>> Marek
>>
>>
>> I think there are other options worth considering:
>>
>>
>> a) I wonder if wouldn't be better the other way around: if drivers want to
>> avoid compiling the same shader for many contexts, they should maintain a
>> shader cache internally that's shared by all contexts.
>>
>>    As that can be done much more efficient -- there are many ways to
>> implement a global cache that avoids frequent locking
>>
>>    And the way to avoid reimplementing the wheel on every driver would be to
>> have an helper module that would take care of that -- ie, a generial
>> auxiliary module to maintain the shader cache in a thread safe maner.  That
>> module could potentially even learn to (de)serializing the cache from disk.
> 
> This is too complicated for now and not required for having
> screen-based shaders.
> 
>>
>>
>> b) split pipe shader objects in two parts:
>>
>>    1) a global object, which would allow the pipe drivers to pre-compile the
>> TGSI into an internal IR
>>
>>    2) per-context objects, so that pipe drivers could produce variants of
>> that IR which takes in account the currently bound state
>>
>>   And for pipe drivers that can't pre-compile anything, 1) would be nothing
>> more than a duplication of TGSI tokens.
> 
> I assume most drivers just don't want per-context shaders, so adding
> an interface for it would be counterproductive.
> 
>>
>>
>>
>> c) introduce Metal/DX12/Vulkan like "pipeline state objects", whereby the
>> state tracker declares upfront how the whole pipeline state objects are
>> bound together, so the driver can produce the shader variant when that
>> pipeline state is created, and doesn't need to worry about shader variantes
>> at draw time.
>>
>>    This is in fact like a) but moves the burden of creating the variant
>> cache into the state tracker (as the pipeline state cache) where it can be
>> shared by all pipe drivers.
> 
> This is not applicable to OpenGL, because it doesn't have pipeline
> objects, and I mainly optimize for OpenGL here. Additionally, Gallium
> is not a good match for Vulkan and a whole new driver interface for
> Vulkan that is very different from Gallium will need to be introduced,
> though it's more likely people will want to implement Vulkan functions
> directly and skip the interface.
> 
>>
>>
>>
>> Jose
>>
>>
>>> For some reason, st/mesa assumes that shaders can't be shared by
>>> multiple contexts and therefore has a context pointer in shader keys.
>>
>> PS: As I write this, I can't help having a feeling of deja-vu. I couldn't
>> find any discussion from my mail archives, but I think this subject has been
>> brought before.
> 
> I don't think this has been brought up before.
> 
>>
>> I wonder if we should start maintaining a "design.rst" in src/gallium/docs/
>> to document why some things were done the way they were.
> 
> In this case, I believe per-context shaders were done purely because
> of bad decisions and out-of-tree dependencies:
> - classic drivers had (mostly) per-context shaders, so gallium has them too
> - DX9 and DX10 have per-context shaders

I don't think that's quite true, it was more that GL's shared shaders
didn't seem to make all that much sense (due to just about anyone
requiring context dependent state for actual compilation anyway). So the
decision was probably inspired by d3d10, because it seemed to be much
cleaner (no worries about locks etc.), but not really because somehow
the out-of-tree code really would have required it.
In short, for simplicity, only things were sharable which were really
required to be shared (pretty much just actual resources - and yes that
doesn't work too well for GL neither as you can't share sampler/rt
views, let's face it GL's resource model there from legacy GL is a
disaster, not that d3d9 was any better).
And IIRC multithreaded GL in general never really used to work all that
well and noone was really using that much.
That said, it could be changed if that's really helpful. Threading
issues tend to be difficult though to deal with (not so much in theory
rather than in practice...).

Roland