[Mesa-dev] [RFC] opencl: mega-cl

Mon Feb 26 13:06:30 UTC 2018

On Mon, Feb 26, 2018 at 7:10 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> Hi guys,
>
> Having attempted a similar thing in the past, I think there are two
> things at play here.
> As such I'd recommend trying to keep them separate.
>
> a) Having a single and/or modular - state-tracker <> pipe-driver setup
> b) "Hilarities" when having NIR code multiple times per process
>
> On 26 February 2018 at 01:54, Rob Clark <robdclark at gmail.com> wrote:
>> On Sun, Feb 25, 2018 at 3:00 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Seems like a serious hack to me to work around broken linking...  IMO we
>>> should just fix the linking issue.  The symbols for the various GLSL
>>> types need to be linked with the proper binding and visibility -- I
>>> assume that the cause of your problem is that people are making
>>> assumptions about the equality of GLSL types based on their memory
>>> addresses *and* marking the symbols as hidden *and* passing pointers to
>>> GLSL types across shared objects?  That sounds like a recipe for
>>> disaster.
>>
>> tbh, maybe hack, or I think more likely, maybe a good idea.. I'm not
>> terribly sold on the idea of dynamically loading pipe driver and
>> linking a lot of shared code into N different pipe_${driver}.so on
>> disk, since the # of drivers seems to be greater than # of state
>> trackers.. not to mention multiple copies of shared gallium code in
>> memory due to being statically linked into both state tracker and
>> driver.
>>
> This is a) above.
> If using dynamic pipe-drivers across the tree, one can achieve very
> good disk utilization.
> All the bits for DRI, VDPAU and others are there, we just need a
> configure toggle.
>
> The call on which approach to use will be left to the distribution.
>
>> That said, glsl_type's defn does seem to expect to only exist once in
>> a process, ie. == is ptr comparison and when you link nir/glsl_types
>> into both state tracker and driver, glsl_type ptrs are getting passed
>> across that boundary.  I'm not really sure that is worth fixing (ie.
>> why should it exist twice in a process in the first place?)
>>
> This seems like b) - a bug, IMHO, which we should fix regardless of the above.
>
> Why - it's possible to have an application use OpenCL, Vulkan (even
> VDPAU, GL, etc.).
> Thus effectively pulling the NIR codebase multiple times in the same process.

But each will have its own pipe_contexts and they wouldn't be
sharing shaders between them, so that case of NIR (or really
glsl_types) existing multiple times in a process should be harmless.

Unless there is some linker magic to make it only use the first copy
of glsl_types that gets loaded when it is statically linked into
multiple different .so's, I think the only other option to make the
dynamic pipe loader work would be to make libnir into a .so.
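
To illustrate the failure mode, here is a toy sketch in C. It is not
Mesa code: struct toy_type and the st_/drv_ names are made up for
illustration, and the two static instances merely stand in for the
same static library getting linked into two different .so's. The point
is that pointer equality only works while there is exactly one
instance per process.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an interned type descriptor (not glsl_type). */
struct toy_type {
    const char *name;
    unsigned components;
};

/* Two separate instances of the "same" type, modelling one copy in the
 * state tracker and one copy in the pipe driver. */
static const struct toy_type st_copy_vec4  = { "vec4", 4 };
static const struct toy_type drv_copy_vec4 = { "vec4", 4 };

const struct toy_type *st_get_vec4(void)  { return &st_copy_vec4; }
const struct toy_type *drv_get_vec4(void) { return &drv_copy_vec4; }

/* Identity comparison: cheap, but only valid with a single instance. */
bool toy_type_same_ptr(const struct toy_type *a, const struct toy_type *b)
{
    return a == b;
}

/* Structural comparison: survives duplicated instances, at extra cost. */
bool toy_type_same_struct(const struct toy_type *a, const struct toy_type *b)
{
    return a->components == b->components && strcmp(a->name, b->name) == 0;
}
```

A pointer handed from the "state tracker" copy to the "driver" copy
fails toy_type_same_ptr() even though both describe vec4, which is the
kind of mismatch I'd expect once glsl_type ptrs cross the boundary.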

> Off the top of my head it sounds like we have a bunch of global
> variables, which are causing the problem.
> Or perhaps it's the screen sharing that bites us?
>
>> Maybe there are some linker tricks to solve this, idk.. that is a bit
>> outside my area of expertise.
>
> Last time I looked, symbols were properly annotated.
>
> Rob, can you try dropping the freedreno symbol from
> src/gallium/targets/dri/dri.sym.
> I'm about ~90% sure that it will fix your problem.
>

Hmm, that just makes clGetPlatformIDs() fail in a weird way (returns
-1001, i.e. CL_PLATFORM_NOT_FOUND_KHR).. maybe something funny going
on with the build?

But I'm not entirely sure how you were expecting that to avoid two
instances of glsl_types (and its corresponding singletons).

BR,
-R