[Mesa-dev] [RFC] opencl: mega-cl

Mon Feb 26 13:16:34 UTC 2018

On Mon, Feb 26, 2018 at 8:06 AM, Rob Clark <robdclark at gmail.com> wrote:
> On Mon, Feb 26, 2018 at 7:10 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> Hi guys,
>>
>> Having attempted a similar thing in the past, I think there are two
>> things at play here.
>> As such I'd recommend trying to keep them separate.
>>
>> 1) Having a single and/or modular - state-tracker <> pipe-driver setup
>> 2) "Hilarities" when having NIR code multiple times per process
>>
>> On 26 February 2018 at 01:54, Rob Clark <robdclark at gmail.com> wrote:
>>> On Sun, Feb 25, 2018 at 3:00 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>> Seems like a serious hack to me to work around broken linking...  IMO we
>>>> should just fix the linking issue.  The symbols for the various GLSL
>>>> types need to be linked with the proper binding and visibility -- I
>>>> assume that the cause of your problem is that people are making
>>>> assumptions about the equality of GLSL types based on their memory
>>>> addresses *and* marking the symbols as hidden *and* passing pointers to
>>>> GLSL types across shared objects?  That sounds like a recipe for
>>>> disaster.
>>>
>>> tbh, maybe hack, or I think more likely, maybe a good idea.. I'm not
>>> terribly sold on the idea of dynamically loading pipe driver and
>>> linking a lot of shared code into N different pipe_${driver}.so on
>>> disk, since the # of drivers seems to be greater than # of state
>>> trackers.. not to mention multiple copies of shared gallium code in
>>> memory due to being statically linked into both state tracker and
>>> driver.
>>>
>> This is a) above.
>> If dynamic pipe-drivers are used across the tree, one can achieve very
>> good disk utilization.
>> All the bits for DRI, VDPAU and others are there, we just need a
>> configure toggle.
>>
>> The call which approach to use will be left to the distribution.
>>
>>> That said, glsl_type's defn does seem to expect to only exist once in
>>> a process, ie. == is ptr comparison and when you link nir/glsl_types
>>> into both state tracker and driver, glsl_type ptrs are getting passed
>>> across that boundary.  I'm not really sure that is worth fixing (ie.
>>> why should it exist twice in a process in the first place?)
>>>
>> This seems like b) - a bug, IMHO, which we should fix regardless of the above.
>>
>> Why - it's possible to have an application use OpenCL, Vulkan (even
>> VDPAU, GL, etc.).
>> Thus effectively pulling the NIR codebase multiple times in the same process.
>
> But, each will have its own pipe_contexts and they wouldn't be
> sharing shaders between them, so that case of NIR (or really
> glsl_types) existing multiple times in a process should be harmless.
>
> Unless there is some linker magic to make it only use the first copy
> of glsl_types that gets loaded, when it is statically linked into
> multiple different .so's, I think the only other option to make
> dynamic pipe loader work would be to make libnir into an .so
>
>> Off the top of my head it sounds like we have a bunch of global
>> variables, which are causing the problem.
>> Or perhaps it's the screen sharing that bites us?
>>
>>> Maybe there are some linker tricks to solve this, idk.. that is a bit
>>> outside my area of expertise.
>>
>> Last time I looked, the symbols were properly annotated.
>>
>> Rob, can you try dropping the freedreno symbol from
>> src/gallium/targets/dri/dri.sym.
>> I'm about ~90% sure that it will fix your problem.
>>
>
> Hmm, that just makes clGetPlatformIDs() fail in a weird way (returns
> -1001).. maybe something funny going on with build?

Ok, this was an LD_LIBRARY_PATH issue.. (I have libllvm_spirv.so
installed in /usr/local/llvm/... and the failure mode for not finding
a dependent .so of libMesaOpenCL.so was not entirely what I expected)

So w/ mega-cl patch reverted and the dri.sym change you suggested, I
still get the same failure as before:

  math-int: ../src/compiler/nir/nir.h:1960: nir_shader_get_entrypoint: Assertion `func->return_type == glsl_void_type()' failed.

which is the result of having two different void_type singletons... I
wasn't entirely sure how the dri.sym change was expected to fix that
(but then again I'm not well versed in linker magic)

BR,
-R

> but I'm not entirely sure how you were expecting that to avoid two
> instances of glsl_types (and its corresponding singletons)
>
> BR,
> -R