[Mesa-dev] [RFC] opencl: mega-cl

Tue Feb 27 13:10:25 UTC 2018

On Tue, Feb 27, 2018 at 6:47 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> On 27 February 2018 at 00:26, Rob Clark <robdclark at gmail.com> wrote:
>> On Mon, Feb 26, 2018 at 6:12 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Rob Clark <robdclark at gmail.com> writes:
>>>
>>>> On Sun, Feb 25, 2018 at 3:00 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>> Seems like a serious hack to me to work around broken linking...  IMO we
>>>>> should just fix the linking issue.  The symbols for the various GLSL
>>>>> types need to be linked with the proper binding and visibility -- I
>>>>> assume that the cause of your problem is that people are making
>>>>> assumptions about the equality of GLSL types based on their memory
>>>>> addresses *and* marking the symbols as hidden *and* passing pointers to
>>>>> GLSL types across shared objects?  That sounds like a recipe for
>>>>> disaster.
>>>>
>>>> tbh, maybe hack, or I think more likely, maybe a good idea.. I'm not
>>>> terribly sold on the idea of dynamically loading pipe driver and
>>>> linking a lot of shared code into N different pipe_${driver}.so on
>>>> disk, since the # of drivers seems to be greater than # of state
>>>> trackers.. not to mention multiple copies of shared gallium code in
>>>> memory due to being statically linked into both state tracker and
>>>> driver.
>>>>
>>>
>>> It would probably make sense to link that shared code dynamically into
>>> the pipe-loader modules in order to avoid the duplication you're talking
>>> about (e.g. by building them as a meson shared_module() instead of a
>>> shared_library() in order to allow the common symbols to be undefined in
>>> the pipe-loader module which will then be pulled in by the state
>>> tracker) -- Actually that's likely to fix your problem too.
>>
>> yes, for pipe_loader_dynamic case, this is the only way I can see for
>> it to work (although I think we still want to link statically in the
>> pipe_loader_static case).  Well that, or making the glsl_types symbols
>> not -Bsymbolic.
>>
>> The potential problem here is, what if an app links in both (for ex.)
>> CL + GL, but somehow from different mesa versions.  I suppose that
>> might be a corner case, but it would fall over badly w/ nir/glsl_types
>> in a .so.  But I think the way things are packaged in most distro's, a
>> user could upgrade specifically the mesa-gl package but not the
>> mesa-cl package.  (Also, we don't really want to export glsl/nir
>> symbols lest someone mistake them for a stable ABI.)
>>
>> I guess I am ok with the approach of (a) adding support for "mega-cl"
>> using piple_loader_static plus (b) adding a libnir.so for the dynamic
>> case, and letting users of pipe_loader_dynamic keep both pieces in the
>> broken corner cases.
>>
>>>> That said, glsl_type's defn does seem to expect to only exist once in
>>>> a process, ie. == is ptr comparison and when you link nir/glsl_types
>>>> into both state tracker and driver, glsl_type ptrs are getting passed
>>>> across that boundary.  I'm not really sure that is worth fixing (ie.
>>>> why should it exist twice in a process in the first place?)
>>>>
>>>
>>> *Shrug* It's kind of a micro-optimization with nasty side effects
>>> (namely that you cannot pass GLSL IR or NIR across shared object
>>> boundaries without things exploding) and dubious payoff, it would
>>> probably be reasonable to simply drop the micro-optimization, but that
>>> wasn't the solution I was suggesting (but rather, making the
>>> optimization work by exporting the symbols GLSL type equality cares
>>> about with the right linkage).
>>
>> well, I think it is long past too late to try to solve that.  If it
>> were only in c++ code we could overload == but it is a mix of c++ and
>> c, and tracking down all the places that do pointer comparisons seems
>> like it would be tricky to avoid missing some and having subtle bugs
>> as a result.
>>
> Do we know why things were exploding on your end and Karol was fine?
> Was it a missing a assert/other trivial bits, or it genuinely working
> fine for Karol?
>

He was also getting two copies of glsl_types, so I guess somehow just
didn't discover the problem yet.

> If one is to opt for either workaround, please make sure that it is
> extensively documented.
>
>
>>>> Maybe there are some linker tricks to solve this, idk.. that is a bit
>>>> outside my area of expertise.
>>>
>>> More than adding new linker tricks I think you just need to make sure
>>> that existing linker tricks we're doing right now ('-fvisibility=hidden'
>>> and various linker scripts) don't break the build by forcing common
>>> symbols to be local, along the lines of:
>>>
>>> | diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h
>>> | index ab0b2637649..9d8a5ea341c 100644
>>> | --- a/src/compiler/glsl_types.h
>>> | +++ b/src/compiler/glsl_types.h
>>> | @@ -243,7 +243,7 @@ public:
>>> |     /*@{*/
>>> |  #undef  DECL_TYPE
>>> |  #define DECL_TYPE(NAME, ...) \
>>> | -   static const glsl_type *const NAME##_type;
>>> | +   static const glsl_type *const NAME##_type __attribute__((visibility("default")));
>>> |  #undef  STRUCT_TYPE
>>> |  #define STRUCT_TYPE(NAME) \
>>> |     static const glsl_type *const struct_##NAME##_type;
>>> | diff --git a/src/gallium/targets/opencl/opencl.sym b/src/gallium/targets/opencl/opencl.sym
>>> | index 9fcc57692b8..fadc2eb6c1d 100644
>>> | --- a/src/gallium/targets/opencl/opencl.sym
>>> | +++ b/src/gallium/targets/opencl/opencl.sym
>>> | @@ -2,6 +2,10 @@
>>> |       global:
>>> |               cl*;
>>> |               opencl_dri_*;
>>> | +
>>> | +             extern "C++" {
>>> | +                     glsl_type::*;
>>> | +             };
>>> |       local:
>>> |               *;
>>> |  };
>>> | diff --git a/src/gallium/targets/pipe-loader/pipe.sym b/src/gallium/targets/pipe-loader/pipe.sym
>>> | index 605cb83d802..801b0f44d56 100644
>>> | --- a/src/gallium/targets/pipe-loader/pipe.sym
>>> | +++ b/src/gallium/targets/pipe-loader/pipe.sym
>>> | @@ -7,6 +7,10 @@
>>> |               # due to LLVM being initialized multiple times.
>>> |               radeon_drm_winsys_create;
>>> |               amdgpu_winsys_create;
>>> | +
>>> | +             extern "C++" {
>>> | +                     glsl_type::*;
>>> | +             };
>>> |       local:
>>> |               *;
>>> |  };
>>>
>>
>> iiuc, this would have the same corner case of moving nir/glsl_types to
>> a separate .so when mixing state tracker versions in a process.
>>
>>>> To me, the mega-driver approach worked well for gl, and many of the
>>>> other state trackers are already using libpipe_loader_static.. why not
>>>> just fix the problems with that approach and abandon
>>>> libpipe_loader_dynamic?  This seems like it would work out better for
>>>> memory footprint since pages for drivers not used simply wouldn't get
>>>> paged in.  And you aren't ending up with two copies of libgallium or
>>>> libnir code in memory.
>>>>
>>>
>>> The dynamic pipe loader would give you better disk and memory footprint
>>> than the static one assuming that the problem with linking shared code
>>> statically into each pipe-loader module was fixed, because that would
>>> allow the pipe driver code to be shared in memory and disk among
>>> different state trackers, but I guess that the benefit is slight at this
>>> point, I understand your reluctance to fix it ;).
>>>
>>
>> I guess I haven't measured, but I wonder if this is even true.  The
>> cost increase is duplicating pipe_driver code in each state tracker
>> (but I guess there are only 3 or 4 that typically end up on users
>> filesys, and of those only mesa/st really needs to link in *all* the
>> drivers, clover and the video state trackers are only supported by a
>> few drivers each), but is that offset by de-duplicating shared code
>> (gallium/aux, libnir, etc)?
>>
> The size guesstimate is miles off.
>
> As mentioned already - there are _serious_ size savings. Specifics
> depend on your exact build, but a 'generic' x86 is below.
>
> Disabled the Vulkan bits to save a few seconds
> ./configure \
>    --with-vulkan-drivers= \
>    --with-gallium-drivers=nouveau,r300,r600,radeonsi,svga,swrast,virgl \
>    --enable-nine \
>    --enable-xa \
>    --enable-omx-bellagio \
>    --enable-opencl \
>    --enable-opencl-icd
>
> * 'overall size' - make && DESTDIR=`pwd`/install make install && du -ch install
> 635M    total - mega
> 382M    total - pipe-drivers
>
> Since much of ^^ can be stripped/etc, let's look at the binary size:
>
> First the pipe-drivers - identical across both
>     text    data     bss      dec     hex filename
>  2890690  101208 1998856  4990754  4c2722 pipe_nouveau.so
>  1788418   83296 1998376  3870090  3b0d8a pipe_r300.so
>  1872767  115000 1735176  3722943  38cebf pipe_r600.so
>  2805967  222072 1735464  4763503  48af6f pipe_radeonsi.so
>  1687546   79667  428384  2195597  21808d pipe_swrast.so
>  1766134   86240  419272  2271646  22a99e pipe_vmwgfx.so
>
> VDPAU: x10 ~7M -> 0.7M
>     text    data     bss      dec     hex filename
>  5742358  287184 1868280  7897822  7882de libvdpau_*.so - mega
>   648640   72216    8832   729688   b2258 libvdpau_*.so - pipe-drivers
>
> DRI: x2 ~13M -> 6.5M
>     text    data     bss      dec     hex filename
> 11183120  399368 2061728 13644216  d031b8 gallium_dri.so - mega
>  6110326  306680  316328  6733334  66be16 gallium_dri.so -
> pipe-drivers -> we have some bits to cleanup
>
> VA, OMX, etc are in the same case.
>
> In a Tl;Dr; - build two targets you're already saving some. Build
> three or more you're up for a huge win.
>

note that currently the video state trackers (and with my patch,
clover) are currently linking all the drivers, even when they don't
support that particular state tracker, so I guess the thing I
originally suggested about having stub drivers would improve things a
bit.  (Although the drivers that would be stubbed out are not the
largest ones.)

And in practice, how many people would be installing more than
gl+vdpau+clover?  All the drivers support gl, but I think the typical
config there is already mega-driver.  So we are trading off removing
pipe_*.so for adding a larger vdpau+clover, but once we strip out the
drivers that don't make sense for those state trackers, is it really
so bad?

>
>> I did have a thought earlier today.  Since we have glvnd, and
>> opencl-icd (and I guess vdpau and vaapi have a sort of loader), where
>> the state tracker isn't directly exporting the API symbols but rather
>> just a jump-table used by the icd .so, we could actually perhaps do a
>> mega-mega driver where all the state trackers and all the drivers are
>> linked into a single .so.. That would reduce the disk and memory
>> footprint the most.  (Well, ok, it would cost maybe some virtual
>> address space, but pages associated w/ unused state trackers wouldn't
>> be demand-paged in, so meh?)
>>
> Err... no. It was brought up before and it's a very bad idea. The
> different targets pull different dependencies which you can not force.

well, I mean, we are already pulling in llvm no matter what, so is
whatever else that gets dragged in really that bad (or unsolveable)?
What sort of dependencies are we talking about?

> I would greatly encourage that we shift away from the binary sizes and
> consider resolving the icky pointer/singleton/related issues.

Short of dynamically linking glsl_types, or making glsl_types symbols
resolved at runtime, I am not sure there is a solution.  And that
becomes a potential issue if you mix state trackers from different
mesa versions.  (Although maybe the answer there is simply "don't do
that"?)

BR,
-R