[Mesa-dev] [PATCH] radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA.

Tue Apr 23 20:47:24 PDT 2013

On Sat, Apr 20, 2013 at 09:27:16AM +0200, Mathias Fröhlich wrote:
> 
> Hi Tom,
> 
> May be I need to tell where the problem really appears in real life.
> OpenSceneGraph has some nifty features regarding multi channel rendering.
> Assume a setup of multiple full screen views running on different graphics 
> boards into a single mashine composing a view into a single scene.
> Now the recommended way to do this with osg is to set up a X screen per 
> graphics board. Even if this spans multiple monitors/projectors. Set up a GL 
> graphics context per graphics board and set up a viewport per projector in the 
> graphics contexts. Rendering happens now in parallel for each graphics 
> context. I do drive such a thing here with two radeons and three monitors for 
> testing and here the problem appears.
> 
> When I start the favourite flight simulation application of my choice with this 
> setup, then it crashes almost immediately without llvm_start_multithreaded 
> being called. Wheres it works stable if we ensure llvm being multithreaded.
> 
> So, I tried to distill a piglit testcase out of this somehow huger setup with 
> flightgear, OpenSceneGraph, multiple gpu's and what not.
> 
> On Friday, April 19, 2013 20:08:54 Tom Stellard wrote:
> > On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote:
> > > Tom,
> > > 
> > > > -class LLVMEnsureMultithreaded {
> > > > -public:
> > > > -   LLVMEnsureMultithreaded()
> > > > -   {
> > > > -      llvm_start_multithreaded();
> > > > -   }
> > > > -};
> > > > -
> > > > -static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
> > > 
> > > Removing this leads to crashes in llvm with applications that concurrently
> > > work on different gl contexts.
> > 
> > The test you wrote still passes with this patch.  Do you see that
> > we are now calling the C API version of llvm_start_multithreaded(),
> > LLVMStartMutithreaded() from inside radeon_llvm_compile() protected by a
> > static variable?
> 
> Oh, no I did not see this. I did not realize that the llvm_start_multithreaded 
> call is not just plain C. So I thought grepping for the call I used is 
> sufficient.
> 
> But negative. If I really apply your patch and try to run this with the above 
> setup I get the crashes. The same with the piglit test here.
> 
> Too bad, that reproducing races is racy for itself.
> With the piglit test I get about 2/3 of the runs either glibc memory 
> corruption aborts. Or one of the below asserts from llvm:
> 
> bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode && 
> "Already multithreaded!"' failed.
> 
> void 
> llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*): 
> Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not 
> registered!"' failed.
> 
> bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]: 
> Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired 
> before release!"' failed.
> 
> So the biggest problem IIRC was that use of llvm::sys::SmartMutex<mt_only> 
> which is spread around here and there in llvm. The pass registry was (is?) one 
> of the users for that. If you did not tell llvm to run multithreaded these 
> locks get noops and you concurrently access containers and that ...
> 
> Looking at the first assert, the llvm guys have made this problem even worse 
> IMO since I looked at this before. We need to check for multithreading being 
> enabled before trying to set this. Both of which being racy for itself in this 
> way and all of them not being save against already happening llvm access from 
> an other thread and an other foreign use.
> 
> > Sorry about that. I didn't have piglit commit access at the time, and
> > I forgot about the patch.  I fixed a few things and sent v3 to the list.
> The same here. Thanks for this.
> 
> > > Regarding the point where this funciton is called I had choosen static
> > > initialization time since llvm requires this function to be called single
> > > threaded which we cannot guarantee in any case. Keep in mind that you need
> > > to ensure this function called non concurrently even against applications
> > > that itself already use the llvm libs in some way while the driver is
> > > loaded. But the best bet is to do this in the dynamic loder which is
> > > itself serialized, so I could avoid calling this function concurrently by
> > > initialization of different contexts. That should at least shield against
> > > applications that itself do the same trick by calling this funtion in the
> > > dlopen phase in some static initializer ...
> > > We may get around part of this problem with dlopening the driver with
> > > better isolation but up to now the problem can get that far.
> > 
> > This is a tricky problem, and I'm not sure that radeon_llvm_compile() is
> > the best place to call llvm_start_multithreaded().  Maybe it would be
> > better to move this into gallivm, because this problem could affect any
> > driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g,
> > radeonsi, and i915g.  What do you think?
> 
> Yep, an other place would be better.
> 
> I do not know the llvm tools well enough, but If I move the current c++ code 
> into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp it works for me (TM).
> Seriously, I know of one guy who wants to use llvmpipe with windows and he 
> would benefit from the c++ solution since the gcc extension solution is clearly 
> not available with the cl windows build then. And with the lack of a windows 
> test system I do also not start implementing DllMain.
> That could be a short term solution ...
> 
> 

Hi Mathias,

First of all, thanks for investigating this.  The information you've
provided has helped me a lot.

> Appart from that, I mentioned better symbol isolation somehow.
> 
> How would this improove the situation with the initialization?
> If we would know that this current driver module is the only user of the this 
> particular llvm instance, then we could really place a mutex somewhere and 
> serialize the call to lvm::llvm_start_multithreaded() by ourselves as we then 
> know that we are the only user. This would then happen before the first use of 
> llvm by the driver module in question. No dlopen time static initializers that 
> usually lead to ordering problems earlier or later ...
> 
> So, I have a patch set floating around here for about half a year that changes 
> the RTLD_GLOBAL arguments to dlopen to RTLD_LOCAL|RTLD_DEEPBIND which should 
> (? I still have to definitely investigate this) provide us the private copy of 
> the used libraries/modules. Also up to now I did not take the time to really 
> analyze if we rely on some globals being really shared between driver modules 
> or different types of drivers that we then also have multiple times in memory. 
> Note that this does *not* affect the shared glapi topics as they are all above 
> the loaded driver modules at the level of libGL.so.1.
> So for example I can think of some need for a single libdrm_radeon to make cl-
> gl buffer sharing work for example? Do we?
> 
> Alternatively, you might be able to pack every use of llvm at least in radeon 
> statically into libllvmradeon and again hide all non driver api used stuff from 
> the outside. Then you would also have your private llvm instance that you can 
> initialize with your own lock to serialize. That is then more a radeon alone 
> solution as the other drivers need to do something similar themselves then.
> And I realize that your recent code move work could point into this direction 
> at least for the radeon type drivers.
>

I took a shot at implementing it this way with private static copies of
llvm.  I've pushed the initial work to this branch:
git://people.freedesktop.org/~tstellar/mesa llvm-build

I was able to hide the LLVM symbols by using the --exclude-libs linker
flag and so far I've tested with glxgears, an opencl example and also
eglgears and they all work.  I still need to do some more testing, but
any thoughts on what I've done so far in this branch?

-Tom

> Both symbol isolation methods will also make the driver used llvm version 
> independent of any possibly application used version that can lead to 
> incompatible symbol clashes. So you gain even more with that.
> 
> Opinions?
> 
> Mathias