[Mesa-dev] [PATCH] radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA.

Mathias Fröhlich Mathias.Froehlich at gmx.net
Sat Apr 20 00:27:16 PDT 2013


Hi Tom,

Maybe I need to explain where the problem really appears in real life.
OpenSceneGraph has some nifty features regarding multi-channel rendering.
Assume a setup of multiple full-screen views running on different graphics
boards in a single machine, composing a view into a single scene.
Now the recommended way to do this with osg is to set up an X screen per
graphics board, even if it spans multiple monitors/projectors, set up a GL
graphics context per graphics board, and set up a viewport per projector
in the graphics contexts. Rendering then happens in parallel for each
graphics context, roughly as in the sketch below. I drive such a setup
here with two Radeons and three monitors for testing, and this is where
the problem appears.
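
To make that concrete, the parallel rendering boils down to something
like the following stripped-down sketch; the display names ":0.0"/":0.1"
and the elided drawing code are assumptions for a machine with one X
screen per graphics board:

#include <GL/glx.h>
#include <X11/Xlib.h>
#include <pthread.h>

// Sketch: one thread per X screen, each with its own display connection
// and its own GL context, rendering in parallel.
static void *render_screen(void *arg)
{
   Display *dpy = XOpenDisplay(static_cast<const char *>(arg));
   if (!dpy)
      return nullptr;
   int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
   XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attribs);
   if (vi) {
      GLXContext ctx = glXCreateContext(dpy, vi, nullptr, True);
      // ... create a window per viewport, glXMakeCurrent(), compile
      // shaders and draw -- this is where the concurrent llvm use and
      // thus the crashes happen ...
      glXDestroyContext(dpy, ctx);
      XFree(vi);
   }
   XCloseDisplay(dpy);
   return nullptr;
}

int main()
{
   XInitThreads(); // required before any concurrent Xlib use
   const char *screens[] = { ":0.0", ":0.1" };
   pthread_t threads[2];
   for (int i = 0; i < 2; ++i)
      pthread_create(&threads[i], nullptr, render_screen,
                     (void *)screens[i]);
   for (int i = 0; i < 2; ++i)
      pthread_join(threads[i], nullptr);
   return 0;
}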

When I start the favourite flight simulation application of my choice with
this setup, it crashes almost immediately without llvm_start_multithreaded
being called, whereas it runs stably if we ensure LLVM is multithreaded.

So, I tried to distill a piglit testcase out of this somewhat bigger setup
with FlightGear, OpenSceneGraph, multiple GPUs and whatnot.

On Friday, April 19, 2013 20:08:54 Tom Stellard wrote:
> On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote:
> > Tom,
> > 
> > > -class LLVMEnsureMultithreaded {
> > > -public:
> > > -   LLVMEnsureMultithreaded()
> > > -   {
> > > -      llvm_start_multithreaded();
> > > -   }
> > > -};
> > > -
> > > -static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
> > 
> > Removing this leads to crashes in llvm with applications that concurrently
> > work on different gl contexts.
> 
> The test you wrote still passes with this patch.  Do you see that
> we are now calling the C API version of llvm_start_multithreaded(),
> LLVMStartMultithreaded(), from inside radeon_llvm_compile(), protected
> by a static variable?

Oh no, I did not see this. I did not realize that the
llvm_start_multithreaded call is not just plain C, so I thought grepping
for the call I used would be sufficient.

But negative: if I actually apply your patch and run it with the above
setup, I still get the crashes. The same with the piglit test here.

Too bad that reproducing races is racy in itself.
With the piglit test, about 2/3 of the runs end in either glibc memory
corruption aborts or one of the asserts below from LLVM:

bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode && 
"Already multithreaded!"' failed.

void 
llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*): 
Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not 
registered!"' failed.

bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]: 
Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired 
before release!"' failed.

So IIRC the biggest problem was the use of llvm::sys::SmartMutex<mt_only>,
which is spread around here and there in LLVM; the pass registry was (is?)
one of its users. If you did not tell LLVM to run multithreaded, these
locks become no-ops, and you end up concurrently accessing containers and
so on ...
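
To make it concrete why that is fatal, this is roughly what those locks
do -- a paraphrased sketch from memory, not verbatim LLVM code:

#include <mutex>

// Paraphrased sketch of llvm::sys::SmartMutex behaviour. With
// mt_only = true, the lock only does real work once
// llvm_start_multithreaded() has flipped the global mode flag.
static bool multithreaded_mode = false; // stands in for LLVM's flag

template <bool mt_only>
class SmartMutexSketch {
   std::mutex impl; // stands in for the platform mutex
public:
   void acquire() {
      if (!mt_only || multithreaded_mode)
         impl.lock();   // real locking
      // else: a no-op -- nothing protects the shared containers, so
      // two contexts touching e.g. the pass registry corrupt it
   }
   void release() {
      if (!mt_only || multithreaded_mode)
         impl.unlock();
   }
};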

Looking at the first assert, the LLVM guys have IMO made this problem even
worse since I last looked at it. We now need to check whether
multithreading is already enabled before trying to enable it. Both steps
are racy in themselves when done this way, and neither is safe against
LLVM already being accessed from another thread or from another foreign
user.
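
Spelled out against the C API (function names as of the current C API),
the dance looks something like this, and the window between check and
call is exactly where a second thread can slip in:

#include <llvm-c/Core.h>

// Sketch of the check-then-enable dance the first assert forces on us.
static void ensure_llvm_multithreaded()
{
   if (!LLVMIsMultithreaded()) {   // check ...
      // <-- a second thread can pass the same check right here
      LLVMStartMultithreaded();    // ... then enable; this asserts with
                                   // "Already multithreaded!" if the
                                   // other thread won the race
   }
}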

> Sorry about that. I didn't have piglit commit access at the time, and
> I forgot about the patch.  I fixed a few things and sent v3 to the list.

The same here. Thanks for this.

> > Regarding the point where this function is called: I had chosen static
> > initialization time since llvm requires this function to be called
> > single-threaded, which we cannot guarantee in any case. Keep in mind
> > that you need to ensure this function is called non-concurrently even
> > against applications that themselves already use the llvm libs in some
> > way while the driver is loaded. But the best bet is to do this in the
> > dynamic loader, which is itself serialized, so I could avoid calling
> > this function concurrently during the initialization of different
> > contexts. That should at least shield against applications that
> > themselves do the same trick by calling this function in the dlopen
> > phase in some static initializer ...
> > We may get around part of this problem by dlopening the driver with
> > better isolation, but up to now the problem can get that far.
> 
> This is a tricky problem, and I'm not sure that radeon_llvm_compile() is
> the best place to call llvm_start_multithreaded().  Maybe it would be
> better to move this into gallivm, because this problem could affect any
> driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g,
> radeonsi, and i915g.  What do you think?

Yep, another place would be better.

I do not know the LLVM tools well enough, but if I move the current C++
code into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp, it works for me
(TM); see the sketch below.
Seriously, I know of one guy who wants to use llvmpipe on Windows, and he
would benefit from the C++ solution, since the gcc extension solution is
clearly not available with the cl Windows build. And lacking a Windows
test system, I will also not start implementing DllMain.
That could be a short-term solution ...
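
Concretely, I mean nothing more than moving the removed hunk from above
into that file, roughly:

// src/gallium/auxiliary/gallivm/lp_bld_misc.cpp -- essentially the hunk
// the patch removes, just relocated, so that every gallivm-based driver
// gets it at dlopen/static-initialization time:
#include <llvm/Support/Threading.h>

namespace {
   class LLVMEnsureMultithreaded {
   public:
      LLVMEnsureMultithreaded()
      {
         llvm::llvm_start_multithreaded();
      }
   };

   LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
}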


Apart from that, I mentioned better symbol isolation earlier.

How would this improve the situation with the initialization?
If we knew that the current driver module is the only user of this
particular llvm instance, then we could really place a mutex somewhere and
serialize the call to llvm::llvm_start_multithreaded() ourselves, as we
would then know that we are the only user. This would happen before the
first use of llvm by the driver module in question; no dlopen-time static
initializers that usually lead to ordering problems sooner or later ...
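
With that guarantee, a plain once-guard inside the driver module would
do; a minimal sketch, assuming we really are the only llvm user in the
process (the entry point name is made up):

#include <pthread.h>
#include <llvm/Support/Threading.h>

// Only valid if this driver module owns a private LLVM copy; otherwise
// a foreign LLVM user in the same process can still race with us.
static pthread_once_t llvm_once = PTHREAD_ONCE_INIT;

static void start_llvm_threading()
{
   if (!llvm::llvm_is_multithreaded())
      llvm::llvm_start_multithreaded();
}

// Hypothetical entry point, to be called before the first llvm use:
void radeon_llvm_ensure_multithreaded()
{
   pthread_once(&llvm_once, start_llvm_threading);
}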

So, I have a patch set floating around here for about half a year that
changes the RTLD_GLOBAL arguments to dlopen into RTLD_LOCAL|RTLD_DEEPBIND
(see the sketch below), which should (? I still have to investigate this
for sure) provide us a private copy of the used libraries/modules. Up to
now I also did not take the time to really analyze whether we rely on some
globals actually being shared between driver modules, or between different
types of drivers, that we would then have multiple times in memory.
Note that this does *not* affect the shared glapi topics, as they all live
above the loaded driver modules at the level of libGL.so.1.
For example, I can think of some need for a single libdrm_radeon to make
cl-gl buffer sharing work? Do we have that?
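
For reference, the change amounts to something like this at the dlopen
call sites of the loaders (a sketch only; the real patch touches several
places, and the helper name here is hypothetical):

#define _GNU_SOURCE // for RTLD_DEEPBIND
#include <dlfcn.h>

// Hypothetical helper showing just the flag change:
void *open_driver_module(const char *path)
{
   // was: dlopen(path, RTLD_GLOBAL | RTLD_NOW);
   return dlopen(path, RTLD_LOCAL | RTLD_DEEPBIND | RTLD_NOW);
}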

Alternatively, you might be able to pack every use of llvm, at least in
radeon, statically into libllvmradeon and again hide everything not used
by the driver API from the outside. Then you would also have your private
llvm instance that you can initialize with your own lock to serialize.
That is more of a radeon-only solution, as the other drivers would then
need to do something similar themselves.
And I realize that your recent code-move work could point in this
direction, at least for the radeon-type drivers.

Both symbol isolation methods will also make the LLVM version used by the
driver independent of any version possibly used by the application, which
can otherwise lead to incompatible symbol clashes. So you gain even more
with that.

Opinions?

Mathias
