[Mesa-dev] [PATCH] radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA.
Tom Stellard
tom at stellard.net
Tue Apr 23 20:47:24 PDT 2013
On Sat, Apr 20, 2013 at 09:27:16AM +0200, Mathias Fröhlich wrote:
>
> Hi Tom,
>
> May be I need to tell where the problem really appears in real life.
> OpenSceneGraph has some nifty features regarding multi channel rendering.
> Assume a setup of multiple full screen views running on different graphics
> boards into a single mashine composing a view into a single scene.
> Now the recommended way to do this with osg is to set up a X screen per
> graphics board. Even if this spans multiple monitors/projectors. Set up a GL
> graphics context per graphics board and set up a viewport per projector in the
> graphics contexts. Rendering happens now in parallel for each graphics
> context. I do drive such a thing here with two radeons and three monitors for
> testing and here the problem appears.
>
> When I start the favourite flight simulation application of my choice with this
> setup, then it crashes almost immediately without llvm_start_multithreaded
> being called. Wheres it works stable if we ensure llvm being multithreaded.
>
> So, I tried to distill a piglit testcase out of this somehow huger setup with
> flightgear, OpenSceneGraph, multiple gpu's and what not.
>
> On Friday, April 19, 2013 20:08:54 Tom Stellard wrote:
> > On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote:
> > > Tom,
> > >
> > > > -class LLVMEnsureMultithreaded {
> > > > -public:
> > > > - LLVMEnsureMultithreaded()
> > > > - {
> > > > - llvm_start_multithreaded();
> > > > - }
> > > > -};
> > > > -
> > > > -static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
> > >
> > > Removing this leads to crashes in llvm with applications that concurrently
> > > work on different gl contexts.
> >
> > The test you wrote still passes with this patch. Do you see that
> > we are now calling the C API version of llvm_start_multithreaded(),
> > LLVMStartMutithreaded() from inside radeon_llvm_compile() protected by a
> > static variable?
>
> Oh, no I did not see this. I did not realize that the llvm_start_multithreaded
> call is not just plain C. So I thought grepping for the call I used is
> sufficient.
>
> But negative. If I really apply your patch and try to run this with the above
> setup I get the crashes. The same with the piglit test here.
>
> Too bad, that reproducing races is racy for itself.
> With the piglit test I get about 2/3 of the runs either glibc memory
> corruption aborts. Or one of the below asserts from llvm:
>
> bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode &&
> "Already multithreaded!"' failed.
>
> void
> llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*):
> Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not
> registered!"' failed.
>
> bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]:
> Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired
> before release!"' failed.
>
> So the biggest problem IIRC was that use of llvm::sys::SmartMutex<mt_only>
> which is spread around here and there in llvm. The pass registry was (is?) one
> of the users for that. If you did not tell llvm to run multithreaded these
> locks get noops and you concurrently access containers and that ...
>
> Looking at the first assert, the llvm guys have made this problem even worse
> IMO since I looked at this before. We need to check for multithreading being
> enabled before trying to set this. Both of which being racy for itself in this
> way and all of them not being save against already happening llvm access from
> an other thread and an other foreign use.
>
> > Sorry about that. I didn't have piglit commit access at the time, and
> > I forgot about the patch. I fixed a few things and sent v3 to the list.
> The same here. Thanks for this.
>
> > > Regarding the point where this funciton is called I had choosen static
> > > initialization time since llvm requires this function to be called single
> > > threaded which we cannot guarantee in any case. Keep in mind that you need
> > > to ensure this function called non concurrently even against applications
> > > that itself already use the llvm libs in some way while the driver is
> > > loaded. But the best bet is to do this in the dynamic loder which is
> > > itself serialized, so I could avoid calling this function concurrently by
> > > initialization of different contexts. That should at least shield against
> > > applications that itself do the same trick by calling this funtion in the
> > > dlopen phase in some static initializer ...
> > > We may get around part of this problem with dlopening the driver with
> > > better isolation but up to now the problem can get that far.
> >
> > This is a tricky problem, and I'm not sure that radeon_llvm_compile() is
> > the best place to call llvm_start_multithreaded(). Maybe it would be
> > better to move this into gallivm, because this problem could affect any
> > driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g,
> > radeonsi, and i915g. What do you think?
>
> Yep, an other place would be better.
>
> I do not know the llvm tools well enough, but If I move the current c++ code
> into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp it works for me (TM).
> Seriously, I know of one guy who wants to use llvmpipe with windows and he
> would benefit from the c++ solution since the gcc extension solution is clearly
> not available with the cl windows build then. And with the lack of a windows
> test system I do also not start implementing DllMain.
> That could be a short term solution ...
>
>
Hi Mathias,
First of all, thanks for investigating this. The information you've
provided has helped me a lot.
> Appart from that, I mentioned better symbol isolation somehow.
>
> How would this improove the situation with the initialization?
> If we would know that this current driver module is the only user of the this
> particular llvm instance, then we could really place a mutex somewhere and
> serialize the call to lvm::llvm_start_multithreaded() by ourselves as we then
> know that we are the only user. This would then happen before the first use of
> llvm by the driver module in question. No dlopen time static initializers that
> usually lead to ordering problems earlier or later ...
>
> So, I have a patch set floating around here for about half a year that changes
> the RTLD_GLOBAL arguments to dlopen to RTLD_LOCAL|RTLD_DEEPBIND which should
> (? I still have to definitely investigate this) provide us the private copy of
> the used libraries/modules. Also up to now I did not take the time to really
> analyze if we rely on some globals being really shared between driver modules
> or different types of drivers that we then also have multiple times in memory.
> Note that this does *not* affect the shared glapi topics as they are all above
> the loaded driver modules at the level of libGL.so.1.
> So for example I can think of some need for a single libdrm_radeon to make cl-
> gl buffer sharing work for example? Do we?
>
> Alternatively, you might be able to pack every use of llvm at least in radeon
> statically into libllvmradeon and again hide all non driver api used stuff from
> the outside. Then you would also have your private llvm instance that you can
> initialize with your own lock to serialize. That is then more a radeon alone
> solution as the other drivers need to do something similar themselves then.
> And I realize that your recent code move work could point into this direction
> at least for the radeon type drivers.
>
I took a shot at implementing it this way with private static copies of
llvm. I've pushed the initial work to this branch:
git://people.freedesktop.org/~tstellar/mesa llvm-build
I was able to hide the LLVM symbols by using the --exclude-libs linker
flag and so far I've tested with glxgears, an opencl example and also
eglgears and they all work. I still need to do some more testing, but
any thoughts on what I've done so far in this branch?
-Tom
> Both symbol isolation methods will also make the driver used llvm version
> independent of any possibly application used version that can lead to
> incompatible symbol clashes. So you gain even more with that.
>
> Opinions?
>
> Mathias
More information about the mesa-dev
mailing list