[Mesa-dev] [RFC PATCH 00/16] A new IR for Mesa

Tom Stellard tom at stellard.net
Wed Aug 20 12:43:01 PDT 2014


On Wed, Aug 20, 2014 at 12:26:15PM -0700, Connor Abbott wrote:
> On Wed, Aug 20, 2014 at 12:17 PM, Tom Stellard <tom at stellard.net> wrote:
> > On Tue, Aug 19, 2014 at 05:19:15PM -0700, Connor Abbott wrote:
> >> On Tue, Aug 19, 2014 at 3:57 PM, Tom Stellard <tom at stellard.net> wrote:
> >> > On Tue, Aug 19, 2014 at 01:37:56PM -0700, Connor Abbott wrote:
> >> >> On Tue, Aug 19, 2014 at 11:40 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> >> >> > Tom Stellard <tom at stellard.net> writes:
> >> >> >
> >> >> >> On Tue, Aug 19, 2014 at 11:04:59AM -0400, Connor Abbott wrote:
> >> >> >>> On Mon, Aug 18, 2014 at 8:52 PM, Michel Dänzer <michel at daenzer.net> wrote:
> >> >> >>> > On 19.08.2014 01:28, Connor Abbott wrote:
> >> >> >>> >> On Mon, Aug 18, 2014 at 4:32 AM, Michel Dänzer <michel at daenzer.net> wrote:
> >> >> >>> >>> On 16.08.2014 09:12, Connor Abbott wrote:
> >> >> >>> >>>> I know what you might be thinking right now. "Wait, *another* IR? Don't
> >> >> >>> >>>> we already have like 5 of those, not counting all the driver-specific
> >> >> >>> >>>> ones? Isn't this stuff complicated enough already?" Well, there are some
> >> >> >>> >>>> pretty good reasons to start afresh (again...). In the years we've been
> >> >> >>> >>>> using GLSL IR, we've come to realize that, in fact, it's not at all
> >> >> >>> >>>> what we want to do optimizations on.
> >> >> >>> >>>
> >> >> >>> >>> Did you evaluate using LLVM IR instead of inventing yet another one?
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>> --
> >> >> >>> >>> Earthling Michel Dänzer            |                  http://www.amd.com
> >> >> >>> >>> Libre software enthusiast          |                Mesa and X developer
> >> >> >>> >>
> >> >> >>> >> Yes. See
> >> >> >>> >>
> >> >> >>> >> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053502.html
> >> >> >>> >>
> >> >> >>> >> and
> >> >> >>> >>
> >> >> >>> >> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053522.html
> >> >> >>> >
> >> >> >>> > I know Ian can't deal with LLVM for some reason. I was wondering if
> >> >> >>> > *you* evaluated it, and if so, why you rejected it.
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > --
> >> >> >>> > Earthling Michel Dänzer            |                  http://www.amd.com
> >> >> >>> > Libre software enthusiast          |                Mesa and X developer
> >> >> >>>
> >> >> >>>
> >> >> >>> Well, first of all, the fact that Ian and Ken don't want to use it
> >> >> >>> means that any plan to use LLVM for the Intel driver is dead in the
> >> >> >>> water anyway.  You can translate NIR into LLVM if you want, but for
> >> >> >>> i965 we want to share optimizations between our two backends (FS and
> >> >> >>> vec4) that we can't do today in GLSL IR, so this is what we want to
> >> >> >>> use for that.  And since nobody else does anything with the core GLSL
> >> >> >>> compiler except when they have to, once we start moving things out of
> >> >> >>> GLSL IR this will probably replace GLSL IR as the infrastructure that
> >> >> >>> all Mesa drivers use.  But with that in mind, here are a few reasons
> >> >> >>> why we wouldn't want to use LLVM:
> >> >> >>>
> >> >> >>> * LLVM wasn't built to understand structured CFGs, meaning that you
> >> >> >>> need to re-structurize it using a pass that's fragile and prone to
> >> >> >>> break if some other pass "optimizes" the shader in a way that makes it
> >> >> >>> non-structured (i.e. not expressible in terms of loops and if
> >> >> >>> statements). This loss of information also means that passes that need
> >> >> >>> to know things like the loop nesting depth have to run an analysis
> >> >> >>> pass, whereas with NIR you can just walk up the control flow tree and
> >> >> >>> count the number of loops we hit.
> >> >> >>>
> >> >> >>
> >> >> >> LLVM has a pass to structurize the CFG.  We use it in the radeon
> >> >> >> drivers, and it is run after all of the other LLVM optimizations, which have
> >> >> >> no concept of a structured CFG.  It's not bug-free, but it works really
> >> >> >> well even with all of the complex OpenCL kernels we throw at it.
> >> >> >>
> >> >> >> Your point about losing information when the CFG is de-structurized is
> >> >> >> valid, but for things like loop depth, I'm not sure why we couldn't write an
> >> >> >> LLVM analysis pass for this (if one doesn't already exist).
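> >> >> >>
> >> >> >> For what it's worth, LLVM's existing LoopInfo analysis already tracks
> >> >> >> nesting depth.  A minimal sketch of a pass that queries it (the pass
> >> >> >> name and registration below are made up for the example; only LoopInfo
> >> >> >> and getLoopDepth are the real API, and the exact boilerplate depends
> >> >> >> on the LLVM version) would look roughly like:
> >> >> >>
> >> >> >>   #include "llvm/Analysis/LoopInfo.h"
> >> >> >>   #include "llvm/IR/Function.h"
> >> >> >>   #include "llvm/Pass.h"
> >> >> >>   #include "llvm/Support/raw_ostream.h"
> >> >> >>
> >> >> >>   using namespace llvm;
> >> >> >>
> >> >> >>   namespace {
> >> >> >>   // Illustrative only: print the loop nesting depth of every basic block.
> >> >> >>   struct LoopDepthPrinter : public FunctionPass {
> >> >> >>     static char ID;
> >> >> >>     LoopDepthPrinter() : FunctionPass(ID) {}
> >> >> >>
> >> >> >>     void getAnalysisUsage(AnalysisUsage &AU) const override {
> >> >> >>       AU.addRequired<LoopInfo>();  // ask the pass manager to run LoopInfo first
> >> >> >>       AU.setPreservesAll();        // we only read the IR
> >> >> >>     }
> >> >> >>
> >> >> >>     bool runOnFunction(Function &F) override {
> >> >> >>       LoopInfo &LI = getAnalysis<LoopInfo>();
> >> >> >>       for (BasicBlock &BB : F)
> >> >> >>         errs() << BB.getName() << ": loop depth " << LI.getLoopDepth(&BB) << "\n";
> >> >> >>       return false;  // IR not modified
> >> >> >>     }
> >> >> >>   };
> >> >> >>   }
> >> >> >>
> >> >> >>   char LoopDepthPrinter::ID = 0;
> >> >> >>   static RegisterPass<LoopDepthPrinter> X("loop-depth", "Print loop depths");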
> >> >> >>
> >> >> >
> >> >> > I don't think this is such a big deal either.  At least the
> >> >> > structurization pass used on newer AMD hardware isn't "fragile" in the
> >> >> > way you seem to imply -- AFAIK (unlike the old AMDIL heuristic
> >> >> > algorithm) it's guaranteed to give you a valid structurized output no
> >> >> > matter what the previous optimization passes have done to the CFG,
> >> >> > modulo bugs.  I admit that the situation is nevertheless suboptimal.
> >> >> > Ideally this information wouldn't get lost along the way.  For the long
> >> >> > term we may want to represent structured control flow directly in the IR
> >> >> > as you say; I just don't see how reinventing the IR saves us any work if
> >> >> > we could just fix the existing one.
> >> >>
> >> >> It seems to me that something like how we represent control flow is a
> >> >> pretty fundamental part of the IR - it affects any optimization pass
> >> >> that needs to do anything beyond adding and removing instructions. How
> >> >> would you fix that, especially given that LLVM is primarily designed
> >> >> for CPUs, where you don't want to be restricted to structured control
> >> >> flow at all? It seems like our goals (preserve the structure) conflict
> >> >> with the way LLVM has been designed.
> >> >>
> >> >
> >> > I think it's important to distinguish between LLVM IR and the tools
> >> > available to manipulate it.  LLVM IR is meant to be a platform
> >> > independent program representation.  There is nothing about the IR that
> >> > would prevent someone from using it for hardware that required structured
> >> > control flow.
> >>
> >> Right - when I said that structured control flow was a fundamental
> >> part of the IR, I meant that in the sense that it's a constraint that
> >> all optimization passes have to follow. I was also thinking of NIR,
> >> where it actually is a fundamental part of the IR data structures - all
> >> control flow consists of a tree of loops, if statements, and basic
> >> blocks, and there are no jump statements in the IR except for break,
> >> continue, and return. There are helpers to mutate the control flow
> >> tree (adding an if after an instruction, deleting a loop, etc.) so
> >> that you can more or less pretend you're operating on something like
> >> GLSL IR, while the CFG is being updated for you, basic blocks are
> >> being created and deleted, etc.
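> >>
> >> Just to illustrate the shape of that (the type and function names below
> >> are invented for the example, not the actual NIR API), the control flow
> >> tree boils down to something like:
> >>
> >>   // Hypothetical structured control flow tree, loosely in the spirit of NIR.
> >>   #include <list>
> >>   #include <memory>
> >>
> >>   struct Instruction;
> >>
> >>   // Every node is either a basic block or a structured construct that
> >>   // contains further nodes; there are no arbitrary gotos anywhere.
> >>   struct CFNode {
> >>     enum Kind { Block, If, Loop } kind;
> >>     CFNode *parent;   // walking upward visits every enclosing construct
> >>     CFNode(Kind k, CFNode *p) : kind(k), parent(p) {}
> >>     virtual ~CFNode() {}
> >>   };
> >>
> >>   struct BlockNode : CFNode {
> >>     std::list<Instruction *> instrs;  // may end in break/continue/return only
> >>     BlockNode(CFNode *p) : CFNode(Block, p) {}
> >>   };
> >>
> >>   struct IfNode : CFNode {
> >>     Instruction *condition;
> >>     std::list<std::unique_ptr<CFNode> > then_body, else_body;
> >>     IfNode(CFNode *p) : CFNode(If, p) {}
> >>   };
> >>
> >>   struct LoopNode : CFNode {
> >>     std::list<std::unique_ptr<CFNode> > body;  // only exited via break
> >>     LoopNode(CFNode *p) : CFNode(Loop, p) {}
> >>   };
> >>
> >>   // "Walk up the control flow tree and count the loops we hit."
> >>   static unsigned loop_depth(const CFNode *n) {
> >>     unsigned depth = 0;
> >>     for (; n; n = n->parent)
> >>       if (n->kind == CFNode::Loop)
> >>         ++depth;
> >>     return depth;
> >>   }
> >>
> >> The helpers I mentioned (insert an if after an instruction, remove a
> >> loop, etc.) operate on this tree and keep the underlying basic blocks
> >> and CFG edges consistent behind the scenes.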
> >>
> >> >
> >> > The tools (mainly the optimization passes) are where decisions about
> >> > things like preserving structured control flow are made.  There are
> >> > currently two strategies available for using the tools to produce programs
> >> > with structured control flow:
> >> >
> >> > 1. Use the CFG structurizer pass
> >> >
> >> > 2. Only use transforms that maintain the structure of the control flow.
> >>
> >> I'm a little confused about how this strategy would work. I'm assuming
> >> that the control flow structure (i.e. the tree of loops and ifs) is
> >> stored in some kind of metadata or fake instruction on top of the IR -
> >> I haven't looked into this much, so correct me if I'm wrong. If so,
> >> wouldn't you still have to make every optimization pass that touches
> >> the CFG properly update that metadata to avoid it going stale, since
> >> the optimizations themselves are operating on a list of basic blocks
> >> which is a little lower-level?
> >>
> >
> > There is no CFG metadata.  If you want to collect some information about the
> > CFG, you would use an analysis pass to do this.  For example, LLVM has an
> > analysis pass for computing the dominator tree.  If an optimization
> > wants to use this analysis, it would add this analysis as a pass dependency,
> > and then LLVM would run the dominator tree analysis before the optimization pass.
> >
> > Once the analysis has been run, the result is cached for other passes to use.
> > However, the base assumption is that optimization passes invalidate
> > all analysis information, so passes are required to report which analysis passes
> > or which features of the program are preserved.  So, if a pass reports
> > that it preserves the CFG, then the dominator tree analysis is still considered
> > valid.
> >
> > This is a high-level overview of how it works, but to get back to your question,
> > if you wanted to use strategy number 2, you could just choose to only run
> > optimizations that preserved the CFG.
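> >
> > To make that concrete, with the current pass manager the contract is
> > expressed in getAnalysisUsage().  A minimal sketch (the pass itself is
> > made up, and exact class names/headers vary between LLVM versions):
> >
> >   #include "llvm/IR/Dominators.h"
> >   #include "llvm/IR/Function.h"
> >   #include "llvm/Pass.h"
> >
> >   using namespace llvm;
> >
> >   namespace {
> >   // Hypothetical transform that only rewrites instructions inside blocks.
> >   struct SomeLocalCleanup : public FunctionPass {
> >     static char ID;
> >     SomeLocalCleanup() : FunctionPass(ID) {}
> >
> >     void getAnalysisUsage(AnalysisUsage &AU) const override {
> >       // Dependency: the dominator tree analysis runs (or its cached
> >       // result is reused) before this pass.
> >       AU.addRequired<DominatorTreeWrapperPass>();
> >       // Promise: we never add or remove blocks or edges, so cached
> >       // CFG-based analyses (dominators, loop info, ...) stay valid.
> >       AU.setPreservesCFG();
> >     }
> >
> >     bool runOnFunction(Function &F) override {
> >       DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
> >       (void)DT;  // ... use DT while rewriting instructions in place ...
> >       return false;
> >     }
> >   };
> >   }
> >
> >   char SomeLocalCleanup::ID = 0;
> >
> > Strategy 2 then amounts to only scheduling passes that make the
> > setPreservesCFG() promise.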
> >
> > -Tom
> 
> Ah, I see, that makes sense. That does seem like a rather terrible
> solution though, since not being able to change the CFG seems rather
> harsh.
>

Yeah, that's why I listed it second ;)

-Tom
 
> >> >
> >> > -Tom
> >> >
> >> >> >
> >> >> >>> * LLVM doesn't do modifiers, meaning that we can't do optimizations
> >> >> >>> like "clamp(x, 0.0, 1.0) => mov.sat x" and "clamp(x, 0.25, 1.0) =>
> >> >> >>> max.sat(x, .25)" in a generic fashion.
> >> >> >>>
> >> >> >>
> >> >> >> The way to handle this with LLVM would be to add intrinsics to represent
> >> >> >> the various modifiers and then fold them into instructions during
> >> >> >> instruction selection.
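> >> >> >>
> >> >> >> In a real backend that fold would live in the instruction selector (or
> >> >> >> a peephole right after it); just to show the shape of it, here is a toy
> >> >> >> version over a completely made-up backend IR, where a saturate
> >> >> >> intrinsic gets folded into the ".sat" destination modifier of the
> >> >> >> instruction that feeds it:
> >> >> >>
> >> >> >>   #include <vector>
> >> >> >>
> >> >> >>   // Invented types, for illustration only.
> >> >> >>   struct BackendInst {
> >> >> >>     enum Op { MOV, MAX, SATURATE /* ... */ } op;
> >> >> >>     bool saturate;                  // the hardware ".sat" modifier
> >> >> >>     BackendInst *src;               // single source, for brevity
> >> >> >>     std::vector<BackendInst *> uses;
> >> >> >>   };
> >> >> >>
> >> >> >>   // clamp(x, 0.0, 1.0) has already been lowered to SATURATE(x); now
> >> >> >>   // fold it into the defining instruction when that is the only use.
> >> >> >>   static void fold_saturate(std::vector<BackendInst *> &insts) {
> >> >> >>     for (BackendInst *inst : insts) {
> >> >> >>       if (inst->op != BackendInst::SATURATE)
> >> >> >>         continue;
> >> >> >>       BackendInst *def = inst->src;
> >> >> >>       if (def->uses.size() != 1)
> >> >> >>         continue;                   // someone still needs the raw value
> >> >> >>       def->saturate = true;         // e.g. max -> max.sat
> >> >> >>       inst->op = BackendInst::MOV;  // now a plain copy; copy propagation
> >> >> >>                                     // cleans it up later
> >> >> >>     }
> >> >> >>   }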
> >> >> >>
> >> >> >
> >> >> > IMHO this is a feature.  One of the things I don't like about NIR is
> >> >> > that it's still vec4-centric.  Most drivers are going to want something
> >> >> > else, and something different from each other; we cannot please all of
> >> >> > them with one single vector addressing model built into the core
> >> >> > instruction set.  So I'd rather have modifiers, writemasks and swizzles
> >> >> > represented as the composition of separate instructions/intrinsics with
> >> >> > simple and well-defined semantics, which can be coalesced back into the
> >> >> > real instruction as Tom says (easy even if you don't use LLVM's
> >> >> > instruction selector, as long as it's in SSA form).
> >> >>
> >> >> While NIR is vec4-centric, nothing's stopping you from splitting up
> >> >> instructions and doing optimizations at the scalar level for scalar
> >> >> ISAs - in fact, that's what I expect to happen. And for backends that
> >> >> really do need to have swizzles and writemasks, coalescing these
> >> >> things back into the original instruction is not at all trivial - in
> >> >> fact, going into and out of SSA without introducing extra copies even
> >> >> in situations like:
> >> >>
> >> >> foo.xyz = ...
> >> >> ... = foo
> >> >> foo.x = ...
> >> >>
> >> >> is a problem that hasn't been solved yet publicly (it seems doable,
> >> >> but difficult). So while we might not need swizzles and writemasks for
> >> >> most backends, for the few that do need it (like, for example, the
> >> >> i965 vec4 backend) it will be very nice to have one common lowering
> >> >> pass that solves this hard problem, which would be impossible to do
> >> >> without having swizzles and writemasks in the IR. And it's very likely
> >> >> that these backends, which probably aren't using SSA due to the
> >> >> aforementioned difficulties, will also benefit from having modifiers
> >> >> already folded for them - this is something that's already a problem
> >> >> for the i965 vec4 backend, and something that NIR will help a lot with.
> >> >>
> >> >> >
> >> >> >>> * LLVM is hard to embed into other projects, especially if it's used
> >> >> >>> as anything but a command-line tool that only runs once. See, for
> >> >> >>> example, http://blog.llvm.org/2014/07/ftl-webkits-llvm-based-jit.html
> >> >> >>> under "Linking WebKit with LLVM" - most of those problems would also
> >> >> >>> apply to us.
> >> >> >>>
> >> >> >>
> >> >> >> You have to keep in mind that the way WebKit uses LLVM is totally
> >> >> >> different from how Mesa would use LLVM if LLVM IR were adopted as a
> >> >> >> common IR.
> >> >> >>
> >> >> >> WebKit is using LLVM as a full JIT compiler, which means it depends
> >> >> >> on almost all of the pieces of the LLVM stack: the IR manipulation, the
> >> >> >> optimization passes, one or more of the codegen backends, as well
> >> >> >> as the entire JIT layer.  The JIT layer in particular is missing a lot of
> >> >> >> functionality in the C API, which makes it more difficult to work with.
> >> >> >>
> >> >> >> If Mesa were to adopt LLVM IR as a common IR, the only LLVM library
> >> >> >> functionality it would need would be the IR manipulation and the
> >> >> >> optimizations passes.
> >> >> >>
> >> >> >>> * LLVM is on a different release schedule (6 months vs. 3 months), has
> >> >> >>> a different review process, etc., which means that to add support for
> >> >> >>> new functionality that involves shaders, we now have to submit patches
> >> >> >>> to two separate projects. Then, 2 months later when we ship Mesa, it
> >> >> >>> turns out that nobody can actually use the new feature because it
> >> >> >>> depends upon a version of LLVM that won't be released for another 3
> >> >> >>> months and then packaged by distros even later. We've already had
> >> >> >>> problems where distros refused to ship newer Mesa releases because
> >> >> >>> radeon depended on a version of LLVM newer than the one they were
> >> >> >>> shipping, and if we started using LLVM in core Mesa it would get even
> >> >> >>> worse. Proprietary drivers solve this problem by just forking LLVM,
> >> >> >>> building it with the rest of their driver, and linking it in as a
> >> >> >>> static library, but distro packagers would hate us if we did that.
> >> >> >>>
> >> >> >>
> >> >> >> If Mesa were using LLVM IR as a common IR, I'm not sure what features
> >> >> >> in Mesa would be tied to new additions in LLVM.  As I said before,
> >> >> >> all Mesa would be using would be the IR manipulations and the
> >> >> >> optimization passes.  The IR manipulations only require new features
> >> >> >> when something new is added to the LLVM IR specification, which is rare.
> >> >> >> It's possible there could be some lag in new features that go into
> >> >> >> the optimization passes, but if there was some optimization that was
> >> >> >> deemed really critical, it could be implemented in Mesa using the IR
> >> >> >> manipulators.
> >> >> >>
> >> >> >> -Tom
> >> >> >>
> >> >> >>> I wouldn't completely rule out LLVM, and I do think they do a lot of
> >> >> >>> things right, but for now it seems like it's not the path that the
> >> >> >>> Intel team wants to take.
> >> >> >>>
> >> >> >>> Connor

