Replacing NIR with SPIR-V?

Sun Jan 23 16:13:40 UTC 2022

On Sun, Jan 23, 2022 at 1:58 PM Abel Bernabeu
<abel.bernabeu at esperantotech.com> wrote:
>>
>> Yes, NIR has arrays and structs and nir_deref to deal with them but, by the time you get into the back-end, all the nir_derefs are gone and you're left with load/store messages with actual addresses (either a 64-bit memory address or an index+offset pair for a bound resource).  Again, unless you're going to dump straight into LLVM, you really don't want to handle that in your back-end unless you really have to.
>
>
> That is the thing: there is already a community maintained LLVM backend for RISC-V and I need to see how to get value from that effort. And that is a very typical scenario for new architectures. There is already an LLVM backend for a programmable device and someone asks: could you do some graphics around this without spending millions?
>
> Then your options as an engineer are:
>
> - Use Mesa as a framework and translate NIR to assembly (most likely choice).
>
> - Use Mesa as a framework and translate NIR to LLVM IR with some intrinsics, then feed the pre-existing LLVM backend.

Using the pre-existing backend probably isn't a real option, because
it's designed for different things. The biggest hurdle is that for
many years now, vendors have realized that SIMT-style parallelism is
the most appropriate for GPUs, and in order to do SIMT effectively
your whole compiler stack has to be aware of it, from the frontend to
the backend. There are two "views" of your program: the "thread-level
view", which is what the programmer wrote, where SIMD lanes are separate
threads and you're specifying what happens to one thread, and the
"wave-level view", which is what the machine actually executes. In the
backend you have to be aware of both the thread-level view and
wave-level view of your program in order to effectively register
allocate, and that's something that LLVM's backend infrastructure just
can't do. AMDGPU has some hacks, but they're not 100% effective and to
use them in RISC-V you'd probably have to rewrite the whole backend
anyway. For example, in the AMDGPU backend vector registers aren't
exposed as actual vector registers to LLVM's machinery, but it still
models control-flow at the "wave-level" which causes some inaccuracy
in liveness. So, the existing investment isn't really worth as much as
you think it is.
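
To make the two views concrete, here is a minimal sketch in Python
(not real compiler code). The wave width, the function names and the
explicit per-lane loops are all assumptions made for illustration;
real hardware applies the execution mask in the ALU rather than in a
loop.

```python
from typing import List

WAVE_SIZE = 4  # hypothetical wave width, chosen only for this sketch


def thread_view(x: int) -> int:
    """Thread-level view: one thread, and only one side of the branch runs."""
    if x % 2 == 0:
        return x * 10 + 1
    else:
        return x + 100


def wave_view(xs: List[int]) -> List[int]:
    """Wave-level view: every lane steps through BOTH sides of the branch,
    and an execution mask decides which lanes actually write results."""
    assert len(xs) == WAVE_SIZE
    cond = [x % 2 == 0 for x in xs]
    result = [0] * WAVE_SIZE  # stands in for one vector register

    # "then" side: only lanes where the condition holds are active.
    exec_mask = cond
    for lane in range(WAVE_SIZE):
        if exec_mask[lane]:
            result[lane] = xs[lane] * 10 + 1

    # "else" side: runs AFTER the "then" side with the mask inverted.
    # The lanes of 'result' written above must survive this block, even
    # though the thread-level CFG says the two sides never both execute.
    exec_mask = [not c for c in cond]
    for lane in range(WAVE_SIZE):
        if exec_mask[lane]:
            result[lane] = xs[lane] + 100

    return result


print(wave_view([0, 1, 2, 3]))
```

Both views compute the same values, but a register allocator that
only looks at the thread-level control flow could conclude that the
"then" result is dead during the "else" side and reuse its register,
while one that only looks at the wave-level control flow tends to
keep values live longer than necessary.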

>
> - Use some new alternative, possibly a Mesa fork relying on the Khronos SPIR-V to LLVM IR translator. Start fixing the tool for supporting graphics... Make SPIR-V the IR that communicates frontend and backend :-)
>
> I am not thinking in terms of what is best for Mesa, but in terms of how could the RISC-V community organize its effort given that an LLVM backend is a given thing.
>
> I see the current reasons why NIR is preferred over SPIR-V in Mesa. So far you have given me three:
>
> - There is a well designed library for traversing NIR, whereas SPIR-V defines nothing.
> - The arrays and structs are lowered before the shader is passed to the backend.
> - You see SPIR-V as a "serializing" format for IR to be exchanged through the network (like a .PNG for shaders), whereas NIR's focus is more about how the data structures are represented in memory while in use.
>
> My takeaway messages are two:
>
> - Advise to support NIR on the RISC-V plan.

Sure.

> - If I have a chance, suggest to Khronos making SPIR-V more like NIR, so in the future it is considered beyond a serializing format.

No, this is a terrible idea. Serialization formats like SPIR-V and IRs
intended to be consumed by a backend like NIR (and LLVM) have very
different needs - SPIR-V has to be backwards compatible and have very
broad support, NIR has to represent lower-level constructs that SPIR-V
doesn't, etc. If you created an in-memory IR datastructure that
adhered slavishly to SPIR-V you'd have a terrible replacement for NIR,
and that's just by necessity - they're solving different problems.

There's some history to this, because before SPIR-V there was SPIR
which was just LLVM bitcode with some graphics stuff on top. It failed
for the reasons above and was abandoned, and that's why we have SPIR-V
today.
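
As one concrete case of the lower-level constructs mentioned above,
consider what Jason described earlier in the thread: by the time a
back-end sees the shader, struct and array accesses have become
loads and stores at explicit offsets. The sketch below is a toy
illustration of that idea in Python, not NIR's actual lowering code.
The type classes, the lower_access_chain helper, the tightly packed
layout (no padding or alignment) and the constant-only indices are
all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Scalar:
    size: int  # size in bytes


@dataclass(frozen=True)
class Array:
    elem: "Type"
    length: int


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "Type"], ...]  # (name, type) pairs, in order


Type = Union[Scalar, Array, Struct]


def size_of(ty: Type) -> int:
    """Byte size under the assumed tightly packed layout."""
    if isinstance(ty, Scalar):
        return ty.size
    if isinstance(ty, Array):
        return size_of(ty.elem) * ty.length
    return sum(size_of(fty) for _, fty in ty.fields)


def lower_access_chain(base: Type, path: List[Union[int, str]]) -> Tuple[int, Type]:
    """Turn a chain of array indices / field names into one byte offset."""
    offset, ty = 0, base
    for step in path:
        if isinstance(ty, Array):
            assert isinstance(step, int) and 0 <= step < ty.length
            offset += step * size_of(ty.elem)
            ty = ty.elem
        elif isinstance(ty, Struct):
            for name, fty in ty.fields:
                if name == step:
                    ty = fty
                    break
                offset += size_of(fty)  # skip fields before the target
            else:
                raise KeyError(step)
        else:
            raise TypeError("cannot index into a scalar")
    return offset, ty


# Hypothetical shader data: struct { float pos[3]; float intensity; } lights[8];
LIGHT = Struct((("pos", Array(Scalar(4), 3)), ("intensity", Scalar(4))))
LIGHTS = Array(LIGHT, 8)

# lights[2].intensity becomes "load 4 bytes at base + offset".
print(lower_access_chain(LIGHTS, [2, "intensity"]))
```

A back-end fed the result only needs a load at base + offset; it
never has to understand the struct or array types themselves.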

>
> Thanks for your comments so far.
>
>
>
> On Fri, Jan 21, 2022 at 4:24 AM Jason Ekstrand <jason at jlekstrand.net> wrote:
>>
>> On Thu, Jan 20, 2022 at 5:49 PM Abel Bernabeu <abel.bernabeu at esperantotech.com> wrote:
>>>
>>> In principle, all the properties you highlight in your blog as key points of NIR also apply to SPIR-V.
>>
>>
>> First off, that blog post is truly ancient.  Based on the quote from nir_opt_algebraic.c, it looks like it was written less than 6 months after the original NIR patches landed, which puts it at 5-6 years old.  A lot has changed since then.
>>
>>>
>>> I was curious to know where in the details that I miss, NIR starts shining as a more suitable IR than SPIR-V for the task of communicating front-end and back-end. By the way, thanks for putting together that blog post.
>>
>>
>> In terms of what they're capable of communicating, yes, SPIR-V and NIR can express many of the same things.  But that's not the point.  The point is that there's a lot that happens between coming out of GLSL or SPIR-V and going into the back-end.  A lot of what we do with NIR is share as much of that lowering across drivers as possible.  Yes, we could convert back to SPIR-V before going into back-ends but there's really no point since they need their own IRs anyway.  If you're dumping straight into LLVM or similar, then maybe you don't need any of that, but if you're building a custom back-end, you really want to let NIR do that lowering and you don't want to handle it all on your own.
>>
>>>
>>> As it seems clear that the NIR question is well settled within the mesa community and I really see value in having mesa drivers, I promise to pay as much attention to the NIR use cases as I did with SPIR-V :-)
>>>
>>> By the way, we are not planning on supporting with specific RISC-V instructions everything that has an instruction on SPIR-V. Regarding the two areas you mention:
>>>
>>> - Arrays and structs: SPIR-V's OpAccessChain would need to be processed by a backend and translated to pointer arithmetic plus dereferencing (kind of the same thing as having to process a nir_deref). This translation can be done in RISC-V with no issue, whether it is OpAccessChain or nir_deref.
>>
>>
>> A big part of the point of NIR is to get rid of these things so that drivers don't have to deal with them.  Yes, NIR has arrays and structs and nir_deref to deal with them but, by the time you get into the back-end, all the nir_derefs are gone and you're left with load/store messages with actual addresses (either a 64-bit memory address or an index+offset pair for a bound resource).  Again, unless you're going to dump straight into LLVM, you really don't want to handle that in your back-end unless you really have to.
>>
>> Over-all, I think you're asking the wrong set of questions.  If you're trying to understand Mesa GPU compilers, looking at NIR from documentation and blog posts and comparing with SPIR-V is likely to raise more questions than answers.  I would instead recommend looking at an actual driver and seeing how things flow through the compiler stack.  That's likely to teach you a lot more about how the Mesa compiler stack works than reading blogs.  That, or start implementing a NIR back-end and see what you run into and ask questions on #dri-devel.
>>
>> --Jason
>>
>>
>>>
>>> - Trigonometric operations: personally I consider that only "sin" and "cos" are needed additions for RISC-V. Unclear what precision yet, likely 8 bits, for serving as initial value for a Newton-Raphson style computation.
>>>
>>> Regards.
>>>
>>> On Thu, Jan 20, 2022 at 2:36 AM Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>>
>>>> > - Does it make sense to move to SPIR-V?
>>>>
>>>> None whatsoever.  SPIR-V is an interchange format, not a set of manipulatable data structures suitable for compiler lowering and optimization.
>>>>
>>>> You also don't want to build hardware around consuming SPIR-V.  There are lots of things that SPIR-V has which you wouldn't want to support natively in hardware, such as structures and arrays in SSA values or complex trig ops like atan2().  Part of the purpose of NIR is to lower these things to simpler constructs which are supported in native hardware.
>>>>
>>>> --Jason
>>>>
>>>> On Wed, Jan 19, 2022 at 7:17 PM Abel Bernabeu <abel.bernabeu at esperantotech.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> My name is Abel Bernabeu and I currently chair the Graphics and ML Special Interest Group within RISC-V.
>>>>>
>>>>> As part of my work for RISC-V I am currently looking at what is needed for supporting a graphics product that uses a (potentially extended) RISC-V ISA for its shading cores. My initial focus has been on analyzing the functional gap between RISC-V and SPIR-V, assuming that whatever is needed for a modern graphics accelerator is inevitably present on SPIR-V.
>>>>>
>>>>> Now, the thing is that most of the potential adopters on our committee will likely be interested in using mesa for developing their drivers and that means using NIR as intermediate representation. Thus, I also need to consider NIR when looking at the functional gap, doubling the amount of work during the analysis.
>>>>>
>>>>> Why is mesa using NIR as intermediate representation rather than SPIR-V? It would make my life easier if mesa used SPIR-V rather than NIR for communicating the front-end and the backends.
>>>>>
>>>>> I know it is a lot of work to migrate to SPIR-V, but I am interested in knowing what is the opinion of the mesa developers:
>>>>>
>>>>> - My understanding is that when mesa adopted NIR, there was no SPIR-V. Was a comparison made after the SPIR-V ratification?
>>>>>
>>>>> - Does it make sense to move to SPIR-V?
>>>>>
>>>>> - Is it feasible in terms of functionality supported by SPIR-V?
>>>>>
>>>>> - Is the cost worth the potential advantage of using a more commonly adopted standard?
>>>>>
>>>>> Thanks in advance for your time and thoughts.
>>>>>
>>>>> Regards.