time for amber2 branch?
Triang3l
triang3l at yandex.ru
Thu Jun 20 18:46:21 UTC 2024
On 19/06/2024 20:34, Mike Blumenkrantz wrote:
> Terakan is not a Mesa driver, and Mesa has no obligation to cater to
> out-of-tree projects which use its internal API. For everything else,
> see above.
I don't think, however, that it can simply be dismissed as if it didn't
exist when it's:
• striving to become a part of Mesa among the "cool" drivers with
broad extension support like RADV, Anvil, Turnip, and now NVK;
• actively developed nearly every day (albeit for around 2 hours per
day on average because it's a free time project);
• trying to explore horizons Mesa hasn't been to yet (submitting
hardware commands directly on Windows).
As for R600g, it's one thing to drop the constraints imposed by some
Direct3D 9 level GPUs that, for instance, don't even support integers in
shaders (if such constraints are even actually causing issues that
significantly slow down development of everything else; the broad
hardware support is something that I absolutely LOVE Mesa and the open
source infrastructure overall for, and I think that's the case for many
others too), but here we're talking about Direct3D 11 (or 10, but
programmed largely the same way) class hardware with OpenGL 4.5 already
supported, and 4.6 straightforward to implement.
This means that, with the exception of OpenCL-specific global addressing
issues (though R9xx can possibly have a 4 GB "global memory" binding),
the interface contract between Gallium's internals and R600g shouldn't
differ that much from that of the more modern drivers — the _hardware_
architecture itself doesn't really warrant dropping active support in
common code.
Incidents like one change suddenly breaking vertex strides are thus
mainly a problem in how _the driver itself_ is written, and that's of
course another story… While I can't say much about Gallium interactions
specifically, I keep encountering more and more things that are
unhandled or broken in how the driver actually works with the GPU, and
there are many Piglit tests that fail. I can imagine the way R600g is
integrated into Gallium isn't in a much better state.
So I think it may make sense (even though I definitely don't see any
serious necessity) to **temporarily** place R600g in a more stable
environment where regressions in it are less likely to happen, but then
once it's brought up to modern Mesa quality standards, and when it
becomes more friendly to the rest of Mesa, to **move it back** to the
main branch (though that may run into a whole lot of interface version
conflicts, who knows). Some of the things we can do to clean it up are:
• Make patterns of interaction with other subsystems of Gallium more
similar to those used by other drivers. Maybe use RadeonSI as the
primary example because of their shared roots.
• Fix some GPU configuration bugs — the ones I described in my previous
message, as well as some other small ones, such as:
• Emit all viewports and scissors at once without using the dirty
mask because the hardware requires that (already handled years ago in
RadeonSI).
• Fix gl_VertexID in indirect draws — the DRAW_INDIRECT packets
write the base to SQ_VTX_BASE_VTX_LOC, which has an effect on vertex
fetch instructions, but not on the vertex ID input; instead switch from
SQ_VTX_FETCH_VERTEX_DATA to SQ_VTX_FETCH_NO_INDEX_OFFSET, and COPY_DW
the base to VGT_INDX_OFFSET.
• Properly configure the export format of the pixel shader DB export
vector (gl_FragDepth, gl_FragStencilRefARB, gl_SampleMask).
• Investigate how queries currently work if the command buffer was
split in the middle of a query, add the necessary stitching where needed.
• Make Piglit squeal less. I remember trying to experiment with
glDispatchComputeIndirect, only to find out that the test I wanted to
run to verify my solution was broken for another reason. Oink oink.
• If needed, remove the remaining references to TGSI enums, and also
switch to the NIR transform feedback interface that, as far as I
understand, is compatible with the Nine and D3D10 frontends (or maybe
it's the other way around; either way, make that consistent).
• Do some cleanup in common areas:
• Register, packet and shader structures can be moved to JSON
definitions similar to those used for GCN/RDNA, but with clearer
indication of the architecture revisions they can be used on (without
splitting into r600d.h and evergreend.h). I've already stumbled upon a
typo in that probably hand-written S_/G_/C_ #define soup that has caused
weird Vulkan CTS failures once, specifically in
C_028780_BLEND_CONTROL_ENABLE in evergreend.h, and who knows what other
surprises may be there. Some fields there are apparently just for the
wrong architecture revisions (though maybe actually present, but
undocumented, I don't know, given the [RESERVED] situation with the
documentation for anisotropic filtering and maybe non-1D/2D_THIN tiling
modes, for example, and that we have the reference for the 3D registers,
but not for compute).
• A lot of format information can be shared between vertex fetch,
texture fetch, and color/storage attachments. I'm currently finishing
some common format code for Terakan that may be adopted by R600g.
• Carefully make sure virtual memory is properly supported in all
places on R9xx (using virtual addresses, and not emitting relocation
NOPs that are harmless but wasteful — moreover, this part deserves some
common function that will make it easier to port R600g to other
platforms, such as by making it write D3DKMTRender patch locations on
Windows).
• Unify R6xx/R7xx and R8xx/R9xx code wherever possible. There's
r600_state.c, which is over 100 KB in size, and evergreen_state.c, which
is even bigger, but in many places they contain the same code, merely
including r600d.h in one file and evergreend.h in the other — and how
much technical debt we already have in the R6xx/R7xx code is an
interesting question.
To me, there doesn't seem to be any necessity to abandon R6xx/R7xx
support completely currently considering that the programming
differences from R8xx/R9xx are pretty minor. At least as long as someone
occasionally runs tests on the older generations.
Maybe that will involve some small-scale changes, or maybe it will end
up being more like a rewrite, but it's totally possible that R600g may
be facing a new beginning at this point rather than an ending,
especially with Gert Wollny's compiler, and with me visiting every
aspect of the interface of those GPUs. At some point we may even start
exposing
R600-specific functionality such as D3DFMT_D24FS8 in Gallium Nine on
R6xx/R7xx.
However, I don't like the whole idea of moving drivers away from the
main branch because that affects not only development, but also users of
Mesa. It'd be necessary to ensure that Linux distribution maintainers
are well-notified of the new branch, but even then that may still cause
issues. Like, what if the amber2 drivers end up in a separate package in
a distribution? That would possibly mean that after some `apt-get
dist-upgrade`, users suddenly lose GPU acceleration on their systems for
an unobvious reason. And we definitely shouldn't be
underestimating the number of users of that old hardware outside Linux
developer circles — especially TeraScale (I think Firefox regularly gets
issue reports from Nvidia Rankine/Curie users?). I occasionally see
people on Reddit and other platforms discussing the status of Terakan,
and I'd expect that the people who talk about some software are just a
small fraction of those who use it at all. And sometimes weird things
just happen like Bringus Studios bringing up a Xi3 Piston out of
semi-vaporware nowhere…
Regarding CI, I can't promise anything right now, but I think that's not
an unsolvable issue. Overall just one machine with a Trinity APU, an
R6xx/R7xx card, and an R8xx card (one of them preferably being RV670,
RV770, or Cypress/Hemlock, to be able to test co-issuing of float64
instructions with a transcendental one when that's implemented) likely
should cover most of our regression testing needs — most definitely at
least where Gallium interaction is concerned.
Terakan development will surely continue being based on the main branch,
partly because the original reason behind the split suggestion mostly
doesn't apply to it. I do need recent Vulkan headers and all the WSI
improvements at the very least — and there are areas where Terakan
itself may contribute something new to the common Vulkan runtime code. I
already have some WSI-demanded binary-over-timeline sync type
enhancements on my branch, and if my Windows experiments go forward,
there will likely be a lot that can be added to the common code, such
as WDDM 1 synchronization primitives (even though WDDM 2's timeline
semaphores aka monitored fences are more important to modern drivers,
there's no WDDM 2 on Windows older than 10), as well as paths for
zero-copy presentation (primarily for WDDM 1 level configurations — like
via sharing images with Direct3D 10/11, or with OpenGL to take advantage
of the "exclusive borderless" driver hack, or maybe even via
D3DKMTPresent where possible).
On 20/06/2024 20:30, Adam Jackson wrote:
> We're using compute shaders internally in more and more ways, for
> example, maybe being able to assume them would be a win.
I'd imagine that compute shader usage scenarios in common Gallium code
are optional, and depending on the hardware, compute shaders can even be
the less optimal approach to things like image copying/resolving (where
specialized copy hardware is available) from the perspective of
performance or format support. For instance, early (or maybe actually
all, I don't know for sure yet) AMD R8xx hardware hangs with linear
storage images according to one comment in R800AddrLib, which is why a
quad with a color target may be preferable for copying; that hardware
also has fast resolves inside its color buffer hardware, as well as a
DMA engine.
— Triang3l
More information about the mesa-dev
mailing list