[Mesa-dev] [RFC PATCH 00/65] ARB_bindless_texture for RadeonSI
Samuel Pitoiset
samuel.pitoiset at gmail.com
Fri May 19 16:52:05 UTC 2017
Hi,
This series implements ARB_bindless_texture for RadeonSI.
Reminder: the GLSL compiler part is already upstream.
This series has mainly been tested with Feral games. Here is the list of
existing games that use ARB_bindless_texture (though not by default):
- DXMD
- Hitman
- Dirt Rally
- Mad Max
Today, Feral announced "Warhammer 40,000: Dawn of War III" (called DOW3) which
is going to be released next month. This game *requires* ARB_bindless_texture,
which explains why I did all this work. :-) So, we have ~3 weeks to merge
this whole series. It would be very nice to have DOW3 support on day one!
=== Tracking bindless problems ===
The following games have been successfully tested:
- Dirt Rally
- Hitman
- Mad Max
- DOW3
For these:
- No rendering issues
- No VM faults (i.e. with amdgpu.vm_debug=1)
However, DXMD is currently broken because the bindless_sampler layout qualifier
is missing, which results in a ton of INVALID_OPERATION errors. Note
that Feral implemented bindless support against NV_bindless_texture and not
ARB_bindless_texture. The main difference is that bindless_sampler is implicit
for NV_* while it's required for ARB_*. Feral plan to fix this soon.
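To make the qualifier difference concrete, here is a minimal sketch in C. It is not taken from this series or from any Feral title: the two shader strings and the declares_bindless_sampler() helper are hypothetical, and the helper is only a naive substring check, not how Mesa's GLSL compiler detects the qualifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sampler declared the way ARB_bindless_texture expects: the
 * bindless_sampler layout qualifier is written out explicitly. */
static const char *fs_with_qualifier =
    "#version 450\n"
    "#extension GL_ARB_bindless_texture : require\n"
    "layout(bindless_sampler) uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() { color = texture(tex, uv); }\n";

/* Same shader written with NV_bindless_texture habits: the
 * qualifier is left out because NV_* treats it as implicit. */
static const char *fs_without_qualifier =
    "#version 450\n"
    "#extension GL_ARB_bindless_texture : require\n"
    "uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() { color = texture(tex, uv); }\n";

/* Hypothetical helper: naive text search for the qualifier.
 * It does not parse GLSL and is for illustration only. */
static bool declares_bindless_sampler(const char *glsl_source)
{
    assert(glsl_source != NULL);
    return strstr(glsl_source, "bindless_sampler") != NULL;
}
```

With the first shader, setting `tex` through glUniformHandleui64ARB() is valid. With the second, the sampler is treated as a bound sampler under ARB_*, so the same call is expected to raise INVALID_OPERATION.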
All ARB_bindless_texture piglit tests pass with this series.
=== Tracking regressions/changes ===
- No regressions with the Intel CI system
- One piglit regression that needs to be fixed
(arb_texture_multisample-sample-position)
- No shader-db changes
- No CPU overhead (glxgears and Heaven in low)
=== Performance results for DOW3 ===
DOW3 exposes two bindless texture modes:
- mode 1: all bindless (i.e. no bound samplers)
- mode 2: bound/bindless (i.e. only bindless when the limit is reached)
CPU: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz
NVIDIA blob: 381.22
== GTX 1060 ==
LOW:
- mode 1: 89 FPS
- mode 2: 51 FPS
MEDIUM:
- mode 1: 49 FPS
- mode 2: 28 FPS
HIGH:
- mode 1: 32 FPS
- mode 2: 19 FPS
The GTX 1060 performs very well with the all bindless mode (default), while
the bound/bindless mode is not good at all.
== RX 480 ==
LOW:
- mode 1: 67 FPS (-32%)
- mode 2: 75 FPS (+32%)
MEDIUM:
- mode 1: 38 FPS (-28%)
- mode 2: 44 FPS (+57%)
HIGH:
- mode 1: 26 FPS (-23%)
- mode 2: 29 FPS (+52%)
The RX 480 performs very well with the bound/bindless mode (default), while
the all bindless mode still has to be improved.
The main bottleneck with the all bindless mode is the number of buffers
that have to be added to every command stream. The overhead in the winsys
and in the kernel (amdgpu_cs_ioctl) becomes significant in this situation.
This mode is still clearly CPU bound and should be improved (see the "Future
work" section).
Btw, without any optimisations, it was around 35 FPS in low (mode 1).
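As a rough illustration of why this overhead grows with the number of resident handles, here is a minimal sketch in C. None of it is the actual RadeonSI or winsys code: every type and function name below (all prefixed with sketch_) is hypothetical, and the model only counts buffer-list entries. The assumption is that each new command stream has to reference every resident buffer again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the RadeonSI or winsys types. */
struct sketch_cs {
    size_t num_buffers;   /* entries in this CS's buffer list */
};

struct sketch_resident_handle {
    unsigned buffer_id;   /* buffer backing the texture/image */
};

/* Stand-in for the winsys "add buffer to CS" call. */
static void sketch_cs_add_buffer(struct sketch_cs *cs, unsigned buffer_id)
{
    (void)buffer_id;
    cs->num_buffers++;
}

/* Add every resident buffer to one CS: linear in num_handles. */
static void sketch_add_all_resident(struct sketch_cs *cs,
                                    const struct sketch_resident_handle *handles,
                                    size_t num_handles)
{
    assert(cs != NULL && (handles != NULL || num_handles == 0));
    for (size_t i = 0; i < num_handles; i++)
        sketch_cs_add_buffer(cs, handles[i].buffer_id);
}

/* Count the buffer-list entries built over several submissions,
 * starting each submission with a fresh, empty CS. */
static size_t sketch_simulate(const struct sketch_resident_handle *handles,
                              size_t num_handles, size_t num_submits)
{
    size_t total = 0;
    for (size_t s = 0; s < num_submits; s++) {
        struct sketch_cs cs = { 0 };
        sketch_add_all_resident(&cs, handles, num_handles);
        total += cs.num_buffers;
    }
    return total;
}
```

In this model the work per frame is the number of resident handles times the number of submissions, which is why a game that keeps a very large number of handles resident ends up CPU bound in the all bindless mode.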
=== Performance results for other Feral titles ===
I didn't record any numbers because these games were initially
developed/tested against the NVIDIA blob, which is unaffected by a VERY large
number of resident handles, while the AMD stack is really slow in this
situation. Still, as I said, all Feral games that use bindless work fine; we
just need to improve perf on both sides.
=== Future work ===
I have some ideas to try in order to improve performance with RadeonSI. I will
work on this once this series is upstream.
Please review,
Thanks!
Samuel Pitoiset (65):
mapi: add GL_ARB_bindless_texture entry points
mesa: implement ARB_bindless_texture
mesa: add support for unsigned 64-bit vertex attributes
mesa: add support for glUniformHandleui64*ARB()
mesa: refuse to update sampler parameters when a handle is allocated
mesa: refuse to update tex parameters when a handle is allocated
mesa: refuse to change textures when a handle is allocated
mesa: refuse to change tex buffers when a handle is allocated
mesa: keep track of the current variable in add_uniform_to_shader
mesa: store bindless samplers as PROGRAM_UNIFORM
mesa: add infrastructure for bindless samplers/images bound to units
glsl: process uniform samplers declared bindless
glsl: process uniform images declared bindless
glsl: pass the ir_variable object to set_opaque_binding()
glsl: set the explicit binding value for bindless samplers/images
glsl: add ir_variable::is_bindless()
mesa: add update_single_shader_texture_used() helper
mesa: add update_single_program_texture_state() helper
mesa: update textures for bindless samplers bound to texture units
mesa: pass gl_program to _mesa_associate_uniform_storage()
mesa: associate uniform storage to bindless samplers/images
mesa: handle bindless uniforms bound to texture/image units
mesa: get rid of a workaround for bindless in _mesa_get_uniform()
gallium: add PIPE_CAP_BINDLESS_TEXTURE
gallium: add ARB_bindless_texture interface
ddebug: add ARB_bindless_texture support
trace: add ARB_bindless_texture support
tc: add ARB_bindless_texture support
tgsi: add new Bindless flag to tgsi_instruction_texture
tgsi: add new Bindless flag to tgsi_instruction_memory
tgsi/ureg: accept TGSI_FILE_{CONSTANT,INPUT} for dst registers
st/glsl_to_tgsi: add support for bindless samplers
st/glsl_to_tgsi: add support for bindless images
st/glsl_to_tgsi: add support for bindless pack/unpack operations
st/glsl_to_tgsi: teach the DCE pass about bindless samplers/images
st/glsl_to_tgsi: teach rename_temp_registers() about bindless samplers
tgsi/scan: record bindless samplers/images usage
st/mesa: implement ARB_bindless_texture
st/mesa: make update_single_texture() non-static
st/mesa: make convert_sampler_from_unit() non-static
st/mesa: add st_convert_image_from_unit() helper
st/mesa: add st_create_{texture,image}_handle_from_unit() helper
st/mesa: add infrastructure for storing bound texture/image handles
st/mesa: make bindless samplers/images bound to units resident
st/mesa: do not release sampler views for resident textures
st/mesa: disable per-context seamless cubemap when using texture handles
st/mesa: enable ARB_bindless_texture
radeonsi: add a slab allocator for resident descriptors
radeonsi: add si_init_descriptor_list() helper
radeonsi: add si_set_sampler_view_desc() helper
radeonsi: add si_set_shader_image_desc() helper
radeonsi: implement ARB_bindless_texture
radeonsi: add all resident buffers to the current CS
radeonsi: only add descriptors in presence of resident handles
radeonsi: add si_update_check_render_feedback() helper
radeonsi: decompress DCC for resident textures/images
radeonsi: decompress resident textures/images before graphics/compute
radeonsi: isolate real framebuffer changes from the decompression passes
radeonsi: track use of bindless samplers/images from tgsi_shader_info
radeonsi: only decompress resident textures/images when used
radeonsi: upload new descriptors when resident buffers are invalidated
radeonsi: invalidate buffers which are made resident if needed
radeonsi: add support for loading bindless samplers
radeonsi: add support for loading bindless images
radeonsi: enable ARB_bindless_texture
docs/features.txt | 2 +-
docs/relnotes/17.2.0.html | 1 +
src/compiler/glsl/ir.h | 11 +
src/compiler/glsl/ir_uniform.h | 12 +
src/compiler/glsl/link_uniform_initializers.cpp | 42 +-
src/compiler/glsl/link_uniforms.cpp | 156 +++-
src/compiler/glsl/shader_cache.cpp | 47 +
src/gallium/auxiliary/tgsi/tgsi_build.c | 8 +
src/gallium/auxiliary/tgsi/tgsi_scan.c | 37 +
src/gallium/auxiliary/tgsi/tgsi_scan.h | 2 +
src/gallium/auxiliary/tgsi/tgsi_ureg.c | 21 +-
src/gallium/auxiliary/tgsi/tgsi_ureg.h | 16 +-
src/gallium/auxiliary/util/u_threaded_context.c | 147 ++++
.../auxiliary/util/u_threaded_context_calls.h | 4 +
src/gallium/docs/source/screen.rst | 2 +
src/gallium/drivers/ddebug/dd_context.c | 61 ++
src/gallium/drivers/etnaviv/etnaviv_screen.c | 1 +
src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
src/gallium/drivers/i915/i915_screen.c | 1 +
src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 +
src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 +
src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
src/gallium/drivers/r300/r300_screen.c | 1 +
src/gallium/drivers/r600/r600_pipe.c | 1 +
src/gallium/drivers/radeon/r600_pipe_common.h | 4 +
src/gallium/drivers/radeonsi/si_blit.c | 131 ++-
src/gallium/drivers/radeonsi/si_compute.c | 2 +
src/gallium/drivers/radeonsi/si_compute.h | 14 +
src/gallium/drivers/radeonsi/si_descriptors.c | 943 +++++++++++++++++++--
src/gallium/drivers/radeonsi/si_hw_context.c | 1 +
src/gallium/drivers/radeonsi/si_pipe.c | 25 +
src/gallium/drivers/radeonsi/si_pipe.h | 68 ++
src/gallium/drivers/radeonsi/si_shader.h | 12 +
src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 48 +-
src/gallium/drivers/radeonsi/si_state.c | 10 +-
src/gallium/drivers/radeonsi/si_state.h | 9 +
src/gallium/drivers/softpipe/sp_screen.c | 1 +
src/gallium/drivers/svga/svga_screen.c | 1 +
src/gallium/drivers/swr/swr_screen.cpp | 1 +
src/gallium/drivers/trace/tr_context.c | 114 +++
src/gallium/drivers/vc4/vc4_screen.c | 1 +
src/gallium/drivers/virgl/virgl_screen.c | 1 +
src/gallium/include/pipe/p_context.h | 16 +
src/gallium/include/pipe/p_defines.h | 1 +
src/gallium/include/pipe/p_shader_tokens.h | 6 +-
src/mapi/glapi/gen/ARB_bindless_texture.xml | 100 +++
src/mapi/glapi/gen/Makefile.am | 1 +
src/mapi/glapi/gen/apiexec.py | 3 +
src/mapi/glapi/gen/gl_API.xml | 4 +-
src/mapi/glapi/gen/gl_genexec.py | 1 +
src/mesa/Makefile.sources | 2 +
src/mesa/main/api_loopback.c | 18 +
src/mesa/main/api_loopback.h | 6 +
src/mesa/main/bufferobj.c | 4 +-
src/mesa/main/context.c | 3 +
src/mesa/main/dd.h | 19 +
src/mesa/main/mtypes.h | 86 ++
src/mesa/main/samplerobj.c | 48 ++
src/mesa/main/shared.c | 12 +
src/mesa/main/tests/dispatch_sanity.cpp | 18 +
src/mesa/main/teximage.c | 25 +-
src/mesa/main/texobj.c | 12 +
src/mesa/main/texparam.c | 61 ++
src/mesa/main/texstate.c | 52 +-
src/mesa/main/texturebindless.c | 902 ++++++++++++++++++++
src/mesa/main/texturebindless.h | 96 +++
src/mesa/main/uniform_query.cpp | 208 ++++-
src/mesa/main/uniforms.c | 119 ++-
src/mesa/main/uniforms.h | 16 +
src/mesa/main/varray.c | 23 +
src/mesa/main/varray.h | 3 +
src/mesa/main/vtxfmt.c | 4 +
src/mesa/program/ir_to_mesa.cpp | 36 +-
src/mesa/program/ir_to_mesa.h | 4 +-
src/mesa/program/program.c | 8 +
src/mesa/state_tracker/st_atifs_to_tgsi.c | 2 +-
src/mesa/state_tracker/st_atom_constbuf.c | 6 +
src/mesa/state_tracker/st_atom_image.c | 33 +-
src/mesa/state_tracker/st_atom_sampler.c | 32 +-
src/mesa/state_tracker/st_atom_texture.c | 15 +-
src/mesa/state_tracker/st_cb_texture.c | 84 ++
src/mesa/state_tracker/st_context.c | 2 +
src/mesa/state_tracker/st_context.h | 11 +
src/mesa/state_tracker/st_extensions.c | 1 +
src/mesa/state_tracker/st_glsl_to_nir.cpp | 3 +-
src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 138 ++-
src/mesa/state_tracker/st_mesa_to_tgsi.c | 2 +-
src/mesa/state_tracker/st_pbo.c | 2 +-
src/mesa/state_tracker/st_sampler_view.c | 6 +
src/mesa/state_tracker/st_shader_cache.c | 3 +-
src/mesa/state_tracker/st_texture.c | 213 +++++
src/mesa/state_tracker/st_texture.h | 28 +
src/mesa/vbo/vbo_attrib_tmp.h | 28 +
src/mesa/vbo/vbo_context.h | 2 +
src/mesa/vbo/vbo_exec_api.c | 15 +-
src/mesa/vbo/vbo_save_api.c | 3 +
97 files changed, 4250 insertions(+), 260 deletions(-)
create mode 100644 src/mapi/glapi/gen/ARB_bindless_texture.xml
create mode 100644 src/mesa/main/texturebindless.c
create mode 100644 src/mesa/main/texturebindless.h
--
2.13.0