[Mesa-dev] [RFC PATCH 00/65] ARB_bindless_texture for RadeonSI

Samuel Pitoiset samuel.pitoiset at gmail.com
Fri May 19 16:52:05 UTC 2017


Hi,

This series implements ARB_bindless_texture for RadeonSI.

Reminder: the GLSL compiler part is already upstream.

This series has mainly been tested with Feral games. Here is the list of
existing games that use ARB_bindless_texture (though not by default):

- DXMD
- Hitman
- Dirt Rally
- Mad Max

Today, Feral announced "Warhammer 40,000: Dawn of War III" (called DOW3), which
is going to be released next month. This game *requires* ARB_bindless_texture,
which explains why I did all this work. :-) So, we have ~3 weeks to merge
this whole series. It would be very nice to have DOW3 support on day one!

=== Tracking bindless problems ===

The following games have been successfully tested:

- Dirt Rally
- Hitman
- Mad Max
- DOW3

For these:

- No rendering issues
- No VM faults (i.e. with amdgpu.vm_debug=1)

However, DXMD is currently broken because the bindless_sampler layout qualifier
is missing, which results in a ton of INVALID_OPERATION errors. Note that Feral
implemented bindless support against NV_bindless_texture and not
ARB_bindless_texture. The main difference is that bindless_sampler is implicit
for NV_* while it's required for ARB_*. Feral plans to fix this soon.
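
To illustrate the NV_* vs ARB_* difference, here is a minimal sketch in C that
embeds two fragment shaders. The shader text is made up for illustration and is
not taken from any Feral title. has_bindless_sampler_qualifier() is a
hypothetical helper that only does a naive text search; it is not a GLSL parser
and is not part of Mesa.

```c
#include <stdbool.h>
#include <string.h>

/* NV_bindless_texture style: the sampler uniform carries no layout
 * qualifier, a handle can still be assigned to it. */
static const char *fs_nv_style =
    "#version 450\n"
    "#extension GL_NV_bindless_texture : require\n"
    "uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() { color = texture(tex, uv); }\n";

/* ARB_bindless_texture style: the sampler uniform has to be declared
 * with layout(bindless_sampler) before a handle can be assigned to it. */
static const char *fs_arb_style =
    "#version 450\n"
    "#extension GL_ARB_bindless_texture : require\n"
    "layout(bindless_sampler) uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() { color = texture(tex, uv); }\n";

/* Hypothetical helper (illustration only): naive textual check for the
 * bindless_sampler qualifier anywhere in the shader source. */
static bool has_bindless_sampler_qualifier(const char *src)
{
    return strstr(src, "bindless_sampler") != NULL;
}
```

With the ARB extension, setting a handle on the sampler from the first shader
is what produces INVALID_OPERATION. The ARB spec also allows a global
"layout(bindless_sampler) uniform;" declaration to make every sampler bindless
by default, which may be the simpler fix for an application written against
NV_*.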

All ARB_bindless_texture piglit tests pass with this series.

=== Tracking regressions/changes ===

- No regressions with the Intel CI system
- One piglit regression that needs to be fixed
  (arb_texture_multisample-sample-position)
- No shader-db changes
- No CPU overhead (glxgears and Heaven in low)

=== Performance results for DOW3 ===

DOW3 exposes two bindless texture modes:
- mode 1: all bindless (i.e. no bound samplers)
- mode 2: bound/bindless (i.e. bindless only when the limit is reached)

CPU: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
NVIDIA blob: 381.22

== GTX 1060 ==

LOW:
 - mode 1: 89 FPS
 - mode 2: 51 FPS

MEDIUM:
 - mode 1: 49 FPS
 - mode 2: 28 FPS

HIGH:
 - mode 1: 32 FPS
 - mode 2: 19 FPS

The GTX 1060 performs very well with the all bindless mode (default), while
the bound/bindless mode is not good at all.

== RX 480 ==

LOW:
 - mode 1: 67 FPS (-32%)
 - mode 2: 75 FPS (+47%)

MEDIUM:
 - mode 1: 38 FPS (-28%)
 - mode 2: 44 FPS (+57%)

HIGH:
 - mode 1: 26 FPS (-23%)
 - mode 2: 29 FPS (+52%)

The RX 480 performs very well with the bound/bindless mode (default), while
the all bindless mode still has to be improved.
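
The percentages next to the RX 480 numbers look like the FPS gap to the
GTX 1060, taken relative to the slower of the two results and truncated to an
integer. This is inferred from the figures and is not stated anywhere above, so
treat the sketch below as an assumption. relative_gap_percent() is a
hypothetical helper name.

```c
/* Hypothetical helper (assumed convention, inferred from the numbers):
 * signed FPS gap between the RX 480 and the GTX 1060, expressed as a
 * percentage of the slower result. C integer division truncates toward
 * zero, which matches the truncated figures. */
static int relative_gap_percent(int rx480_fps, int gtx1060_fps)
{
    int slower = rx480_fps < gtx1060_fps ? rx480_fps : gtx1060_fps;

    return (rx480_fps - gtx1060_fps) * 100 / slower;
}
```

For example, relative_gap_percent(67, 89) gives -32 (LOW, mode 1) and
relative_gap_percent(44, 28) gives 57 (MEDIUM, mode 2).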

The main bottleneck with the all bindless mode is the number of buffers that
have to be added to every command stream. The overhead in the winsys and in
the kernel (amdgpu_cs_ioctl) becomes significant in this situation. This mode
is still clearly CPU bound and should be improved (see the "Future work"
section).

Btw, without any optimisations, it was around 35 FPS in low (mode 1).

=== Performance results for other Feral titles ===

I didn't record any numbers because these games were initially
developed/tested against the NVIDIA blob, which is unaffected by a VERY large
number of resident handles, while the AMD stack is really slow in this
situation. Still, as I said, all Feral games that use bindless work fine; we
just need to improve perf on both sides.

=== Future work ===

I have some ideas to try in order to improve performance with RadeonSI. I will
work on this once this series is upstream.

Please review,
Thanks!

Samuel Pitoiset (65):
  mapi: add GL_ARB_bindless_texture entry points
  mesa: implement ARB_bindless_texture
  mesa: add support for unsigned 64-bit vertex attributes
  mesa: add support for glUniformHandleui64*ARB()
  mesa: refuse to update sampler parameters when a handle is allocated
  mesa: refuse to update tex parameters when a handle is allocated
  mesa: refuse to change textures when a handle is allocated
  mesa: refuse to change tex buffers when a handle is allocated
  mesa: keep track of the current variable in add_uniform_to_shader
  mesa: store bindless samplers as PROGRAM_UNIFORM
  mesa: add infrastructure for bindless samplers/images bound to units
  glsl: process uniform samplers declared bindless
  glsl: process uniform images declared bindless
  glsl: pass the ir_variable object to set_opaque_binding()
  glsl: set the explicit binding value for bindless samplers/images
  glsl: add ir_variable::is_bindless()
  mesa: add update_single_shader_texture_used() helper
  mesa: add update_single_program_texture_state() helper
  mesa: update textures for bindless samplers bound to texture units
  mesa: pass gl_program to _mesa_associate_uniform_storage()
  mesa: associate uniform storage to bindless samplers/images
  mesa: handle bindless uniforms bound to texture/image units
  mesa: get rid of a workaround for bindless in _mesa_get_uniform()
  gallium: add PIPE_CAP_BINDLESS_TEXTURE
  gallium: add ARB_bindless_texture interface
  ddebug: add ARB_bindless_texture support
  trace: add ARB_bindless_texture support
  tc: add ARB_bindless_texture support
  tgsi: add new Bindless flag to tgsi_instruction_texture
  tgsi: add new Bindless flag to tgsi_instruction_memory
  tgsi/ureg: accept TGSI_FILE_{CONSTANT,INPUT} for dst registers
  st/glsl_to_tgsi: add support for bindless samplers
  st/glsl_to_tgsi: add support for bindless images
  st/glsl_to_tgsi: add support for bindless pack/unpack operations
  st/glsl_to_tgsi: teach the DCE pass about bindless samplers/images
  st/glsl_to_tgsi: teach rename_temp_registers() about bindless samplers
  tgsi/scan: record bindless samplers/images usage
  st/mesa: implement ARB_bindless_texture
  st/mesa: make update_single_texture() non-static
  st/mesa: make convert_sampler_from_unit() non-static
  st/mesa: add st_convert_image_from_unit() helper
  st/mesa: add st_create_{texture,image}_handle_from_unit() helper
  st/mesa: add infrastructure for storing bound texture/image handles
  st/mesa: make bindless samplers/images bound to units resident
  st/mesa: do not release sampler views for resident textures
  st/mesa: disable per-context seamless cubemap when using texture
    handles
  st/mesa: enable ARB_bindless_texture
  radeonsi: add a slab allocator for resident descriptors
  radeonsi: add si_init_descriptor_list() helper
  radeonsi: add si_set_sampler_view_desc() helper
  radeonsi: add si_set_shader_image_desc() helper
  radeonsi: implement ARB_bindless_texture
  radeonsi: add all resident buffers to the current CS
  radeonsi: only add descriptors in presence of resident handles
  radeonsi: add si_update_check_render_feedback() helper
  radeonsi: decompress DCC for resident textures/images
  radeonsi: decompress resident textures/images before graphics/compute
  radeonsi: isolate real framebuffer changes from the decompression
    passes
  radeonsi: track use of bindless samplers/images from tgsi_shader_info
  radeonsi: only decompress resident textures/images when used
  radeonsi: upload new descriptors when resident buffers are invalidated
  radeonsi: invalidate buffers which are made resident if needed
  radeonsi: add support for loading bindless samplers
  radeonsi: add support for loading bindless images
  radeonsi: enable ARB_bindless_texture

 docs/features.txt                                  |   2 +-
 docs/relnotes/17.2.0.html                          |   1 +
 src/compiler/glsl/ir.h                             |  11 +
 src/compiler/glsl/ir_uniform.h                     |  12 +
 src/compiler/glsl/link_uniform_initializers.cpp    |  42 +-
 src/compiler/glsl/link_uniforms.cpp                | 156 +++-
 src/compiler/glsl/shader_cache.cpp                 |  47 +
 src/gallium/auxiliary/tgsi/tgsi_build.c            |   8 +
 src/gallium/auxiliary/tgsi/tgsi_scan.c             |  37 +
 src/gallium/auxiliary/tgsi/tgsi_scan.h             |   2 +
 src/gallium/auxiliary/tgsi/tgsi_ureg.c             |  21 +-
 src/gallium/auxiliary/tgsi/tgsi_ureg.h             |  16 +-
 src/gallium/auxiliary/util/u_threaded_context.c    | 147 ++++
 .../auxiliary/util/u_threaded_context_calls.h      |   4 +
 src/gallium/docs/source/screen.rst                 |   2 +
 src/gallium/drivers/ddebug/dd_context.c            |  61 ++
 src/gallium/drivers/etnaviv/etnaviv_screen.c       |   1 +
 src/gallium/drivers/freedreno/freedreno_screen.c   |   1 +
 src/gallium/drivers/i915/i915_screen.c             |   1 +
 src/gallium/drivers/llvmpipe/lp_screen.c           |   1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c     |   1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c     |   1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   1 +
 src/gallium/drivers/r300/r300_screen.c             |   1 +
 src/gallium/drivers/r600/r600_pipe.c               |   1 +
 src/gallium/drivers/radeon/r600_pipe_common.h      |   4 +
 src/gallium/drivers/radeonsi/si_blit.c             | 131 ++-
 src/gallium/drivers/radeonsi/si_compute.c          |   2 +
 src/gallium/drivers/radeonsi/si_compute.h          |  14 +
 src/gallium/drivers/radeonsi/si_descriptors.c      | 943 +++++++++++++++++++--
 src/gallium/drivers/radeonsi/si_hw_context.c       |   1 +
 src/gallium/drivers/radeonsi/si_pipe.c             |  25 +
 src/gallium/drivers/radeonsi/si_pipe.h             |  68 ++
 src/gallium/drivers/radeonsi/si_shader.h           |  12 +
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c  |  48 +-
 src/gallium/drivers/radeonsi/si_state.c            |  10 +-
 src/gallium/drivers/radeonsi/si_state.h            |   9 +
 src/gallium/drivers/softpipe/sp_screen.c           |   1 +
 src/gallium/drivers/svga/svga_screen.c             |   1 +
 src/gallium/drivers/swr/swr_screen.cpp             |   1 +
 src/gallium/drivers/trace/tr_context.c             | 114 +++
 src/gallium/drivers/vc4/vc4_screen.c               |   1 +
 src/gallium/drivers/virgl/virgl_screen.c           |   1 +
 src/gallium/include/pipe/p_context.h               |  16 +
 src/gallium/include/pipe/p_defines.h               |   1 +
 src/gallium/include/pipe/p_shader_tokens.h         |   6 +-
 src/mapi/glapi/gen/ARB_bindless_texture.xml        | 100 +++
 src/mapi/glapi/gen/Makefile.am                     |   1 +
 src/mapi/glapi/gen/apiexec.py                      |   3 +
 src/mapi/glapi/gen/gl_API.xml                      |   4 +-
 src/mapi/glapi/gen/gl_genexec.py                   |   1 +
 src/mesa/Makefile.sources                          |   2 +
 src/mesa/main/api_loopback.c                       |  18 +
 src/mesa/main/api_loopback.h                       |   6 +
 src/mesa/main/bufferobj.c                          |   4 +-
 src/mesa/main/context.c                            |   3 +
 src/mesa/main/dd.h                                 |  19 +
 src/mesa/main/mtypes.h                             |  86 ++
 src/mesa/main/samplerobj.c                         |  48 ++
 src/mesa/main/shared.c                             |  12 +
 src/mesa/main/tests/dispatch_sanity.cpp            |  18 +
 src/mesa/main/teximage.c                           |  25 +-
 src/mesa/main/texobj.c                             |  12 +
 src/mesa/main/texparam.c                           |  61 ++
 src/mesa/main/texstate.c                           |  52 +-
 src/mesa/main/texturebindless.c                    | 902 ++++++++++++++++++++
 src/mesa/main/texturebindless.h                    |  96 +++
 src/mesa/main/uniform_query.cpp                    | 208 ++++-
 src/mesa/main/uniforms.c                           | 119 ++-
 src/mesa/main/uniforms.h                           |  16 +
 src/mesa/main/varray.c                             |  23 +
 src/mesa/main/varray.h                             |   3 +
 src/mesa/main/vtxfmt.c                             |   4 +
 src/mesa/program/ir_to_mesa.cpp                    |  36 +-
 src/mesa/program/ir_to_mesa.h                      |   4 +-
 src/mesa/program/program.c                         |   8 +
 src/mesa/state_tracker/st_atifs_to_tgsi.c          |   2 +-
 src/mesa/state_tracker/st_atom_constbuf.c          |   6 +
 src/mesa/state_tracker/st_atom_image.c             |  33 +-
 src/mesa/state_tracker/st_atom_sampler.c           |  32 +-
 src/mesa/state_tracker/st_atom_texture.c           |  15 +-
 src/mesa/state_tracker/st_cb_texture.c             |  84 ++
 src/mesa/state_tracker/st_context.c                |   2 +
 src/mesa/state_tracker/st_context.h                |  11 +
 src/mesa/state_tracker/st_extensions.c             |   1 +
 src/mesa/state_tracker/st_glsl_to_nir.cpp          |   3 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 138 ++-
 src/mesa/state_tracker/st_mesa_to_tgsi.c           |   2 +-
 src/mesa/state_tracker/st_pbo.c                    |   2 +-
 src/mesa/state_tracker/st_sampler_view.c           |   6 +
 src/mesa/state_tracker/st_shader_cache.c           |   3 +-
 src/mesa/state_tracker/st_texture.c                | 213 +++++
 src/mesa/state_tracker/st_texture.h                |  28 +
 src/mesa/vbo/vbo_attrib_tmp.h                      |  28 +
 src/mesa/vbo/vbo_context.h                         |   2 +
 src/mesa/vbo/vbo_exec_api.c                        |  15 +-
 src/mesa/vbo/vbo_save_api.c                        |   3 +
 97 files changed, 4250 insertions(+), 260 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_bindless_texture.xml
 create mode 100644 src/mesa/main/texturebindless.c
 create mode 100644 src/mesa/main/texturebindless.h

-- 
2.13.0


