[Mesa-dev] [RFC PATCH 00/65] ARB_bindless_texture for RadeonSI

Samuel Pitoiset samuel.pitoiset at gmail.com
Mon May 22 08:21:53 UTC 2017



On 05/20/2017 09:01 PM, Nicolai Hähnle wrote:
> On 19.05.2017 18:52, Samuel Pitoiset wrote:
>> Hi,
>>
>> This series implements ARB_bindless_texture for RadeonSI.
>>
>> Reminder: the GLSL compiler part is already upstream.
>>
>> This series has been mainly tested with Feral games, here's the list of
>> existing games that use ARB_bindless_texture (though not by default):
>>
>> - DXMD
>> - Hitman
>> - Dirt Rally
>> - Mad Max
>>
>> Today, Feral announced "Warhammer 40,000: Dawn of War III" (DOW3), which
>> is going to be released next month. This game *requires*
>> ARB_bindless_texture, which explains why I did all this work. :-) So we
>> have ~3 weeks to merge this whole series. It would be very nice to have
>> DOW3 support on day one!
>>
>> === Tracking bindless problems ===
>>
>> The following games have been successfully tested:
>>
>> - Dirt Rally
>> - Hitman
>> - Mad Max
>> - DOW3
>>
>> For these:
>>
>> - No rendering issues
>> - No VM faults (i.e. with amdgpu.vm_debug=1)
>>
>> However, DXMD is currently broken because the bindless_sampler layout
>> qualifier is missing, which ends up reporting a ton of INVALID_OPERATION
>> errors. Note that Feral implemented bindless support against
>> NV_bindless_texture and not ARB_bindless_texture. The main difference is
>> that bindless_sampler is implicit for NV_* while it's required for ARB_*.
>> Feral plan to fix this soon.
>>
>> All ARB_bindless_texture piglit tests pass with this series.
>>
>> === Tracking regressions/changes ===
>>
>> - No regressions with the Intel CI system
>> - One piglit regression that needs to be fixed
>>   (arb_texture_multisample-sample-position)
>> - No shader-db changes
>> - No CPU overhead (glxgears and Heaven in low)
>>
>> === Performance results for DOW3 ===
>>
>> DOW3 exposes two bindless texture modes:
>> - mode 1: all bindless (i.e. no bound samplers)
>> - mode 2: bound/bindless (i.e. only bindless when the limit is reached)
>>
>> CPU: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
>> NVIDIA blob: 381.22
>>
>> == GTX 1060 ==
>>
>> LOW:
>>  - mode 1: 89 FPS
>>  - mode 2: 51 FPS
>>
>> MEDIUM:
>>  - mode 1: 49 FPS
>>  - mode 2: 28 FPS
>>
>> HIGH:
>>  - mode 1: 32 FPS
>>  - mode 2: 19 FPS
>>
>> The GTX 1060 performs very well with the all bindless mode (default),
>> while the bound/bindless mode is not good at all.
>>
>> == RX480 ==
>>
>> LOW:
>>  - mode 1: 67 FPS (-32%)
>>  - mode 2: 75 FPS (+32%)
>>
>> MEDIUM:
>>  - mode 1: 38 FPS (-28%)
>>  - mode 2: 44 FPS (+57%)
>>
>> HIGH:
>>  - mode 1: 26 FPS (-23%)
>>  - mode 2: 29 FPS (+52%)
> 
> What do the numbers in parentheses mean? Relative to the GTX 1060?

Exactly.
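
For reference, the percentages can be reproduced with a small sketch. It
assumes (this is a guess, not something stated above) that each figure is
the gap between the two cards measured against the slower one, truncated to
an integer, with the sign telling whether the RX 480 is ahead or behind. The
name relative_gap is made up for illustration. Under that assumption five of
the six figures come out as listed; LOW mode 2 comes out as +47% rather than
+32%, which instead matches the gap measured against the RX 480 result.

```python
# FPS figures copied from the results above, as (mode 1, mode 2) per preset.
GTX_1060 = {"LOW": (89, 51), "MEDIUM": (49, 28), "HIGH": (32, 19)}
RX_480 = {"LOW": (67, 75), "MEDIUM": (38, 44), "HIGH": (26, 29)}


def relative_gap(rx_fps, gtx_fps):
    """Signed gap in percent, measured against the slower card.

    Assumption: this convention is inferred from the numbers, it is not
    stated in the thread. Positive means the RX 480 is ahead.
    """
    faster, slower = max(rx_fps, gtx_fps), min(rx_fps, gtx_fps)
    gap = (faster - slower) * 100 // slower  # integer math, truncated
    return gap if rx_fps >= gtx_fps else -gap


for preset in ("LOW", "MEDIUM", "HIGH"):
    for mode in (0, 1):
        rx, gtx = RX_480[preset][mode], GTX_1060[preset][mode]
        print(f"{preset} mode {mode + 1}: {rx} FPS ({relative_gap(rx, gtx):+d}%)")
```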

> 
> Out of curiosity, did you do your tests with threaded gallium? I imagine 
> that it could help a lot already since it pushes the buffer handling 
> onto the gallium thread. And I guess it could be pushed further into the 
> CS submit thread.

Yes, but I don't see any performance differences with threaded gallium.

> 
> Anyway, this is all very nice! While I've obviously seen a lot of bits 
> and pieces, I'll get to reviewing the whole series in earnest soon :-)

Cool, thanks! :)

> 
> Cheers,
> Nicolai
> 
> 
>>
>> The RX 480 performs very well with the bound/bindless mode (default),
>> while the all bindless mode still has to be improved.
>>
>> The most important bottleneck with the all bindless mode is the number of
>> buffers that have to be added for every command stream. The overhead in
>> the winsys and in the kernel (amdgpu_cs_ioctl) becomes significant in this
>> situation. This mode is still clearly CPU bound and should be improved
>> (see the "Future work" section).
>>
>> Btw, without any optimisations, it was around 35 FPS in low (mode 1).
>>
>> === Performance results for other Feral titles ===
>>
>> I didn't record any numbers because these games were initially
>> developed/tested against the NVIDIA blob, which is unaffected by a VERY
>> huge number of resident handles, while the AMD stack is really slow in
>> this situation. Though, as I said, all Feral games that use bindless work
>> fine; we just need to improve perf on both sides.
>>
>> === Future work ===
>>
>> I have some ideas to try in order to improve performance with RadeonSI. I
>> will work on this once this series is upstream.
>>
>> Please review,
>> Thanks!
>>
>> Samuel Pitoiset (65):
>>   mapi: add GL_ARB_bindless_texture entry points
>>   mesa: implement ARB_bindless_texture
>>   mesa: add support for unsigned 64-bit vertex attributes
>>   mesa: add support for glUniformHandleui64*ARB()
>>   mesa: refuse to update sampler parameters when a handle is allocated
>>   mesa: refuse to update tex parameters when a handle is allocated
>>   mesa: refuse to change textures when a handle is allocated
>>   mesa: refuse to change tex buffers when a handle is allocated
>>   mesa: keep track of the current variable in add_uniform_to_shader
>>   mesa: store bindless samplers as PROGRAM_UNIFORM
>>   mesa: add infrastructure for bindless samplers/images bound to units
>>   glsl: process uniform samplers declared bindless
>>   glsl: process uniform images declared bindless
>>   glsl: pass the ir_variable object to set_opaque_binding()
>>   glsl: set the explicit binding value for bindless samplers/images
>>   glsl: add ir_variable::is_bindless()
>>   mesa: add update_single_shader_texture_used() helper
>>   mesa: add update_single_program_texture_state() helper
>>   mesa: update textures for bindless samplers bound to texture units
>>   mesa: pass gl_program to _mesa_associate_uniform_storage()
>>   mesa: associate uniform storage to bindless samplers/images
>>   mesa: handle bindless uniforms bound to texture/image units
>>   mesa: get rid of a workaround for bindless in _mesa_get_uniform()
>>   gallium: add PIPE_CAP_BINDLESS_TEXTURE
>>   gallium: add ARB_bindless_texture interface
>>   ddebug: add ARB_bindless_texture support
>>   trace: add ARB_bindless_texture support
>>   tc: add ARB_bindless_texture support
>>   tgsi: add new Bindless flag to tgsi_instruction_texture
>>   tgsi: add new Bindless flag to tgsi_instruction_memory
>>   tgsi/ureg: accept TGSI_FILE_{CONSTANT,INPUT} for dst registers
>>   st/glsl_to_tgsi: add support for bindless samplers
>>   st/glsl_to_tgsi: add support for bindless images
>>   st/glsl_to_tgsi: add support for bindless pack/unpack operations
>>   st/glsl_to_tgsi: teach the DCE pass about bindless samplers/images
>>   st/glsl_to_tgsi: teach rename_temp_registers() about bindless samplers
>>   tgsi/scan: record bindless samplers/images usage
>>   st/mesa: implement ARB_bindless_texture
>>   st/mesa: make update_single_texture() non-static
>>   st/mesa: make convert_sampler_from_unit() non-static
>>   st/mesa: add st_convert_image_from_unit() helper
>>   st/mesa: add st_create_{texture,image}_handle_from_unit() helper
>>   st/mesa: add infrastructure for storing bound texture/image handles
>>   st/mesa: make bindless samplers/images bound to units resident
>>   st/mesa: do not release sampler views for resident textures
>>   st/mesa: disable per-context seamless cubemap when using texture
>>     handles
>>   st/mesa: enable ARB_bindless_texture
>>   radeonsi: add a slab allocator for resident descriptors
>>   radeonsi: add si_init_descriptor_list() helper
>>   radeonsi: add si_set_sampler_view_desc() helper
>>   radeonsi: add si_set_shader_image_desc() helper
>>   radeonsi: implement ARB_bindless_texture
>>   radeonsi: add all resident buffers to the current CS
>>   radeonsi: only add descriptors in presence of resident handles
>>   radeonsi: add si_update_check_render_feedback() helper
>>   radeonsi: decompress DCC for resident textures/images
>>   radeonsi: decompress resident textures/images before graphics/compute
>>   radeonsi: isolate real framebuffer changes from the decompression
>>     passes
>>   radeonsi: track use of bindless samplers/images from tgsi_shader_info
>>   radeonsi: only decompress resident textures/images when used
>>   radeonsi: upload new descriptors when resident buffers are invalidated
>>   radeonsi: invalidate buffers which are made resident if needed
>>   radeonsi: add support for loading bindless samplers
>>   radeonsi: add support for loading bindless images
>>   radeonsi: enable ARB_bindless_texture
>>
>>  docs/features.txt                                  |   2 +-
>>  docs/relnotes/17.2.0.html                          |   1 +
>>  src/compiler/glsl/ir.h                             |  11 +
>>  src/compiler/glsl/ir_uniform.h                     |  12 +
>>  src/compiler/glsl/link_uniform_initializers.cpp    |  42 +-
>>  src/compiler/glsl/link_uniforms.cpp                | 156 +++-
>>  src/compiler/glsl/shader_cache.cpp                 |  47 +
>>  src/gallium/auxiliary/tgsi/tgsi_build.c            |   8 +
>>  src/gallium/auxiliary/tgsi/tgsi_scan.c             |  37 +
>>  src/gallium/auxiliary/tgsi/tgsi_scan.h             |   2 +
>>  src/gallium/auxiliary/tgsi/tgsi_ureg.c             |  21 +-
>>  src/gallium/auxiliary/tgsi/tgsi_ureg.h             |  16 +-
>>  src/gallium/auxiliary/util/u_threaded_context.c    | 147 ++++
>>  .../auxiliary/util/u_threaded_context_calls.h      |   4 +
>>  src/gallium/docs/source/screen.rst                 |   2 +
>>  src/gallium/drivers/ddebug/dd_context.c            |  61 ++
>>  src/gallium/drivers/etnaviv/etnaviv_screen.c       |   1 +
>>  src/gallium/drivers/freedreno/freedreno_screen.c   |   1 +
>>  src/gallium/drivers/i915/i915_screen.c             |   1 +
>>  src/gallium/drivers/llvmpipe/lp_screen.c           |   1 +
>>  src/gallium/drivers/nouveau/nv30/nv30_screen.c     |   1 +
>>  src/gallium/drivers/nouveau/nv50/nv50_screen.c     |   1 +
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   1 +
>>  src/gallium/drivers/r300/r300_screen.c             |   1 +
>>  src/gallium/drivers/r600/r600_pipe.c               |   1 +
>>  src/gallium/drivers/radeon/r600_pipe_common.h      |   4 +
>>  src/gallium/drivers/radeonsi/si_blit.c             | 131 ++-
>>  src/gallium/drivers/radeonsi/si_compute.c          |   2 +
>>  src/gallium/drivers/radeonsi/si_compute.h          |  14 +
>>  src/gallium/drivers/radeonsi/si_descriptors.c      | 943 +++++++++++++++++++--
>>  src/gallium/drivers/radeonsi/si_hw_context.c       |   1 +
>>  src/gallium/drivers/radeonsi/si_pipe.c             |  25 +
>>  src/gallium/drivers/radeonsi/si_pipe.h             |  68 ++
>>  src/gallium/drivers/radeonsi/si_shader.h           |  12 +
>>  src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c  |  48 +-
>>  src/gallium/drivers/radeonsi/si_state.c            |  10 +-
>>  src/gallium/drivers/radeonsi/si_state.h            |   9 +
>>  src/gallium/drivers/softpipe/sp_screen.c           |   1 +
>>  src/gallium/drivers/svga/svga_screen.c             |   1 +
>>  src/gallium/drivers/swr/swr_screen.cpp             |   1 +
>>  src/gallium/drivers/trace/tr_context.c             | 114 +++
>>  src/gallium/drivers/vc4/vc4_screen.c               |   1 +
>>  src/gallium/drivers/virgl/virgl_screen.c           |   1 +
>>  src/gallium/include/pipe/p_context.h               |  16 +
>>  src/gallium/include/pipe/p_defines.h               |   1 +
>>  src/gallium/include/pipe/p_shader_tokens.h         |   6 +-
>>  src/mapi/glapi/gen/ARB_bindless_texture.xml        | 100 +++
>>  src/mapi/glapi/gen/Makefile.am                     |   1 +
>>  src/mapi/glapi/gen/apiexec.py                      |   3 +
>>  src/mapi/glapi/gen/gl_API.xml                      |   4 +-
>>  src/mapi/glapi/gen/gl_genexec.py                   |   1 +
>>  src/mesa/Makefile.sources                          |   2 +
>>  src/mesa/main/api_loopback.c                       |  18 +
>>  src/mesa/main/api_loopback.h                       |   6 +
>>  src/mesa/main/bufferobj.c                          |   4 +-
>>  src/mesa/main/context.c                            |   3 +
>>  src/mesa/main/dd.h                                 |  19 +
>>  src/mesa/main/mtypes.h                             |  86 ++
>>  src/mesa/main/samplerobj.c                         |  48 ++
>>  src/mesa/main/shared.c                             |  12 +
>>  src/mesa/main/tests/dispatch_sanity.cpp            |  18 +
>>  src/mesa/main/teximage.c                           |  25 +-
>>  src/mesa/main/texobj.c                             |  12 +
>>  src/mesa/main/texparam.c                           |  61 ++
>>  src/mesa/main/texstate.c                           |  52 +-
>>  src/mesa/main/texturebindless.c                    | 902 ++++++++++++++++++++
>>  src/mesa/main/texturebindless.h                    |  96 +++
>>  src/mesa/main/uniform_query.cpp                    | 208 ++++-
>>  src/mesa/main/uniforms.c                           | 119 ++-
>>  src/mesa/main/uniforms.h                           |  16 +
>>  src/mesa/main/varray.c                             |  23 +
>>  src/mesa/main/varray.h                             |   3 +
>>  src/mesa/main/vtxfmt.c                             |   4 +
>>  src/mesa/program/ir_to_mesa.cpp                    |  36 +-
>>  src/mesa/program/ir_to_mesa.h                      |   4 +-
>>  src/mesa/program/program.c                         |   8 +
>>  src/mesa/state_tracker/st_atifs_to_tgsi.c          |   2 +-
>>  src/mesa/state_tracker/st_atom_constbuf.c          |   6 +
>>  src/mesa/state_tracker/st_atom_image.c             |  33 +-
>>  src/mesa/state_tracker/st_atom_sampler.c           |  32 +-
>>  src/mesa/state_tracker/st_atom_texture.c           |  15 +-
>>  src/mesa/state_tracker/st_cb_texture.c             |  84 ++
>>  src/mesa/state_tracker/st_context.c                |   2 +
>>  src/mesa/state_tracker/st_context.h                |  11 +
>>  src/mesa/state_tracker/st_extensions.c             |   1 +
>>  src/mesa/state_tracker/st_glsl_to_nir.cpp          |   3 +-
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 138 ++-
>>  src/mesa/state_tracker/st_mesa_to_tgsi.c           |   2 +-
>>  src/mesa/state_tracker/st_pbo.c                    |   2 +-
>>  src/mesa/state_tracker/st_sampler_view.c           |   6 +
>>  src/mesa/state_tracker/st_shader_cache.c           |   3 +-
>>  src/mesa/state_tracker/st_texture.c                | 213 +++++
>>  src/mesa/state_tracker/st_texture.h                |  28 +
>>  src/mesa/vbo/vbo_attrib_tmp.h                      |  28 +
>>  src/mesa/vbo/vbo_context.h                         |   2 +
>>  src/mesa/vbo/vbo_exec_api.c                        |  15 +-
>>  src/mesa/vbo/vbo_save_api.c                        |   3 +
>>  97 files changed, 4250 insertions(+), 260 deletions(-)
>>  create mode 100644 src/mapi/glapi/gen/ARB_bindless_texture.xml
>>  create mode 100644 src/mesa/main/texturebindless.c
>>  create mode 100644 src/mesa/main/texturebindless.h
>>
> 
> 

