[Mesa-dev] [PATCH 00/13] Use the constant engine in radeonsi
Nicolai Hähnle
nhaehnle at gmail.com
Thu Apr 14 18:29:20 UTC 2016
On 13.04.2016 20:34, Bas Nieuwenhuizen wrote:
> This series implements updating descriptors using the constant engine.
> This should result in a 0%-3% improvement for CPU bound applications,
> as we only have to upload the change descriptors from the CPU.
Thanks again for looking at this!
> There are very slight performance advantages on the GPU too, as the
> CE uploads the data directly to the L2 cache. This series reduces the
> number of cycles waited on LGKM counters by about 1/6th across multiple
> applications. However this translates in almost negligible performance
> improvements.
>
> Only the amdgpu winsys supports this for now. I think the radeon
> winsys can be extended to support it, but I don't have any radeon
> hardware so it is probably better that someone else implements that.
>
> This series does not include support for partially updating
> uniforms. My gallium interface proof of concept became kind of messy
> and the constant engine patches have been requested for descriptors.
Interesting coincidence that Jakob wrote an email about "Rework uniform
storage" on the same day...
Two related general observations on the series:
1. The CE also has a LOAD_CONST_RAM command. Instead of writing all
descriptors to the CE IB at CS begin, perhaps you could save space and
CPU time by using that for the first upload per CS? Though it also might
make sense to have this depend on the dirty_mask population count.
2. It seems like it should be possible to get rid of the list_dirty flag
now, but I actually don't think that's true, because AFAICS doing that
on top of this series will force a re-upload of all descriptors on every
new CS, which is clearly bad. Still, logically there really shouldn't be
a list_dirty _and_ a dirty_mask.
So how about a dirty_mask and a ce_ram_dirty flag? Do not set the
dirty_mask on begin_new_cs, and use the dirty_mask to gate descriptor
uploads. Set the ce_ram_dirty flag on begin_new_cs, and use it to either
(a) override the dirty_mask as appropriate for the first descriptor
upload in each CS or (b) emit a LOAD_CONST_RAM command.
I'd appreciate if you could address these points - I don't feel too
strongly about the LOAD_CONST_RAM optimization, but having both
list_dirty and dirty_mask is weird.
Thanks,
Nicolai
> Bas Nieuwenhuizen (10):
> winsys/amdgpu: Enlarge const IB size.
> radeonsi: Create CE IB.
> radeonsi: Add dirty_mask to descriptor list.
> radeonsi: Add CE packet definitions.
> radeonsi: Add CE synchronization.
> radeonsi: Allocate chunks of CE ram.
> radeonsi: Add CE uploader.
> radeonsi: Use CE for vertex buffers.
> gallium/util: Add u_bit_scan_consecutive_range64.
> radeonsi: Use CE for all descriptors.
>
> Marek Olšák (3):
> gallium/radeon: move ring_type into winsyses
> winsys/amdgpu: split IB data into a new structure in preparation for
> CE
> winsys/amdgpu: add support for const IB
>
> src/gallium/auxiliary/util/u_math.h | 8 ++
> src/gallium/drivers/radeon/r600_pipe_common.c | 1 +
> src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
> src/gallium/drivers/radeon/radeon_winsys.h | 19 +++-
> src/gallium/drivers/radeonsi/si_descriptors.c | 149 +++++++++++++++++++++-----
> src/gallium/drivers/radeonsi/si_hw_context.c | 4 +-
> src/gallium/drivers/radeonsi/si_pipe.c | 18 ++++
> src/gallium/drivers/radeonsi/si_pipe.h | 6 ++
> src/gallium/drivers/radeonsi/si_state.h | 4 +
> src/gallium/drivers/radeonsi/si_state_draw.c | 24 +++++
> src/gallium/drivers/radeonsi/sid.h | 6 ++
> src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 5 -
> src/gallium/winsys/amdgpu/drm/amdgpu_bo.h | 6 ++
> src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 125 ++++++++++++++-------
> src/gallium/winsys/amdgpu/drm/amdgpu_cs.h | 26 +++--
> src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 10 +-
> src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 1 +
> 17 files changed, 326 insertions(+), 87 deletions(-)
>
More information about the mesa-dev
mailing list