[Mesa-dev] [PATCH 3/3] radeonsi: improve the computation and comment of scratch_waves
Nicolai Hähnle
nhaehnle at gmail.com
Wed Jun 8 17:08:00 UTC 2016
For the series:
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
On 08.06.2016 17:36, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak at amd.com>
>
> 2% isn't much. If you think the number should be decreased, please speak up.
> ---
> src/gallium/drivers/radeonsi/si_pipe.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
> index 5d7d2f3..0c601da 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.c
> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
> @@ -236,11 +236,25 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
> R600_COHERENCY_SHADER);
> }
>
> - /* XXX: This is the maximum value allowed. I'm not sure how to compute
> - * this for non-cs shaders. Using the wrong value here can result in
> - * GPU lockups, but the maximum value seems to always work.
> + uint64_t max_threads_per_block;
> + screen->get_compute_param(screen, PIPE_SHADER_IR_TGSI,
> + PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK,
> + &max_threads_per_block);
> +
> + /* The maximum number of scratch waves. Scratch space isn't divided
> + * evenly between CUs. The number is only a function of the number of CUs.
> + * We can decrease the constant to decrease the scratch buffer size.
> + *
> * sctx->scratch_waves must be >= the maximum possible size of
> + * 1 threadgroup, so that the hw doesn't hang from being unable
> + * to start any.
> + *
> + * The recommended value is 4 per CU at most. Higher numbers don't
> + * bring much benefit, but they still occupy chip resources (think
> + * async compute). I've seen ~2% performance difference between 4 and 32.
> */
> - sctx->scratch_waves = 32 * sscreen->b.info.num_good_compute_units;
> + sctx->scratch_waves = MAX2(32 * sscreen->b.info.num_good_compute_units,
> + max_threads_per_block / 64);
>
> /* Initialize LLVM TargetMachine */
> r600_target = radeon_llvm_get_r600_target(triple);
>
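To make the arithmetic in the new comment concrete, here is a minimal standalone sketch of the same computation. It is not Mesa code: `compute_scratch_waves` and `max_u64` are hypothetical names, and the per-CU constant and wave size are passed as parameters purely for illustration. The patch itself hard-codes 32 waves per CU and a divisor of 64, and uses integer division, which the sketch keeps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Mesa's MAX2() macro. */
static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/*
 * Sketch of the scratch_waves computation from the patch (assumed names).
 *
 * waves_per_cu:          the tunable constant (32 in the patch; the comment
 *                        recommends at most 4)
 * num_cus:               number of usable compute units
 * max_threads_per_block: largest threadgroup size the driver reports
 * wave_size:             threads per wave (the patch divides by 64)
 */
static uint64_t compute_scratch_waves(uint64_t waves_per_cu, uint64_t num_cus,
                                      uint64_t max_threads_per_block,
                                      uint64_t wave_size)
{
    assert(wave_size > 0);

    /* Budget that scales only with the number of CUs. */
    uint64_t per_cu_total = waves_per_cu * num_cus;

    /* Waves needed to start one maximum-size threadgroup
     * (truncating division, as in the patch). */
    uint64_t one_group = max_threads_per_block / wave_size;

    /* Never go below one full threadgroup, so the hw can always start one. */
    return max_u64(per_cu_total, one_group);
}
```

With illustrative inputs (not values queried from any real chip), `compute_scratch_waves(32, 8, 2048, 64)` gives 256 because the per-CU term dominates, while `compute_scratch_waves(4, 1, 2048, 64)` gives 32 because the threadgroup lower bound takes over. The second case is the one the MAX2() guards against if the constant is ever lowered.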