[Mesa-stable] [PATCH 3/3] i965: Use the correct number of threads for compute shaders.

Francisco Jerez currojerez at riseup.net
Thu Jun 9 20:37:52 UTC 2016


Kenneth Graunke <kenneth at whitecape.org> writes:

> We were programming the number of threads per subslice, when we should
> have been programming the total number of threads on the GPU as a whole.
>
> Thanks to Curro and Jordan for helping track this down!
>
> On Skylake GT3e:
> - Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
> - Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
> - Improves performance in Synmark's Gl43CSCloth by roughly 1.18x.
>
> On Broadwell GT2:
> - Improves performance in Unreal's Elemental Demo by roughly 1.23x.
> - Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
> - Improves performance in Synmark's Gl43CSCloth by 1.47035% +/-
>   0.255654% (n=25).
>
> On Haswell GT3e:
> - Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
>   by roughly 1.18x.
> - Decreases performance in Gl43CSCloth by -2.88315% +/- 2.54785%?
> - Gl43CSDof is still broken.
>
Does it work if you overallocate the scratch BO by a factor of 128/70 on
HSW?  That is roughly the amount of padding the shared function introduces
in the scratch space to account for non-existent EUs.
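
To make the suggestion concrete, here is a rough sketch of the
overallocation.  The helper name is hypothetical and does not exist in the
driver, and 128/70 is only the approximate ratio mentioned above, so treat
this as an experiment rather than a proposed fix:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the i965 driver.  Scales the scratch
 * allocation by 128/70, rounding up, so the buffer also covers the padding
 * described above.  The 128/70 ratio is the rough figure from this mail. */
static uint64_t
padded_scratch_size(uint64_t per_thread_scratch, uint64_t total_threads)
{
   /* Guard the multiplications below against 64-bit overflow. */
   assert(total_threads == 0 ||
          per_thread_scratch <= UINT64_MAX / 256 / total_threads);

   const uint64_t size = per_thread_scratch * total_threads;

   /* Integer form of ceil(size * 128 / 70). */
   return (size * 128 + 69) / 70;
}
```

In brw_codegen_cs_prog() the result would stand in for the size passed to
brw_get_scratch_bo(), on Haswell only.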

> Cc: "12.0" <mesa-stable at lists.freedesktop.org>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_cs.c        | 4 +++-
>  src/mesa/drivers/dri/i965/gen7_cs_state.c | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> (Note that if we drop the previous patch, Haswell will be unchanged.)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
> index 2a25584..c8598d6 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.c
> +++ b/src/mesa/drivers/dri/i965/brw_cs.c
> @@ -149,8 +149,10 @@ brw_codegen_cs_prog(struct brw_context *brw,
>     }
>  
>     if (prog_data.base.total_scratch) {
> +      const unsigned subslices = MAX2(brw->intelScreen->subslice_total, 1);
>        brw_get_scratch_bo(brw, &brw->cs.base.scratch_bo,
> -                         prog_data.base.total_scratch * brw->max_cs_threads);
> +                         prog_data.base.total_scratch *
> +                         brw->max_cs_threads * subslices);
>     }
>  
>     if (unlikely(INTEL_DEBUG & DEBUG_CS))
> diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> index aff1f4e..0eca651 100644
> --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> @@ -80,7 +80,9 @@ brw_upload_cs_state(struct brw_context *brw)
>     const uint32_t vfe_num_urb_entries = brw->gen >= 8 ? 2 : 0;
>     const uint32_t vfe_gpgpu_mode =
>        brw->gen == 7 ? SET_FIELD(1, GEN7_MEDIA_VFE_STATE_GPGPU_MODE) : 0;
> -   OUT_BATCH(SET_FIELD(brw->max_cs_threads - 1, MEDIA_VFE_STATE_MAX_THREADS) |
> +   const uint32_t subslices = MAX2(brw->intelScreen->subslice_total, 1);
> +   OUT_BATCH(SET_FIELD(brw->max_cs_threads * subslices - 1,
> +                       MEDIA_VFE_STATE_MAX_THREADS) |
>               SET_FIELD(vfe_num_urb_entries, MEDIA_VFE_STATE_URB_ENTRIES) |
>               SET_FIELD(1, MEDIA_VFE_STATE_RESET_GTW_TIMER) |
>               SET_FIELD(1, MEDIA_VFE_STATE_BYPASS_GTW) |
> -- 
> 2.8.3
>
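
For reference, the arithmetic both hunks share can be written out as a
standalone sketch.  The function names are made up for illustration, and
reading the MAX2(..., 1) as a fallback for an unknown subslice count is my
assumption about its purpose:

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the arithmetic in the patch above; none
 * of these names exist in the driver. */

/* Total CS threads on the GPU: per-subslice limit times subslice count.
 * A count below 1 is treated as 1, like MAX2(subslice_total, 1). */
static uint32_t
total_cs_threads(uint32_t max_cs_threads, int subslice_total)
{
   const uint32_t subslices =
      subslice_total > 1 ? (uint32_t)subslice_total : 1;
   return max_cs_threads * subslices;
}

/* Value placed in the MEDIA_VFE_STATE max threads field: total minus one. */
static uint32_t
vfe_max_threads_field(uint32_t max_cs_threads, int subslice_total)
{
   return total_cs_threads(max_cs_threads, subslice_total) - 1;
}

/* Scratch BO size: per-thread scratch times the total thread count. */
static uint64_t
cs_scratch_bo_size(uint32_t total_scratch, uint32_t max_cs_threads,
                   int subslice_total)
{
   return (uint64_t)total_scratch *
          total_cs_threads(max_cs_threads, subslice_total);
}
```

The point of keeping both values derived from one helper is that the
scratch BO and the thread limit programmed into the hardware cannot drift
apart.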