[Intel-gfx] [PATCH v3 2/3] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config
Yang, Rong R
rong.r.yang at intel.com
Thu Jul 14 08:33:50 UTC 2016
> -----Original Message-----
> From: Deak, Imre
> Sent: Friday, July 1, 2016 21:40
> To: intel-gfx at lists.freedesktop.org
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>; Chris Wilson <chris at chris-
> wilson.co.uk>; Yang, Rong R <rong.r.yang at intel.com>; Zhao, Yakui
> <yakui.zhao at intel.com>; Tamminen, Eero T <eero.t.tamminen at intel.com>
> Subject: [PATCH v3 2/3] drm/i915/bxt: Fix inadvertent CPU snooping due to
> incorrect MOCS config
>
> Setting a write-back cache policy in the MOCS entry definition also implies
> snooping, which has a considerable overhead. This is unexpected for a few
> reasons:
> - From user space's point of view, since it didn't ask for a coherent
> surface (it didn't set the buffer as such via the set caching IOCTL).
> - There is a separate MOCS entry field for snooping (which we never
> set).
> - This MOCS table is about caching in (e)LLC and there is no (e)LLC on
> BXT. There is a separate table for L3 cache control.
>
> Considering the above, the current snooping behavior looks like an
> unintentional side effect of the WB setting. Changing it to LLC-UC gets rid
> of the snooping without any ill effects. For a coherent surface the application
> would use a separate MOCS entry at index 1 and call the set caching IOCTL to
> set up the PTE entries for the corresponding buffer to be snooped. In the
> future we could also add a new MOCS entry for coherent surfaces.
>
> This resulted in a 70% improvement in synthetic texturing benchmarks.
>
> Kudos to Valtteri Rantala, Eero Tamminen, Michael T Frederick and Ville,
> who helped to narrow down the source of the problem to the kernel and to
> the snooping behaviour in particular.
>
> With a follow-up change to adjust the 3rd entry value,
> igt/gem_mocs_settings passes after this change.
>
> v2:
> - Rebase on v2 of patch 1/2.
> v3:
> - Set the entry as LLC uncached instead of PTE-passthrough. This way
> we keep snooping disabled, and we also make the cacheability/
> coherency setting independent of the PTE, which is managed by the
> kernel. (Chris)
This gives about a 20% improvement in the OpenCL benchmark LuxMark.
Add: Tested-by: Rong R Yang <rong.r.yang at intel.com>
> CC: Rong R Yang <rong.r.yang at intel.com>
> CC: Yakui Zhao <yakui.zhao at intel.com>
> CC: Valtteri Rantala <valtteri.rantala at intel.com>
> CC: Eero Tamminen <eero.t.tamminen at intel.com>
> CC: Michael T Frederick <michael.t.frederick at intel.com>
> CC: Ville Syrjälä <ville.syrjala at linux.intel.com>
> CC: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Imre Deak <imre.deak at intel.com>
> ---
> drivers/gpu/drm/i915/intel_mocs.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c
> b/drivers/gpu/drm/i915/intel_mocs.c
> index d36e609..927825f 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -149,8 +149,8 @@ static const struct drm_i915_mocs_entry
> broxton_mocs_table[] = {
> .l3cc_value = L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
> },
> {
> - /* 0x0000003b */
> - .control_value = LE_CACHEABILITY(LE_WB) |
> + /* 0x00000039 */
> + .control_value = LE_CACHEABILITY(LE_UC) |
> LE_TGT_CACHE(LE_TC_LLC_ELLC) |
> LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
> LE_PFM(0) | LE_SCF(0),
> --
> 2.5.0
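
For reference, the comment change in the hunk above (0x0000003b to
0x00000039) can be checked by composing the control value by hand. The
sketch below is only an illustration: the shift positions and the LE_UC,
LE_WB and LE_TC_LLC_ELLC encodings are assumed to match the definitions
in intel_mocs.c at the time of this patch, and BXT_ENTRY_CONTROL() and
bxt_entry_control_value() are hypothetical names that do not exist in
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions, assumed to match intel_mocs.c at the time of the patch. */
#define LE_CACHEABILITY(value)  ((value) << 0)
#define LE_TGT_CACHE(value)     ((value) << 2)
#define LE_LRUM(value)          ((value) << 4)
#define LE_AOM(value)           ((value) << 6)
#define LE_RSC(value)           ((value) << 7)
#define LE_SCC(value)           ((value) << 8)
#define LE_PFM(value)           ((value) << 11)
#define LE_SCF(value)           ((value) << 14)

/* Field encodings, also assumed from the driver source. */
#define LE_UC           1   /* uncached */
#define LE_WB           3   /* write-back */
#define LE_TC_LLC_ELLC  2   /* target cache: LLC/eLLC */

/*
 * Hypothetical helper macro: builds the control value of the patched
 * table entry for a given cacheability setting, with every other field
 * exactly as in the diff.
 */
#define BXT_ENTRY_CONTROL(cacheability)                         \
	(LE_CACHEABILITY(cacheability) |                        \
	 LE_TGT_CACHE(LE_TC_LLC_ELLC) |                         \
	 LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |       \
	 LE_PFM(0) | LE_SCF(0))

/* Compile-time checks against the two comments in the diff. */
static_assert(BXT_ENTRY_CONTROL(LE_WB) == 0x0000003b, "old value, WB");
static_assert(BXT_ENTRY_CONTROL(LE_UC) == 0x00000039, "new value, UC");

/* Hypothetical runtime wrapper around the macro above. */
static inline uint32_t bxt_entry_control_value(uint32_t cacheability)
{
	return BXT_ENTRY_CONTROL(cacheability);
}
```

Under these assumptions the old and new values differ only in the
cacheability field (bits 1:0), which is consistent with the patch
changing nothing but LE_WB to LE_UC.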