[Intel-gfx] [PATCH] drm/i915: don't flush TLB on GEN8

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri May 27 10:55:42 UTC 2022


On 27/05/2022 10:09, Mauro Carvalho Chehab wrote:
> The i915 hangcheck selftest is causing i915 driver timeouts, as
> reported by Intel CI:
> 
> 	http://gfx-ci.fi.intel.com/cibuglog-ng/issuefilterassoc/24297?query_key=42a999f48fa6ecce068bc8126c069be7c31153b4
> 
> When this test runs, the only output is:
> 
> 	[   68.811639] i915: Performing live selftests with st_random_seed=0xe138eac7 st_timeout=500
> 	[   68.811792] i915: Running hangcheck
> 	[   68.811859] i915: Running intel_hangcheck_live_selftests/igt_hang_sanitycheck
> 	[   68.816910] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
> 	[   68.841597] i915: Running intel_hangcheck_live_selftests/igt_reset_nop
> 	[   69.346347] igt_reset_nop: 80 resets
> 	[   69.362695] i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
> 	[   69.863559] igt_reset_nop_engine(rcs0): 709 resets
> 	[   70.364924] igt_reset_nop_engine(bcs0): 903 resets
> 	[   70.866005] igt_reset_nop_engine(vcs0): 659 resets
> 	[   71.367934] igt_reset_nop_engine(vcs1): 549 resets
> 	[   71.869259] igt_reset_nop_engine(vecs0): 553 resets
> 	[   71.882592] i915: Running intel_hangcheck_live_selftests/igt_reset_idle_engine
> 	[   72.383554] rcs0: Completed 16605 idle resets
> 	[   72.884599] bcs0: Completed 18641 idle resets
> 	[   73.385592] vcs0: Completed 17517 idle resets
> 	[   73.886658] vcs1: Completed 15474 idle resets
> 	[   74.387600] vecs0: Completed 17983 idle resets
> 	[   74.387667] i915: Running intel_hangcheck_live_selftests/igt_reset_active_engine
> 	[   74.889017] rcs0: Completed 747 active resets
> 	[   75.174240] intel_engine_reset(bcs0) failed, err:-110
> 	[   75.174301] bcs0: Completed 525 active resets
> 
> After that, the machine just silently hangs.
> 
> The root cause is that the TLB flush logic is not working as
> expected on GEN8.
> 
> Tested on an Intel NUC5i7RYB with an i7-5557U Broadwell CPU.
> 
> This patch partially reverts that logic by excluding GEN8 from
> the TLB cache flush.

Since I am pretty sure no such failures were spotted when merging the 
feature, I assume the failure is sporadic and/or limited to some 
configurations? Do you have any details there? Because the TLB flush 
addresses an important security issue, we should not revert it lightly.

Regards,

Tvrtko

> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Sushma Venkatesh Reddy <sushma.venkatesh.reddy at intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Dave Airlie <airlied at redhat.com>
> Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Jani Nikula <jani.nikula at intel.com>
> Cc: stable at vger.kernel.org # Kernel 5.17 and later
> 
> Fixes: 494c2c9b630e ("drm/i915: Flush TLBs before releasing backing store")
> Signed-off-by: Mauro Carvalho Chehab <mchehab at kernel.org>
> ---
> 
> Patch resent, as the first version was using an old email. That's what happens
> when writing patches on old test machines ;-)
> 
>   drivers/gpu/drm/i915/gt/intel_gt.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 034182f85501..7965a77e5046 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -1191,10 +1191,10 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	if (GRAPHICS_VER(i915) == 12) {
>   		regs = gen12_regs;
>   		num = ARRAY_SIZE(gen12_regs);
> -	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
> +	} else if (GRAPHICS_VER(i915) > 8 && GRAPHICS_VER(i915) <= 11) {
>   		regs = gen8_regs;
>   		num = ARRAY_SIZE(gen8_regs);
> -	} else if (GRAPHICS_VER(i915) < 8) {
> +	} else if (GRAPHICS_VER(i915) <= 8) {
>   		return;
>   	}
>   
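
For reference, the effect of the two changed comparisons in the hunk above
can be sketched in isolation. This is a standalone illustration, not driver
code: the enum and the functions select_before(), select_after() and
count_changed() are hypothetical names, and the integer argument stands in
for GRAPHICS_VER(i915). Versions matched by no branch of the hunk are
reported as TLB_NOT_SHOWN, because the code following the hunk is not quoted
in this mail.

```c
/* Hypothetical labels for the outcome of the version check in the hunk. */
enum tlb_path {
	TLB_SKIP,	/* early return, no TLB invalidation */
	TLB_GEN8_REGS,	/* regs = gen8_regs */
	TLB_GEN12_REGS,	/* regs = gen12_regs */
	TLB_NOT_SHOWN	/* no branch in the quoted hunk matches */
};

/* Branch selection as in the '-' lines: version 8 uses gen8_regs. */
static enum tlb_path select_before(int ver)
{
	if (ver == 12)
		return TLB_GEN12_REGS;
	else if (ver >= 8 && ver <= 11)
		return TLB_GEN8_REGS;
	else if (ver < 8)
		return TLB_SKIP;
	return TLB_NOT_SHOWN;
}

/* Branch selection as in the '+' lines: version 8 returns early. */
static enum tlb_path select_after(int ver)
{
	if (ver == 12)
		return TLB_GEN12_REGS;
	else if (ver > 8 && ver <= 11)
		return TLB_GEN8_REGS;
	else if (ver <= 8)
		return TLB_SKIP;
	return TLB_NOT_SHOWN;
}

/* Count the versions in [lo, hi] whose selected path differs. */
static int count_changed(int lo, int hi)
{
	int ver, changed = 0;

	for (ver = lo; ver <= hi; ver++)
		if (select_before(ver) != select_after(ver))
			changed++;
	return changed;
}
```

Under these assumptions, only version 8 moves from the gen8_regs path to the
early return; versions 9 to 11 and 12 keep their register lists, and versions
below 8 still skip the flush.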

