[Intel-gfx] [cache coherency bug] i915 and PAT attributes

Andrew Cooper Andrew.Cooper3 at citrix.com
Fri Dec 16 15:30:13 UTC 2022


On 08/12/2022 1:55 pm, Marek Marczykowski-Górecki wrote:
> Hi,
>
> There is an issue with i915 on Xen PV (dom0). The end result is a lot of
> glitches, like here: https://openqa.qubes-os.org/tests/54748#step/startup/8
> (this one is on ADL, Linux 6.1-rc7 as a Xen PV dom0). It's using Xorg
> with "modesetting" driver.
>
> After some iterations of debugging, we narrowed it down to i915 handling
> caching. The main difference is that PAT is set up differently on Xen PV
> than on native Linux. Normally, Linux does have appropriate abstraction
> for that, but apparently something related to i915 doesn't play well
> with it. The specific difference is:
> native linux:
> x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
> xen pv:
> x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC
>                                   ~~          ~~      ~~  ~~
>
> The specific impact depends on kernel version and the hardware. The most
> severe issues I see on >=ADL, but some older hardware is affected too -
> sometimes only if composition is disabled in the window manager.
> Some more information is collected at
> https://github.com/QubesOS/qubes-issues/issues/4782 (and a few linked
> duplicates...).
>
> A somewhat related commit is here:
> https://github.com/torvalds/linux/commit/bdd8b6c98239cad ("drm/i915:
> replace X86_FEATURE_PAT with pat_enabled()") - it is the place where
> i915 explicitly checks for PAT support, so I'm cc-ing people mentioned
> there too.
>
> Any ideas?
>
> The issue can be easily reproduced without Xen too, by adjusting PAT in
> Linux:
> -----8<-----
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index 66a209f7eb86..319ab60c8d8c 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -400,8 +400,8 @@ void pat_init(void)
>  		 * The reserved slots are unused, but mapped to their
>  		 * corresponding types in the presence of PAT errata.
>  		 */
> -		pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
> -		      PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
> +		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> +		      PAT(4, WC) | PAT(5, WP) | PAT(6, UC)       | PAT(7, UC);
>  	}
>  
>  	if (!pat_bp_initialized) {
> -----8<-----
>

Hello, can anyone help please?

Intel's CI has run this reproducer and confirmed the regression:
https://lore.kernel.org/intel-gfx/Y5Hst0bCxQDTN7lK@mail-itl/T/#m4480c15a0d117dce6210562eb542875e757647fb

We're reasonably confident that it is an i915 bug (given the repro with
no Xen in the mix), but we're out of ideas.
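
To make the two layouts easier to compare, here is a minimal sketch (not
kernel code; every name in it is made up for illustration) that packs each
layout into an IA32_PAT value and looks up the memory type a PTE's
PAT/PCD/PWT bits would select. The type encodings are the architectural
ones; the two tables are copied from the dmesg lines quoted above.

```c
#include <stdint.h>

/* Architectural memory-type encodings for one IA32_PAT entry. */
enum pat_type {
    PAT_UC       = 0x00,
    PAT_WC       = 0x01,
    PAT_WT       = 0x04,
    PAT_WP       = 0x05,
    PAT_WB       = 0x06,
    PAT_UC_MINUS = 0x07,
};

/* Layouts as reported by "x86/PAT: Configuration [0-7]" in the report. */
static const uint8_t native_layout[8] = {
    PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC,
    PAT_WB, PAT_WP, PAT_UC_MINUS, PAT_WT,
};
static const uint8_t xen_pv_layout[8] = {
    PAT_WB, PAT_WT, PAT_UC_MINUS, PAT_UC,
    PAT_WC, PAT_WP, PAT_UC,       PAT_UC,
};

/* Pack eight entries into a 64-bit MSR value, entry i in byte i. */
static inline uint64_t pat_pack(const uint8_t layout[8])
{
    uint64_t msr = 0;
    for (int i = 0; i < 8; i++)
        msr |= (uint64_t)layout[i] << (8 * i);
    return msr;
}

/* Memory type stored in entry idx (0..7) of a packed MSR value. */
static inline uint8_t pat_type_at(uint64_t msr, unsigned idx)
{
    return (uint8_t)((msr >> (8 * (idx & 7))) & 0x07);
}

/* PAT index selected by a PTE: PAT is bit 2, PCD bit 1, PWT bit 0. */
static inline unsigned pte_pat_index(int pat, int pcd, int pwt)
{
    return ((unsigned)!!pat << 2) | ((unsigned)!!pcd << 1) | (unsigned)!!pwt;
}

/* First index holding the given type, or -1 if the layout lacks it. */
static inline int pat_first_index_of(uint64_t msr, uint8_t type)
{
    for (unsigned i = 0; i < 8; i++)
        if (pat_type_at(msr, i) == type)
            return (int)i;
    return -1;
}
```

With these tables, index 1 (only PWT set) selects WC natively but WT under
Xen PV, and WC moves to index 4 (only the PAT bit set). Anything that
assumes the native index-to-type mapping instead of going through the
kernel's PAT abstraction would get a different cache type; whether i915
does so is not something this sketch establishes.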

Thanks,

~Andrew

