[Intel-gfx] Xorg SEGV in Xen PV dom0 after updating from 5.16.18 to 5.17.5

Thorsten Leemhuis regressions at leemhuis.info
Sun May 15 08:30:38 UTC 2022


On 04.05.22 08:48, Juergen Gross wrote:
> On 04.05.22 07:46, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker. Sending this just to
>> CC the developers of the culprit mentioned below (bdd8b6c98239cad
>> ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")) and the
>> maintainers for the subsystem.
>>
>> While at it, a quick note: I wonder if this is a problem similar to one
>> that recently turned up with amdgpu and is fixed by this commit:
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=78b12008f20
> 
> No, this is different.
> 
> I posted a patch yesterday which should fix the issue:
> 
> https://lore.kernel.org/lkml/20220503132207.17234-3-jgross@suse.com/T/#m75efc68c96d8f7160229b5f3147242221ce0c28c

What happened to that? It looks like there hasn't been any progress in
the past week towards getting this regression fixed. That sometimes
happens, but it is undesirable when it comes to regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.

#regzbot poke

>> Ciao, Thorsten
>>
>> On 04.05.22 02:37, Marek Marczykowski-Górecki wrote:
>>>
>>> After updating from 5.16.18 to 5.17.5 in Xen PV dom0, my Xorg started
>>> crashing when displaying any window mapped from a guest (domU) system.
>>> This is 100% reproducible.
>>> The system is Qubes OS, and it uses a trick that maps window content
>>> from other guests using Xen grant tables, wrapped as "shared memory"
>>> from Xorg's point of view (so, the memory that Xorg mmaps is not just
>>> from another process, but from another VM). That's the ShmPutImage you
>>> can see in the stack trace below.
>>>
>>> Stack trace of thread 12858:
>>> #0  0x00007f80029e17d5 raise (libc.so.6 + 0x3c7d5)
>>> #1  0x00007f80029ca895 abort (libc.so.6 + 0x25895)
>>> #2  0x00005b3469ace0e0 OsAbort (Xorg + 0x1c60e0)
>>> #3  0x00005b3469ad3959 AbortServer (Xorg + 0x1cb959)
>>> #4  0x00005b3469ad46aa FatalError (Xorg + 0x1cc6aa)
>>> #5  0x00005b3469acb450 OsSigHandler (Xorg + 0x1c3450)
>>> #6  0x00007f8002b85a90 __restore_rt (libpthread.so.0 + 0x14a90)
>>> #7  0x00007f8002b0a2a1 __memmove_avx_unaligned_erms (libc.so.6 + 0x1652a1)
>>> #8  0x00007f80015dfcc9 linear_to_xtiled_faster (iris_dri.so + 0xc91cc9)
>>> #9  0x00007f80015e3477 _isl_memcpy_linear_to_tiled (iris_dri.so + 0xc95477)
>>> #10 0x00007f8001468440 iris_texture_subdata (iris_dri.so + 0xb1a440)
>>> #11 0x00007f8000a76107 st_TexSubImage (iris_dri.so + 0x128107)
>>> #12 0x00007f8000be9a47 texture_sub_image (iris_dri.so + 0x29ba47)
>>> #13 0x00007f8000becd0c texsubimage_err (iris_dri.so + 0x29ed0c)
>>> #14 0x00007f8000bf2939 _mesa_TexSubImage2D (iris_dri.so + 0x2a4939)
>>> #15 0x00007f800213831f glamor_upload_boxes (libglamoregl.so + 0x1e31f)
>>> #16 0x00007f800213856f glamor_upload_region (libglamoregl.so + 0x1e56f)
>>> #17 0x00007f800212aea6 glamor_put_image (libglamoregl.so + 0x10ea6)
>>> #18 0x00005b3469a4d79c damagePutImage (Xorg + 0x14579c)
>>> #19 0x00005b3469a00a7e ProcShmPutImage (Xorg + 0xf8a7e)
>>> #20 0x00005b3469965a2b Dispatch (Xorg + 0x5da2b)
>>> #21 0x00005b3469969b04 dix_main (Xorg + 0x61b04)
>>> #22 0x00007f80029cc082 __libc_start_main (libc.so.6 + 0x27082)
>>> #23 0x00005b3469952e6e _start (Xorg + 0x4ae6e)
>>>
>>> Disassembly of the surrounding code:
>>>
>>>     0x00007596ae8c82fb <+123>:    ja     0x7596ae8c8338 <__memmove_avx_unaligned_erms+184>
>>>     0x00007596ae8c82fd <+125>:    jb     0x7596ae8c8304 <__memmove_avx_unaligned_erms+132>
>>>     0x00007596ae8c82ff <+127>:    movzbl (%rsi),%ecx
>>>     0x00007596ae8c8302 <+130>:    mov    %cl,(%rdi)
>>>     0x00007596ae8c8304 <+132>:    retq
>>>     0x00007596ae8c8305 <+133>:    vmovdqu (%rsi),%xmm0
>>>     0x00007596ae8c8309 <+137>:    vmovdqu -0x10(%rsi,%rdx,1),%xmm1
>>> => 0x00007596ae8c830f <+143>:    vmovdqu %xmm0,(%rdi)
>>>     0x00007596ae8c8313 <+147>:    vmovdqu %xmm1,-0x10(%rdi,%rdx,1)
>>>     0x00007596ae8c8319 <+153>:    retq
>>>
>>>
>>> I don't see any related kernel or Xen messages at this time. Xorg's SEGV
>>> handler also prints:
>>>
>>>      (EE) Segmentation fault at address 0x3c010
>>>
>>> Git bisect says it's bdd8b6c98239cad ("drm/i915: replace X86_FEATURE_PAT
>>> with pat_enabled()"), and indeed with this commit reverted on top of
>>> 5.17.5 everything works fine.
>>>
>>> I guess this part of dom0's boot dmesg may be relevant:
>>>
>>> [    0.000949] x86/PAT: MTRRs disabled, skipping PAT initialization too.
>>> [    0.000953] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC
>>>
>>> Originally reported at
>>> https://github.com/QubesOS/qubes-issues/issues/7479
>>>
>>> #regzbot introduced bdd8b6c98239cad
>>> #regzbot monitor: https://github.com/QubesOS/qubes-issues/issues/7479
>>>
> 

