[Bug 218993] New: SIGBUS with amdgpu on multi-GPU system on X server with DRI3/GBM

bugzilla-daemon at kernel.org
Thu Jun 27 10:30:23 UTC 2024


https://bugzilla.kernel.org/show_bug.cgi?id=218993

            Bug ID: 218993
           Summary: SIGBUS with amdgpu on multi-GPU system on X server
                    with DRI3/GBM
           Product: Drivers
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: adaha at cendio.se
        Regression: No

Created attachment 306503
  --> https://bugzilla.kernel.org/attachment.cgi?id=306503&action=edit
trace before crash, Xvnc on Ryzen 5 7600, vkcube on Arc A380

I ran into a SIGBUS when using multiple GPUs and DRI with an X server that has
GPU acceleration (TigerVNC's Xvnc). This happened on a machine with:
OS: Fedora 40 running 6.9.5-200.fc40.x86_64
iGPU: Ryzen 5 7600
dGPU: RTX 4060 | Arc A380 | RX 7600

The issue occurs when the X server is configured to use an AMD rendernode, and
an application wants to use a non-AMD rendernode.

When the X server has opened the AMD rendernode using gbm_create_device() and
the application uses a rendernode that does not belong to an AMD GPU, a SIGBUS
occurs when gbm_bo_map() is called.

In my setup, /dev/dri/renderD128 is the AMD iGPU, and /dev/dri/renderD129 is an
RTX 4060.
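
The numbering of the rendernodes can differ between machines, so it helps to
check which kernel driver is bound to each node before reproducing. The
following is a minimal sketch, assuming the usual sysfs layout in which
/sys/class/drm/renderD*/device/driver is a symlink to the bound driver. The
function name render_node_drivers is made up for this example.

```python
#!/usr/bin/env python3
"""List DRM rendernodes and the kernel driver bound to each one."""
import os


def render_node_drivers(sysfs_drm="/sys/class/drm"):
    """Return a dict mapping rendernode name to driver name (or None).

    Assumes <sysfs_drm>/renderD*/device/driver is a symlink whose last
    path component is the driver name, e.g. "amdgpu".
    """
    result = {}
    if not os.path.isdir(sysfs_drm):
        return result
    for name in sorted(os.listdir(sysfs_drm)):
        if not name.startswith("renderD"):
            continue  # skip cardN, connectors, etc.
        link = os.path.join(sysfs_drm, name, "device", "driver")
        try:
            result[name] = os.path.basename(os.readlink(link))
        except OSError:
            result[name] = None  # no driver bound, or unexpected layout
    return result


if __name__ == "__main__":
    for node, driver in render_node_drivers().items():
        print(f"/dev/dri/{node}: {driver or 'unknown'}")
```

On the setup described here, renderD128 would be expected to show the amdgpu
driver.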

If I run the X server with
$ Xvnc :50 -rendernode /dev/dri/renderD128

and then run vkcube on renderD129 against that X server
$ DISPLAY=:50 vkcube --gpu_number 1

I get the SIGBUS:
(EE) 
(EE) Backtrace:
(EE) 0: Xvnc (xorg_backtrace+0x82) [0x560c52b47d42]
(EE) 1: Xvnc (0x560c52991000+0x1b7f4c) [0x560c52b48f4c]
(EE) 2: /lib64/libc.so.6 (0x7f0c99613000+0x40710) [0x7f0c99653710]
(EE) 3: /lib64/libpixman-1.so.0 (0x7f0c99ed0000+0x8a2d0) [0x7f0c99f5a2d0]
(EE) 4: /lib64/libpixman-1.so.0 (pixman_blt+0x81) [0x7f0c99ede8d1]
(EE) 5: Xvnc (vncDRI3SyncPixmapFromGPU+0x10e) [0x560c529f303e]
(EE) 6: Xvnc (0x560c52991000+0x622c3) [0x560c529f32c3]
(EE) 7: Xvnc (dri3_pixmap_from_fds+0xcf) [0x560c52a7fdaf]
(EE) 8: Xvnc (0x560c52991000+0xf1309) [0x560c52a82309]
(EE) 9: Xvnc (Dispatch+0x426) [0x560c52ae3f56]
(EE) 10: Xvnc (dix_main+0x46a) [0x560c52af2d4a]
(EE) 11: /lib64/libc.so.6 (0x7f0c99613000+0x2a088) [0x7f0c9963d088]
(EE) 12: /lib64/libc.so.6 (__libc_start_main+0x8b) [0x7f0c9963d14b]
(EE) 13: Xvnc (_start+0x25) [0x560c529eed75]
(EE) 
(EE) Bus error at address 0x7f0c8e211000
(EE) 
Fatal server error:
(EE) Caught signal 7 (Bus error). Server aborting
(EE) 
Aborted (core dumped)

The same crash occurs when running vkcube on an Arc GPU (A380).

However, running the X server on an Arc or Nvidia GPU, and vkcube on the AMD
GPU, does not cause a crash. Neither does running the X server on AMD, and
vkcube on a different AMD GPU (iGPU & RX 7600 for example).
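
To go through these combinations without retyping the commands, a small helper
along the following lines could be used. This is only a sketch: it uses the
same Xvnc and vkcube options as the commands in this report, the function
names build_commands and server_survives are made up for this example, and
the fixed sleeps are a crude stand-in for proper startup detection. It only
checks whether Xvnc is still running after a while, not why it exited.

```python
import os
import subprocess
import time


def build_commands(display, server_node, client_gpu):
    """Return (xvnc_argv, vkcube_argv, vkcube_env) for one combination."""
    xvnc = ["Xvnc", display, "-rendernode", server_node]
    vkcube = ["vkcube", "--gpu_number", str(client_gpu)]
    env = dict(os.environ, DISPLAY=display)
    return xvnc, vkcube, env


def server_survives(display, server_node, client_gpu, wait=10.0):
    """Start Xvnc and vkcube; return True if Xvnc is still alive afterwards."""
    xvnc_cmd, vkcube_cmd, env = build_commands(display, server_node, client_gpu)
    server = subprocess.Popen(xvnc_cmd)
    try:
        time.sleep(2.0)  # crude: give the X server time to start
        client = subprocess.Popen(vkcube_cmd, env=env)
        try:
            time.sleep(wait)  # let vkcube present some frames
            return server.poll() is None
        finally:
            client.kill()
            client.wait()
    finally:
        if server.poll() is None:
            server.terminate()
        server.wait()
```

Calling server_survives(":50", "/dev/dri/renderD128", 1) corresponds to the
crashing case on my setup (X server on the AMD iGPU, vkcube on the RTX 4060).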

I've attached a stacktrace with the last call to mmap() before the crash.


