<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - amdgpu driver crash with kernel NULL pointer dereference"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=111633">111633</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>amdgpu driver crash with kernel NULL pointer dereference
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>DRI
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>x86-64 (AMD64)
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux (All)
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>not set
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>not set
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>DRM/AMDgpu
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>dri-devel@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>vakevk+freedesktopbugzilla@gmail.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>I am running on arch linux: Linux arch 5.2.13-arch1-1-ARCH #1 SMP PREEMPT Fri
Sep 6 17:52:33 UTC 2019 x86_64 GNU/Linux

I am running wayland via sway.

My gpu is a Radeon RX Vega 64.

While in my sway session the image on my screen froze but audio from a video
continued to play. I was able to ssh in from a different machine and found this
message with dmesg:

BUG: kernel NULL pointer dereference, address: 0000000000000360
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 1 PID: 12766 Comm: kworker/u16:0 Not tainted 5.2.11-arch1-1-ARCH #1
Hardware name: ASUS All Series/Z87-PLUS, BIOS 2103 08/15/2014
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dc_stream_retain+0x5/0x20 [amdgpu]
<Code and registers omitted. Can post if important and someone reassures me
that it doesn't sensitive information since it looks like a memory dump.>
Call Trace:
 dc_resource_state_copy_construct+0xa0/0xf0 [amdgpu]
 dc_commit_updates_for_stream+0xa63/0xc20 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xabe/0x19a0 [amdgpu]
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 process_one_work+0x1d1/0x3e0
 worker_thread+0x4a/0x3d0
 kthread+0xfb/0x130
 ? process_one_work+0x3e0/0x3e0
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
Modules linked in: snd_seq_dummy snd_seq tun nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c nf_tables_set cfg80211 nf_tables nfnetlink 8021q garp
mrp stp llc intel_rapl nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_realtek
snd_hda_codec_generic fuse ledtrig_audio ofpart snd_hda_codec_hdmi cmdlinepart
btusb intel_spi_platform snd_hda_intel btrtl x86_pkg_temp_thermal intel_spi
btbcm intel_powerclamp spi_nor btintel eeepc_wmi snd_usb_audio coretemp
snd_hda_codec uvcvideo asus_wmi bluetooth snd_usbmidi_lib iTCO_wdt kvm_intel
snd_hda_core videobuf2_vmalloc mei_hdcp mtd iTCO_vendor_support mxm_wmi
wmi_bmof sparse_keymap snd_hwdep snd_rawmidi snd_seq_device videobuf2_memops
snd_pcm videobuf2_v4l2 snd_timer videobuf2_common snd videodev kvm irqbypass
input_leds ecdh_generic intel_cstate mousedev rfkill intel_uncore mei_me joydev
cdc_acm media ecc e1000e intel_rapl_perf mei soundcore pcc_cpufreq i2c_i801
lpc_ich pcspkr wmi evdev mac_hid ip_tables x_tables ext4
 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid dm_crypt dm_mod
sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci
libahci aesni_intel libata aes_x86_64 xhci_pci crypto_simd cryptd glue_helper
xhci_hcd scsi_mod ehci_pci ehci_hcd amdgpu gpu_sched i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
CR2: 0000000000000360
---[ end trace 08eaa2e1d713ba4d ]---

At this point I tried killing the sway process but did not succeed even with
`kill -9`. Not even `sudo reboot` completed despite killing the ssh session. I
had to hard reset the machine.

Potentially related is that since roughly a week I have been experiencing
intermittent screen freezes from time to time that would resolve themselves
after about 10 seconds with a message like this in dmesg:

drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or
flip_done timed out</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are the assignee for the bug.</li>
      </ul>
    </body>
</html>