[Bug 97500] Cannot unbind GPU from AMDGPU

bugzilla-daemon at freedesktop.org
Sun Sep 25 21:06:31 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=97500

--- Comment #6 from Grazvydas Ignotas <notasas at gmail.com> ---
Created attachment 126782
  --> https://bugs.freedesktop.org/attachment.cgi?id=126782&action=edit
dmesg of powerplay crash

I've sent some patches with fixes, but there seem to be multiple other issues.

One of the problems is that struct amdgpu_i2c_chan contains struct drm_dp_aux,
and when amdgpu_i2c_fini() is called, which frees amdgpu_i2c_chan, the
drm_dp_aux is still in use. This causes memory corruption. I don't know how to
solve this properly; perhaps somebody who knows this code better does?

A hack can be used to trade this corruption for a leak:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
index 34bab61..8beaee0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
@@ -221,6 +221,8 @@ void amdgpu_i2c_destroy(struct amdgpu_i2c_chan *i2c)
        if (!i2c)
                return;
        i2c_del_adapter(&i2c->adapter);
+       if (i2c->has_aux)
+               return;
        kfree(i2c);
 }
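
For illustration only, here is a user-space sketch of the lifetime problem and of one possible direction (reference counting the container), as opposed to the leak hack above. All names (fake_i2c_chan, fake_aux, chan_get_aux, chan_put) are made up for this example and are not the amdgpu or DRM API; whether the real driver can be restructured this way is an open question.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real kernel structures. */
struct fake_aux {
	int transfers;
};

struct fake_i2c_chan {
	bool has_aux;
	struct fake_aux aux;	/* embedded, so it shares the channel's memory */
	int refcount;		/* assumption: every aux user holds a reference */
};

/* Counts how many channels were really freed, for inspection only. */
static int freed_channels;

static struct fake_i2c_chan *chan_create(bool has_aux)
{
	struct fake_i2c_chan *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->has_aux = has_aux;
	c->refcount = 1;	/* reference held by the creator */
	return c;
}

/* Hand out the embedded aux and pin the channel while it is in use. */
static struct fake_aux *chan_get_aux(struct fake_i2c_chan *c)
{
	if (!c || !c->has_aux)
		return NULL;
	c->refcount++;
	return &c->aux;
}

/* Drop one reference; the memory goes away only with the last one. */
static void chan_put(struct fake_i2c_chan *c)
{
	if (!c)
		return;
	if (--c->refcount == 0) {
		free(c);
		freed_channels++;
	}
}
```

With this scheme the creator's chan_put() no longer frees memory that an aux user still points into, and nothing is leaked once that user also calls chan_put(). The sketch is single-threaded; real kernel code would need proper atomic reference counting.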

---
Another one is a TTM leak, which can also be seen in this attachment.
CONFIG_DMA_API_DEBUG reports:

WARNING: CPU: 3 PID: 1666 at lib/dma-debug.c:976 dma_debug_device_change+0x1ca/0x240
pci 0000:01:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=202]
One of leaked entries details: [device address=0x00000003dcfe9000] [size=4096 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent]

Mapped at:
 [<ffffffff8163d941>] debug_dma_alloc_coherent+0x41/0x110
 [<ffffffffa0728d84>] ttm_dma_populate+0xb64/0x1150 [ttm]
 [<ffffffffa0b770ac>] amdgpu_ttm_tt_populate+0x35c/0x510 [amdgpu]
 [<ffffffffa0719141>] ttm_tt_bind+0x71/0xd0 [ttm]
 [<ffffffffa071c9d8>] ttm_bo_handle_move_mem+0xa08/0xaa0 [ttm]

---
The next one is a powerplay crash at
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:3336:
dpm_table->sclk_table.count is 0, so the array access there ends up badly. It
could be related to the "DPM is already running right now, no need to enable
DPM!" message; the full dmesg is attached.
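
As a rough illustration of why a zero count is dangerous, the sketch below assumes the crashing code indexes the table with something like count - 1 (I have not confirmed what line 3336 does exactly). The structure and function names are hypothetical and only mimic the shape of a DPM table; the guard merely avoids the bad access and does not explain why the table is empty after a rebind.

```c
#include <errno.h>
#include <stdint.h>

#define FAKE_MAX_LEVELS 8

/* Hypothetical stand-ins; these are NOT the powerplay structures. */
struct fake_dpm_level {
	uint32_t value;
};

struct fake_dpm_table {
	uint32_t count;
	struct fake_dpm_level levels[FAKE_MAX_LEVELS];
};

/*
 * Read the highest populated level. With count == 0 the unsigned
 * expression count - 1 wraps around to UINT32_MAX, so the table must
 * be checked before it is indexed.
 */
static int fake_get_highest_level(const struct fake_dpm_table *t,
				  uint32_t *out)
{
	if (!t || !out)
		return -EINVAL;
	if (t->count == 0 || t->count > FAKE_MAX_LEVELS)
		return -EINVAL;

	*out = t->levels[t->count - 1].value;
	return 0;
}
```

A caller would treat -EINVAL as "table not populated" and bail out instead of crashing.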

I won't have time to work on this for a while, but maybe somebody else does.

-- 
You are receiving this mail because:
You are the assignee for the bug.