Re: Kernel panic while "modprobe amdkfd ; modprobe -r amdkfd" ; 4.14.35 kernel

Kuehling, Felix Felix.Kuehling at amd.com
Mon Mar 25 22:36:46 UTC 2019


On 2019-03-22 12:58 p.m., John Donnelly wrote:
> Hello,
>
> I am investigating an issue reported by a test group concerning this driver. Their test loads and unloads every kernel module included in the 4.14.35 kernel release. You don’t even need an AMD platform: it occurs on any Intel machine, and on a KVM VM instance too.
>
> Kernel panic while "modprobe amdkfd ; modprobe -r amdkfd"
>
> [  329.425334]  ? __slab_free+0x9b/0x2ba
> [  329.427836]  ? process_slab+0x3c1/0x45c
> [  329.430336]  dev_printk_emit+0x4e/0x65
> [  329.432829]  __dev_printk+0x46/0x8b
> [  329.435183]  _dev_info+0x6c/0x85
> [  329.437435]  ? kfree+0x141/0x182
> [  329.439646]  kfd_module_exit+0x37/0x39 [amdkfd]
> [  329.442258]  SyS_delete_module+0x1c3/0x26f
> [  329.444722]  ? entry_SYSCALL_64_after_hwframe+0xaa/0x0
> [  329.447479]  ? entry_SYSCALL_64_after_hwframe+0xa3/0x0
> [  329.450206]  ? entry_SYSCALL_64_after_hwframe+0x9c/0x0
> [  329.452912]  ? entry_SYSCALL_64_after_hwframe+0x95/0x0
> [  329.455586]  do_syscall_64+0x79/0x1ae
> [  329.457766]  entry_SYSCALL_64_after_hwframe+0x151/0x0
> [  329.460369] RIP: 0033:0x7f1757a1b457
> [  329.462502] RSP: 002b:00007ffd62ce1f48 EFLAGS: 00000206 ORIG_RAX:
>
>
>
> Sometimes  the unload works but the message logged is garbage:
>
> [root at jpd-vmbase02 ~]# modprobe -r amdkfd
> [  144.449981] ???????????? hn??蟟??xn??ן??kfd: Removed module

I think this was caused by using dev_info with a kfd_device that didn't 
exist any more. It was fixed by this commit:

commit c393e9b2d51540b74e18e555df14706098dbf2cc
Author: Randy Dunlap <rdunlap at infradead.org>
Date:   Mon Nov 13 18:08:48 2017 +0200

     drm/amdkfd: fix amdkfd use-after-free GP fault

     Fix GP fault caused by dev_info() reference to a struct device*
     after the device has been freed (use after free).
     kfd_chardev_exit() frees the device so 'kfd_device' should not
     be used after calling kfd_chardev_exit().

     Signed-off-by: Randy Dunlap <rdunlap at infradead.org>
     Signed-off-by: Oded Gabbay <oded.gabbay at gmail.com>
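
To make the ordering problem described in that commit message concrete, here is a
minimal userspace sketch. It is only an analogy, not the kernel code: the
fake_* names are hypothetical stand-ins for kfd_device, kfd_chardev_exit()
and kfd_module_exit(), and malloc/free stand in for device creation and
destruction. It assumes the safe pattern is simply to stop dereferencing the
device once the exit helper has released it; the actual upstream change may
differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct fake_device {
    char name[16];
};

/* Plays the role of the driver's global 'kfd_device' pointer. */
static struct fake_device *fake_kfd_device;

/* Stand-in for the chardev init path: allocates the device. */
static int fake_chardev_init(void)
{
    fake_kfd_device = malloc(sizeof(*fake_kfd_device));
    if (!fake_kfd_device)
        return -1;
    strcpy(fake_kfd_device->name, "kfd");
    return 0;
}

/* Stand-in for the chardev exit path: frees the device. */
static void fake_chardev_exit(void)
{
    free(fake_kfd_device);
    fake_kfd_device = NULL; /* clear the pointer so stale use is detectable */
}

/*
 * Stand-in for the module exit path. The buggy ordering would format the
 * message from fake_kfd_device->name *after* fake_chardev_exit(), reading
 * freed memory. Here the message is built without touching the device.
 */
static int fake_module_exit(char *buf, size_t len)
{
    fake_chardev_exit();
    assert(fake_kfd_device == NULL); /* nothing below may dereference it */
    return snprintf(buf, len, "amdkfd: Removed module");
}
```

Calling fake_chardev_init() and then fake_module_exit() with a small buffer
yields the fixed string "amdkfd: Removed module" regardless of what happens
to the freed memory, which is the property the garbage log line in the
report was missing.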


>
>
> Is this something one of the team members could possibly have corrected in an upstream version?

In current kernels, amdkfd is no longer a separate KO. It's part of 
amdgpu now. Also see above. This bug is probably not reproducible any more.

Regards,
   Felix


>
> #define KFD_DRIVER_DESC         "Standalone HSA driver for AMD's GPUs"
> #define KFD_DRIVER_DATE         "20150421"
> #define KFD_DRIVER_MAJOR        0
> #define KFD_DRIVER_MINOR        7
> #define KFD_DRIVER_PATCHLEVEL   2
>
>
> Any advice welcome.
>
>
> Thank you,
>
> John
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

