HMM related use-after-free with amdgpu

Jason Gunthorpe jgg at mellanox.com
Mon Jul 15 17:25:21 UTC 2019


On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
> 
> With a KASAN enabled kernel built from amd-staging-drm-next, the
> attached use-after-free is pretty reliably detected during a piglit gpu run.

Does the branch you are testing have hmm.git merged? Judging from the
name, I think it does not.

Use-after-frees of this nature were fixed in hmm.git.

I don't see an obvious way you can hit something like this with the
new code arrangement.

> P.S. With my standard kernels without KASAN (currently 5.2.y + drm-next
> changes for 5.3), I'm having trouble lately completing a piglit run,
> running into various issues which look like memory corruption, so might
> be related.

I'm skeptical that the AMDGPU implementation of the locking around the
hmm_range & mirror is working. It doesn't follow the prescribed
pattern, at least.

> Jul 15 18:09:29 kaveri kernel: [  560.388751][T12568] ==================================================================
> Jul 15 18:09:29 kaveri kernel: [  560.389063][T12568] BUG: KASAN: use-after-free in __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389068][T12568] Read of size 8 at addr ffff88835e1c7cb0 by task amd_pinned_memo/12568
> Jul 15 18:09:29 kaveri kernel: [  560.389071][T12568] 
> Jul 15 18:09:29 kaveri kernel: [  560.389077][T12568] CPU: 9 PID: 12568 Comm: amd_pinned_memo Tainted: G           OE     5.2.0-rc1-00811-g2ad5a7d31bdf #125
> Jul 15 18:09:29 kaveri kernel: [  560.389080][T12568] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
> Jul 15 18:09:29 kaveri kernel: [  560.389084][T12568] Call Trace:
> Jul 15 18:09:29 kaveri kernel: [  560.389091][T12568]  dump_stack+0x7c/0xc0
> Jul 15 18:09:29 kaveri kernel: [  560.389097][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389101][T12568]  print_address_description+0x65/0x22e
> Jul 15 18:09:29 kaveri kernel: [  560.389106][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389110][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389115][T12568]  __kasan_report.cold.3+0x1a/0x3d
> Jul 15 18:09:29 kaveri kernel: [  560.389122][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389128][T12568]  kasan_report+0xe/0x20
> Jul 15 18:09:29 kaveri kernel: [  560.389132][T12568]  __mmu_notifier_release+0x286/0x3e0

So we were iterating over the mn list and touched freed memory.

> Jul 15 18:09:29 kaveri kernel: [  560.389309][T12568] Allocated by task 12568:
> Jul 15 18:09:29 kaveri kernel: [  560.389314][T12568]  save_stack+0x19/0x80
> Jul 15 18:09:29 kaveri kernel: [  560.389318][T12568]  __kasan_kmalloc.constprop.8+0xc1/0xd0
> Jul 15 18:09:29 kaveri kernel: [  560.389323][T12568]  hmm_get_or_create+0x8f/0x3f0

The memory is probably a struct hmm.

> Jul 15 18:09:29 kaveri kernel: [  560.389857][T12568] Freed by task 12568:
> Jul 15 18:09:29 kaveri kernel: [  560.389860][T12568]  save_stack+0x19/0x80
> Jul 15 18:09:29 kaveri kernel: [  560.389864][T12568]  __kasan_slab_free+0x125/0x170
> Jul 15 18:09:29 kaveri kernel: [  560.389867][T12568]  kfree+0xe2/0x290
> Jul 15 18:09:29 kaveri kernel: [  560.389871][T12568]  __mmu_notifier_release+0xef/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389875][T12568]  exit_mmap+0x93/0x400

And the free was also done from __mmu_notifier_release (presumably the
backtrace is corrupt and this is really in the old hmm_release ->
hmm_put -> hmm_free -> kfree call chain).

That was not OK: __mmu_notifier_release doesn't use a 'safe' hlist
iterator, so the release callback must never trigger kfree of a struct
mmu_notifier.
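
As a rough user-space illustration of why the iterator matters (this is
not kernel code; struct node, release_and_free, walk_plain and walk_safe
are hypothetical names, and a flag stands in for kfree so the demo stays
well-defined), a plain walk loads the next pointer after the callback
has run, while a 'safe' walk loads it before:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list entry with a release callback. */
struct node {
    struct node *next;
    int freed;                       /* set instead of calling free() */
    void (*release)(struct node *);
};

/* A callback that "frees" its own node, as the old call chain did. */
static void release_and_free(struct node *n)
{
    n->freed = 1;
}

/* Plain walk: pos->next is loaded AFTER the callback ran.
 * Returns how many loads hit a node already marked freed. */
static int walk_plain(struct node *head)
{
    int bad_reads = 0;
    struct node *pos = head;

    while (pos) {
        assert(pos->release != NULL);
        pos->release(pos);
        bad_reads += pos->freed;     /* the load below touches a freed node */
        pos = pos->next;
    }
    return bad_reads;
}

/* 'Safe' walk: pos->next is cached BEFORE the callback runs. */
static int walk_safe(struct node *head)
{
    int bad_reads = 0;
    struct node *pos = head;

    while (pos) {
        assert(pos->release != NULL);
        bad_reads += pos->freed;     /* node is still live at this load */
        struct node *next = pos->next;
        pos->release(pos);
        pos = next;
    }
    return bad_reads;
}
```

With a three-entry list whose callbacks all free their own node,
walk_plain counts one bad load per entry and walk_safe counts none.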

The new hmm.git code does not call kfree from release. It schedules
the free through SRCU, which by definition won't run it until
__mmu_notifier_release returns.
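
The deferral idea can be sketched in user space as well. This is only a
model under loud assumptions: a simple queue drained after the walk
stands in for the SRCU callback, and struct obj, release_deferred,
walk_and_release, run_deferred_frees and make_list are all made-up
names, not anything from hmm.git:

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
    struct obj *next;        /* link on the list being walked */
    struct obj *defer_next;  /* link on the deferred-free queue */
};

static struct obj *defer_queue;

/* Release callback: queue the object instead of freeing it. */
static void release_deferred(struct obj *o)
{
    o->defer_next = defer_queue;
    defer_queue = o;
}

/* Plain (non-'safe') walk. It is fine here because nothing is
 * freed while the walk is still in progress. */
static int walk_and_release(struct obj *head)
{
    int visited = 0;

    for (struct obj *pos = head; pos; pos = pos->next) {
        release_deferred(pos);
        visited++;
    }
    return visited;
}

/* Called only after the walk has returned; models the deferred
 * callbacks finally running. Returns the number of objects freed. */
static int run_deferred_frees(void)
{
    int freed = 0;

    while (defer_queue) {
        struct obj *o = defer_queue;
        defer_queue = o->defer_next;
        free(o);
        freed++;
    }
    return freed;
}

/* Helper for the demo: build a list of n heap-allocated objects. */
static struct obj *make_list(int n)
{
    struct obj *head = NULL;

    for (int i = 0; i < n; i++) {
        struct obj *o = calloc(1, sizeof(*o));
        assert(o != NULL);
        o->next = head;
        head = o;
    }
    return head;
}
```

The point of the arrangement is ordering: every free happens strictly
after the iteration is over, so the kind of iterator used no longer
matters.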

So this should be fixed once hmm.git is merged.

Jason

