HMM related use-after-free with amdgpu
Jason Gunthorpe
jgg at mellanox.com
Mon Jul 15 17:25:21 UTC 2019
On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
>
> With a KASAN enabled kernel built from amd-staging-drm-next, the
> attached use-after-free is pretty reliably detected during a piglit gpu run.
Does the branch you are testing have hmm.git merged? Judging from
the name, I think it does not.
Use-after-frees of this nature were fixed in hmm.git.
I don't see an obvious way to hit something like this with the
new code arrangement.
> P.S. With my standard kernels without KASAN (currently 5.2.y + drm-next
> changes for 5.3), I'm having trouble lately completing a piglit run,
> running into various issues which look like memory corruption, so might
> be related.
I'm skeptical that the AMDGPU implementation of the locking around the
hmm_range & mirror is working; at the least, it doesn't follow the
prescribed pattern.
> Jul 15 18:09:29 kaveri kernel: [ 560.388751][T12568] ==================================================================
> Jul 15 18:09:29 kaveri kernel: [ 560.389063][T12568] BUG: KASAN: use-after-free in __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [ 560.389068][T12568] Read of size 8 at addr ffff88835e1c7cb0 by task amd_pinned_memo/12568
> Jul 15 18:09:29 kaveri kernel: [ 560.389071][T12568]
> Jul 15 18:09:29 kaveri kernel: [ 560.389077][T12568] CPU: 9 PID: 12568 Comm: amd_pinned_memo Tainted: G OE 5.2.0-rc1-00811-g2ad5a7d31bdf #125
> Jul 15 18:09:29 kaveri kernel: [ 560.389080][T12568] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
> Jul 15 18:09:29 kaveri kernel: [ 560.389084][T12568] Call Trace:
> Jul 15 18:09:29 kaveri kernel: [ 560.389091][T12568] dump_stack+0x7c/0xc0
> Jul 15 18:09:29 kaveri kernel: [ 560.389097][T12568] ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [ 560.389101][T12568] print_address_description+0x65/0x22e
> Jul 15 18:09:29 kaveri kernel: [ 560.389106][T12568] ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [ 560.389110][T12568] ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [ 560.389115][T12568] __kasan_report.cold.3+0x1a/0x3d
> Jul 15 18:09:29 kaveri kernel: [ 560.389122][T12568] ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [ 560.389128][T12568] kasan_report+0xe/0x20
> Jul 15 18:09:29 kaveri kernel: [ 560.389132][T12568] __mmu_notifier_release+0x286/0x3e0
So we were iterating over the mn list and touched freed memory.
> Jul 15 18:09:29 kaveri kernel: [ 560.389309][T12568] Allocated by task 12568:
> Jul 15 18:09:29 kaveri kernel: [ 560.389314][T12568] save_stack+0x19/0x80
> Jul 15 18:09:29 kaveri kernel: [ 560.389318][T12568] __kasan_kmalloc.constprop.8+0xc1/0xd0
> Jul 15 18:09:29 kaveri kernel: [ 560.389323][T12568] hmm_get_or_create+0x8f/0x3f0
The memory is probably a struct hmm
> Jul 15 18:09:29 kaveri kernel: [ 560.389857][T12568] Freed by task 12568:
> Jul 15 18:09:29 kaveri kernel: [ 560.389860][T12568] save_stack+0x19/0x80
> Jul 15 18:09:29 kaveri kernel: [ 560.389864][T12568] __kasan_slab_free+0x125/0x170
> Jul 15 18:09:29 kaveri kernel: [ 560.389867][T12568] kfree+0xe2/0x290
> Jul 15 18:09:29 kaveri kernel: [ 560.389871][T12568] __mmu_notifier_release+0xef/0x3e0
> Jul 15 18:09:29 kaveri kernel: [ 560.389875][T12568] exit_mmap+0x93/0x400
And the free was also done in notifier_release (presumably the
backtrace is corrupt and this is really in the old hmm_release ->
hmm_put -> hmm_free -> kfree call chain)
That was not OK: __mmu_notifier_release doesn't use a 'safe' hlist
iterator, so the release callback must never trigger kfree of a struct
mmu_notifier.
The new hmm.git code does not call kfree from release. It defers the
free through SRCU, which by definition won't run it until
__mmu_notifier_release returns.
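As a rough illustration of why deferring the free matters, here is a
minimal userspace sketch. It is not kernel code: struct notifier,
release_cb, walk_and_release, drain_pending_free and make_list are all
hypothetical names, the plain singly linked list stands in for the
hlist, and the pending_free list stands in for the SRCU deferral. The
walker reads n->next after the callback returns, exactly as a non-safe
iterator would, so the callback may only queue the node, never free it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a notifier embedded in a list. */
struct notifier {
    struct notifier *next;      /* list linkage, read after the callback */
    struct notifier *free_next; /* linkage on the deferred-free list */
    int released;
};

/* Stand-in for the deferred-free mechanism (SRCU in the kernel). */
static struct notifier *pending_free;

/* Release callback: queue the node instead of freeing it in place. */
static void release_cb(struct notifier *n)
{
    n->released = 1;
    /* Calling free(n) here would make the walker's n->next read a
     * use-after-free, which is the bug class in the KASAN report. */
    n->free_next = pending_free;
    pending_free = n;
}

/* Non-safe walk: the next pointer is fetched after the callback ran. */
static int walk_and_release(struct notifier *head)
{
    int visited = 0;

    for (struct notifier *n = head; n != NULL; n = n->next) {
        release_cb(n);
        visited++;
    }
    return visited;
}

/* Runs only after the walk has returned, so no walker holds a node. */
static int drain_pending_free(void)
{
    int freed = 0;

    while (pending_free != NULL) {
        struct notifier *n = pending_free;

        pending_free = n->free_next;
        free(n);
        freed++;
    }
    return freed;
}

/* Build a list of count zero-initialised nodes. */
static struct notifier *make_list(int count)
{
    struct notifier *head = NULL;

    for (int i = 0; i < count; i++) {
        struct notifier *n = calloc(1, sizeof(*n));

        assert(n != NULL);
        n->next = head;
        head = n;
    }
    return head;
}
```

Calling walk_and_release() on a list from make_list() and only then
drain_pending_free() mirrors the ordering described above: every node
stays valid for the whole walk and is freed afterwards.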
So this should be fixed.
Jason