HMM related? memory leakage

Joe Barnett thejoe at gmail.com
Tue Oct 1 15:13:04 UTC 2019


In the development version of Ubuntu, using a 5.3 kernel, running the
Dolphin emulator appears to leak memory (other apps may trigger the same
issue, but I haven't run into any).  The "used" memory reported by top
grows until the app exits, and is not freed at that point.  This is on a
Dell XPS 2-in-1 with hybrid Intel/AMD Vega M graphics, running with
DRI_PRIME=1.  More details are in the downstream bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962
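
For anyone who wants to log the growth rather than watch top, here is a
minimal sketch that samples /proc/meminfo.  It assumes "used" is computed
as MemTotal - MemFree - Buffers - (Cached + SReclaimable), which is my
understanding of what procps 3.3.x reports; other procps versions may use
a different formula.  The function names are made up for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name:" prefix
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_kb(fields):
    """Approximate top's "used" figure (assumed procps 3.3.x formula)."""
    cache = fields.get("Cached", 0) + fields.get("SReclaimable", 0)
    return (fields["MemTotal"] - fields["MemFree"]
            - fields.get("Buffers", 0) - cache)


def sample(path="/proc/meminfo"):
    """Read the current "used" value in kB from the given meminfo file."""
    with open(path) as f:
        return used_kb(parse_meminfo(f.read()))


def watch(interval_s=5, count=10):
    """Print a timestamped "used" sample every interval_s seconds."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), sample(), "kB used", flush=True)
        time.sleep(interval_s)
```

Calling watch() while the emulator runs and again after it exits should
show whether the number keeps climbing and whether it drops back down.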

Running a kernel bisection to find the offending commit shows this:

899fbde1464639e3d12eaffdad8481a59b367fcb is the first bad commit
commit 899fbde1464639e3d12eaffdad8481a59b367fcb
Author: Philip Yang <Philip.Yang at amd.com>
Date:   Thu Dec 13 15:35:28 2018 -0500

    drm/amdgpu: replace get_user_pages with HMM mirror helpers

    Use HMM helper function hmm_vma_fault() to get physical pages backing
    userptr and start CPU page table update track of those pages. Then use
    hmm_vma_range_done() to check if those pages are updated before
    amdgpu_cs_submit for gfx or before user queues are resumed for kfd.

    If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
    from scratch, for kfd, restore worker is rescheduled to retry.

    HMM simplify the CPU page table concurrent update check, so remove
    guptasklock, mmu_invalidations, last_set_pages fields from
    amdgpu_ttm_tt struct.

    HMM does not pin the page (increase page ref count), so remove related
    operations like release_pages(), put_page(), mark_page_dirty().

    Signed-off-by: Philip Yang <Philip.Yang at amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
    Reviewed-by: Christian König <christian.koenig at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

:040000 040000 0c9f0e2e82e5e4d2d3a4c0daea22eb911244b771 fdcdc7c80f5383486962edf4561e205b55bd8c21 M drivers


$ git bisect log
# bad: [f74c2bb98776e2de508f4d607cd519873065118e] Linux 5.3-rc8
# good: [1c163f4c7b3f621efff9b28a47abb36f7378d783] Linux 5.0
git bisect start 'v5.3-rc8' 'v5.0'
# good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# good: [8f6ccf6159aed1f04c6d179f61f6fb2691261e84] Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
git bisect good 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
# good: [8f6ccf6159aed1f04c6d179f61f6fb2691261e84] Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
git bisect good 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
# bad: [be8454afc50f43016ca8b6130d9673bdd0bd56ec] Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
git bisect bad be8454afc50f43016ca8b6130d9673bdd0bd56ec
# bad: [be8454afc50f43016ca8b6130d9673bdd0bd56ec] Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
git bisect bad be8454afc50f43016ca8b6130d9673bdd0bd56ec
# good: [d72619706abc4aa7e540ea882dae883cee7cc3b3] Merge tag 'tty-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good d72619706abc4aa7e540ea882dae883cee7cc3b3
# bad: [83145f110eb2ada9d54fcbcf416c02de126381c1] drm/amdgpu: don't invalidate caches in RELEASE_MEM, only do the writeback
git bisect bad 83145f110eb2ada9d54fcbcf416c02de126381c1
# bad: [b239c01727459ba08c44b79e6225d3c58723f282] drm/amdgpu: add mcbp driver parameter
git bisect bad b239c01727459ba08c44b79e6225d3c58723f282
# good: [e1dc68a4b149d47536cd001d0d0abadbb62d37bd] drm: atmel-hlcdc: avoid initializing cfg with zero
git bisect good e1dc68a4b149d47536cd001d0d0abadbb62d37bd
# bad: [c53e4db71276bf257b09010935a04bdafddd458e] drm/amdgpu: cancel late_init_work before gpu reset
git bisect bad c53e4db71276bf257b09010935a04bdafddd458e
# good: [2da4605dce38b84cd2e5b86686f43adae1b2cacb] drm/amd/display: Use DCN functions instead of DCE
git bisect good 2da4605dce38b84cd2e5b86686f43adae1b2cacb
# bad: [1c1e53f7f2ce191e6787d3d0648fe8ce7088ceaa] drm/amd/doc: Add XGMI sysfs documentation
git bisect bad 1c1e53f7f2ce191e6787d3d0648fe8ce7088ceaa
# good: [89cd9d23e9a74d94f0db5bbbaf2ef1f6ede36ae5] drm/amdkfd: avoid HMM change cause circular lock
git bisect good 89cd9d23e9a74d94f0db5bbbaf2ef1f6ede36ae5
# bad: [0803e7a9e850f9d6397c594d6c6deac9b2b6d696] drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk
git bisect bad 0803e7a9e850f9d6397c594d6c6deac9b2b6d696
# bad: [972fcdb52fe865a2f639e3200b97e648f34a0f41] drm/amdkfd: Introduce asic-specific mqd_manager_init function
git bisect bad 972fcdb52fe865a2f639e3200b97e648f34a0f41
# bad: [6c55d6e90e68a4789cbd72a0287026d4dfb4a9f9] drm/amdkfd: support concurrent userptr update for HMM
git bisect bad 6c55d6e90e68a4789cbd72a0287026d4dfb4a9f9
# bad: [ad595b8634f36f04bf69bef4eff854091d94f8b3] drm/amdgpu: fix HMM config dependency issue
git bisect bad ad595b8634f36f04bf69bef4eff854091d94f8b3
# bad: [899fbde1464639e3d12eaffdad8481a59b367fcb] drm/amdgpu: replace get_user_pages with HMM mirror helpers
git bisect bad 899fbde1464639e3d12eaffdad8481a59b367fcb
# first bad commit: [899fbde1464639e3d12eaffdad8481a59b367fcb] drm/amdgpu: replace get_user_pages with HMM mirror helpers

Let me know if I can provide more information.
Thanks,
-Joe

