[PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

Philip Yang philip.yang at amd.com
Fri Sep 28 15:07:04 UTC 2018


For the B path, we take mm->mmap_sem, then call hmm_vma_get_pfns() or 
get_user_pages(). This is obvious.

For the A path, the mmu notifier pair 
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end() 
is called in many places, and the calling path inside mm is quite 
complicated, so it's not obvious. I checked many of them, for example:

do_munmap()
   down_write(&mm->mmap_sem)
   arch_unmap()
      mpx_notify_unmap()...
         zap_bt_entries_mapping()
            zap_page_range()
   up_write(&mm->mmap_sem)

void zap_page_range(struct vm_area_struct *vma, unsigned long start,
         unsigned long size)
{
     struct mm_struct *mm = vma->vm_mm;
     struct mmu_gather tlb;
     unsigned long end = start + size;

     lru_add_drain();
     tlb_gather_mmu(&tlb, mm, start, end);
     update_hiwater_rss(mm);
     mmu_notifier_invalidate_range_start(mm, start, end);
     for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
         unmap_single_vma(&tlb, vma, start, end, NULL);
     mmu_notifier_invalidate_range_end(mm, start, end);
     tlb_finish_mmu(&tlb, start, end);
}

So AFAIK it's okay without invalidate_range_end() callback.

Regards,
Philip

On 2018-09-28 01:25 AM, Koenig, Christian wrote:
> No, that is incorrect as well :)
>
> The mmap_sem isn't necessarily taken during page table updates.
>
> What you could do is replace get_user_pages() directly with HMM. If 
> I'm not completely mistaken that should work as expected.
>
> Christian.
>
> Am 27.09.2018 22:18 schrieb "Yang, Philip" <Philip.Yang at amd.com>:
> I was trying to understand how HMM handles this concurrency issue and 
> how we handle it in amdgpu_ttm_tt_userptr_needs_pages() and 
> amdgpu_ttm_tt_affect_userptr(). HMM uses the range->valid flag; we use 
> gtt->mmu_invalidations and gtt->last_set_pages. Both actually use the 
> same lock-plus-flag idea.
>
> Thanks for the information. Now I understand that the fence 
> ttm_eu_fence_buffer_objects() adds to the BOs will block the CPU page 
> table update. This is another side of this concurrency issue I didn't 
> know about.
>
> I had the same worry that there is an issue without the 
> invalidate_range_end() callback, given the calling sequence Felix 
> lists. After taking another look today, I now think it's fine because 
> of the mm->mmap_sem usage. This is my understanding:
>
> A path:
>
> down_write(&mm->mmap_sem);
> mmu_notifier_invalidate_range_start()
>     take_lock()
>     release_lock()
> CPU page table update
> mmu_notifier_invalidate_range_end()
> up_write(&mm->mmap_sem);
>
> B path:
>
> again:
> down_read(&mm->mmap_sem);
> hmm_vma_get_pfns()
> up_read(&mm->mmap_sem);
> ....
> ....
> take_lock()
> if (!hmm_vma_range_done()) {
>    release_lock()
>    goto again
> }
> submit command job...
> release_lock()
>
> If you agree, I will submit patch v5 with some minor changes, and 
> submit another patch to replace get_user_pages() with HMM.
>
> Regards,
> Philip
>
> On 2018-09-27 11:36 AM, Christian König wrote:
>> Yeah, I've read that as well.
>>
>> My best guess is that we just need to add a call to 
>> hmm_vma_range_done() after taking the lock and also replace 
>> get_user_pages() with hmm_vma_get_pfns().
>>
>> But I'm still not 100% sure how all of that is supposed to work together.
>>
>> Regards,
>> Christian.
>>
>> Am 27.09.2018 um 16:50 schrieb Kuehling, Felix:
>>>
>>> I think the answer is here: 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:*Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 10:30 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* j.glisse at gmail.com; Yang, Philip <Philip.Yang at amd.com>; 
>>> amd-gfx at lists.freedesktop.org
>>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to 
>>> replace mmu notifier v4
>>>
>>> At least with get_user_pages() that is perfectly possible.
>>>
>>> For HMM it could be that this is prevented somehow.
>>>
>>> Christian.
>>>
>>> Am 27.09.2018 16:27 schrieb "Kuehling, Felix" 
>>> <Felix.Kuehling at amd.com <mailto:Felix.Kuehling at amd.com>>:
>>>
>>> > In this case you can end up accessing pages which are invalidated 
>>> while get_user_pages is in process.
>>>
>>> What’s the sequence of events you have in mind? Something like this?
>>>
>>>   * Page table is updated and triggers MMU notifier
>>>   * amdgpu MMU notifier runs and waits for pending CS to finish
>>>     while holding the read lock
>>>   * New CS starts just after invalidate_range_start MMU notifier
>>>     finishes but before the page table update is done
>>>   * get_user_pages returns outdated physical addresses
>>>
>>> I hope that’s not actually possible and that get_user_pages or 
>>> hmm_vma_fault would block until the page table update is done. That 
>>> is, invalidate_range_start marks the start of a page table update, 
>>> and while that update is in progress, get_user_pages or 
>>> hmm_vma_fault block. Jerome, can you comment on that?
>>>
>>> Thanks,
>>>   Felix
>>>
>>> *From:*Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 9:59 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com 
>>> <mailto:Felix.Kuehling at amd.com>>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com 
>>> <mailto:Philip.Yang at amd.com>>; amd-gfx at lists.freedesktop.org 
>>> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse 
>>> <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
>>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to 
>>> replace mmu notifier v4
>>>
>>> Yeah I understand that, but again that won't work.
>>>
>>> In this case you can end up accessing pages which are invalidated 
>>> while get_user_pages is in process.
>>>
>>> Christian.
>>>
>>> Am 27.09.2018 15:41 schrieb "Kuehling, Felix" 
>>> <Felix.Kuehling at amd.com <mailto:Felix.Kuehling at amd.com>>:
>>>
>>> > I’m not planning to change that. I don’t think there is any need to change it.
>>>
>>> >
>>> > Yeah, but when HMM doesn't provide both the start and the end hook 
>>> of the invalidation this way won't work any more.
>>> >
>>> > So we need to find a solution for this,
>>> > Christian.
>>>
>>> My whole argument is that you don’t need to hold the read lock until 
>>> the invalidate_range_end. Just read_lock and read_unlock in the 
>>> invalidate_range_start function.
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:*Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 9:22 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com 
>>> <mailto:Felix.Kuehling at amd.com>>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com 
>>> <mailto:Philip.Yang at amd.com>>; amd-gfx at lists.freedesktop.org 
>>> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse 
>>> <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
>>> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to 
>>> replace mmu notifier v4
>>>
>>> Am 27.09.2018 um 15:18 schrieb Kuehling, Felix:
>>>
>>>     > The problem is here:
>>>     >
>>>
>>>     > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>>
>>>     > amdgpu_mn_unlock(p->mn);
>>>
>>>     >
>>>     > We need to hold the lock until the fence is added to the
>>>     reservation object.
>>>     >
>>>     > Otherwise somebody could have changed the page tables just in
>>>     the moment between the check of
>>>     amdgpu_ttm_tt_userptr_needs_pages() and adding the fence to the
>>>     reservation object.
>>>
>>>     I’m not planning to change that. I don’t think there is any need
>>>     to change it.
>>>
>>>
>>> Yeah, but when HMM doesn't provide both the start and the end hook 
>>> of the invalidation this way won't work any more.
>>>
>>> So we need to find a solution for this,
>>> Christian.
>>>
>>>     Regards,
>>>
>>>       Felix
>>>
>>>     *From:*Koenig, Christian
>>>     *Sent:* Thursday, September 27, 2018 7:24 AM
>>>     *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>>     <mailto:Felix.Kuehling at amd.com>
>>>     *Cc:* Yang, Philip <Philip.Yang at amd.com>
>>>     <mailto:Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org
>>>     <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>>     <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
>>>     *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>>>     replace mmu notifier v4
>>>
>>>     Am 27.09.2018 um 13:08 schrieb Kuehling, Felix:
>>>
>>>         > We double check that there wasn't any page table
>>>         modification while we prepared the submission and restart
>>>         the whole process when there actually was some update.
>>>         >
>>>         > The reason why we need to do this is here:
>>>         >
>>>
>>>         > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>>         > amdgpu_mn_unlock(p->mn);
>>>
>>>         >
>>>         > Only after the new fence is added to the buffer object we
>>>         can release the lock so that any invalidation will now block
>>>         on our command submission to finish before it modifies the
>>>         page table.
>>>
>>>         I don’t see why this requires holding the read-lock until
>>>         invalidate_range_end. amdgpu_ttm_tt_affect_userptr gets
>>>         called while the mn read-lock is held in
>>>         invalidate_range_start notifier.
>>>
>>>
>>>     That's not related to amdgpu_ttm_tt_affect_userptr(), this
>>>     function could actually be called outside the lock.
>>>
>>>     The problem is here:
>>>
>>>         ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>>
>>>         amdgpu_mn_unlock(p->mn);
>>>
>>>
>>>     We need to hold the lock until the fence is added to the
>>>     reservation object.
>>>
>>>     Otherwise somebody could have changed the page tables just in
>>>     the moment between the check of
>>>     amdgpu_ttm_tt_userptr_needs_pages() and adding the fence to the
>>>     reservation object.
>>>
>>>     Regards,
>>>     Christian.
>>>
>>>
>>>         Regards,
>>>
>>>           Felix
>>>
>>>         *From:*Koenig, Christian
>>>         *Sent:* Thursday, September 27, 2018 5:27 AM
>>>         *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>>         <mailto:Felix.Kuehling at amd.com>
>>>         *Cc:* Yang, Philip <Philip.Yang at amd.com>
>>>         <mailto:Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org
>>>         <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>>         <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
>>>         *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback
>>>         to replace mmu notifier v4
>>>
>>>         That is correct, but take a look what we do when after
>>>         calling the amdgpu_mn_read_lock():
>>>
>>>
>>>             /* No memory allocation is allowed while holding the mn lock */
>>>             amdgpu_mn_lock(p->mn);
>>>             amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>>>                     struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>>>
>>>                     if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
>>>                             r = -ERESTARTSYS;
>>>                             goto error_abort;
>>>                     }
>>>             }
>>>
>>>
>>>         We double check that there wasn't any page table
>>>         modification while we prepared the submission and restart
>>>         the whole process when there actually was some update.
>>>
>>>         The reason why we need to do this is here:
>>>
>>>             ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>>             amdgpu_mn_unlock(p->mn);
>>>
>>>
>>>         Only after the new fence is added to the buffer object we
>>>         can release the lock so that any invalidation will now block
>>>         on our command submission to finish before it modifies the
>>>         page table.
>>>
>>>         The only other option would be to add the fence first and
>>>         then check if there was any update to the page tables.
>>>
>>>         The issue with that approach is that adding a fence can't be
>>>         made undone, so if we find that there actually was an update
>>>         to the page tables we would need to somehow turn the CS into
>>>         a dummy (e.g. overwrite all IBs with NOPs or something like
>>>         that) and still submit it.
>>>
>>>         Not sure if that is actually possible.
>>>
>>>         Regards,
>>>         Christian.
>>>
>>>         Am 27.09.2018 um 10:47 schrieb Kuehling, Felix:
>>>
>>>             So back to my previous question:
>>>
>>>             >> But do we really need another lock for this? Wouldn't
>>>             the
>>>
>>>             >> re-validation of userptr BOs (currently calling
>>>             get_user_pages) force
>>>
>>>             >> synchronization with the ongoing page table
>>>             invalidation through the
>>>
>>>             >> mmap_sem or other MM locks?
>>>
>>>             >
>>>
>>>             > No and yes. We don't hold any other locks while doing
>>>             command submission, but I expect that HMM has its own
>>>             mechanism to prevent that.
>>>
>>>             >
>>>
>>>             > Since we don't modify
>>>             amdgpu_mn_lock()/amdgpu_mn_unlock() we are certainly not
>>>             using this mechanism correctly.
>>>
>>>             The existing amdgpu_mn_lock/unlock should block the MMU
>>>             notifier while a command submission is in progress. It
>>>             should also block command submission while an MMU
>>>             notifier is in progress.
>>>
>>>             What we lose with HMM is the ability to hold a read-lock
>>>             for the entire duration of the invalidate_range_start
>>>             until invalidate_range_end. As I understand it, that
>>>             lock is meant to prevent new command submissions while
>>>             the page tables are being updated by the kernel. But my
>>>             point is, that get_user_pages or hmm_vma_fault should do
>>>             the same kind of thing. Before the command submission
>>>             can go ahead, it needs to update the userptr addresses.
>>>             If the page tables are still being updated, it will
>>>             block there even without holding the amdgpu_mn_read_lock.
>>>
>>>             Regards,
>>>
>>>               Felix
>>>
>>>             *From:* Koenig, Christian
>>>             *Sent:* Thursday, September 27, 2018 3:00 AM
>>>             *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>>             <mailto:Felix.Kuehling at amd.com>
>>>             *Cc:* Yang, Philip <Philip.Yang at amd.com>
>>>             <mailto:Philip.Yang at amd.com>;
>>>             amd-gfx at lists.freedesktop.org
>>>             <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>>             <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
>>>             *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror
>>>             callback to replace mmu notifier v4
>>>
>>>             No, that won't work. We would still run into lock
>>>             inversion problems.
>>>
>>>             What we could do with the scheduler is to turn
>>>             submissions into dummies if we find that the page tables
>>>             are now outdated.
>>>
>>>             But that would be really hacky and I'm not sure if that
>>>             would really work in all cases.
>>>
>>>             Christian.
>>>
>>>             Am 27.09.2018 08:53 schrieb "Kuehling, Felix"
>>>             <Felix.Kuehling at amd.com <mailto:Felix.Kuehling at amd.com>>:
>>>
>>>             I had a chat with Jerome yesterday. He pointed out that
>>>             the new blockable parameter can be used to infer whether
>>>             the MMU notifier is being called  in a reclaim
>>>             operation. So if blockable==true, it should even be safe
>>>             to take the BO reservation lock without problems. I
>>>             think with that we should be able to remove the
>>>             read-write locking completely and go back to locking (or
>>>             try-locking for blockable==false) the reservation locks
>>>             in the MMU notifier?
>>>
>>>             Regards,
>>>               Felix
>>>
>>>             -----Original Message-----
>>>             From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org
>>>             <mailto:amd-gfx-bounces at lists.freedesktop.org>> On
>>>             Behalf Of Christian König
>>>             Sent: Saturday, September 15, 2018 3:47 AM
>>>             To: Kuehling, Felix <Felix.Kuehling at amd.com
>>>             <mailto:Felix.Kuehling at amd.com>>; Yang, Philip
>>>             <Philip.Yang at amd.com <mailto:Philip.Yang at amd.com>>;
>>>             amd-gfx at lists.freedesktop.org
>>>             <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>>             <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
>>>             Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback
>>>             to replace mmu notifier v4
>>>
>>>             Am 14.09.2018 um 22:21 schrieb Felix Kuehling:
>>>             > On 2018-09-14 01:52 PM, Christian König wrote:
>>>             >> Am 14.09.2018 um 19:47 schrieb Philip Yang:
>>>             >>> On 2018-09-14 03:51 AM, Christian König wrote:
>>>             >>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
>>>             >>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>>>             >>>>> [SNIP]
>>>             >>>>>>    + amdgpu_mn_read_unlock(amn);
>>>             >>>>>> +
>>>             >>>>> amdgpu_mn_read_lock/unlock support recursive
>>>             locking for multiple
>>>             >>>>> overlapping or nested invalidation ranges. But if
>>>             you're locking
>>>             >>>>> and unlocking in the same function. Is that still
>>>             a concern?
>>>             >>> I don't understand the possible recursive case, but
>>>             >>> amdgpu_mn_read_lock() still support recursive locking.
>>>             >>>> Well the real problem is that unlocking them here
>>>             won't work.
>>>             >>>>
>>>             >>>> We need to hold the lock until we are sure that the
>>>             operation which
>>>             >>>> updates the page tables is completed.
>>>             >>>>
>>>             >>> The reason for this change is because hmm mirror has
>>>             >>> invalidate_start callback, no invalidate_end callback
>>>             >>>
>>>             >>> Check mmu_notifier.c and hmm.c again, below is
>>>             entire logic to
>>>             >>> update CPU page tables and callback:
>>>             >>>
>>>             >>> mn lock amn->lock is used to protect interval tree
>>>             access because
>>>             >>> user may submit/register new userptr anytime.
>>>             >>> This is same for old and new way.
>>>             >>>
>>>             >>> step 2 guarantee the GPU operation is done before
>>>             updating CPU page
>>>             >>> table.
>>>             >>>
>>>             >>> So I think the change is safe. We don't need hold mn
>>>             lock until the
>>>             >>> CPU page tables update is completed.
>>>             >> No, that isn't even remotely correct. The lock
>>>             doesn't protects the
>>>             >> interval tree.
>>>             >>
>>>             >>> Old:
>>>             >>>     1. down_read_non_owner(&amn->lock)
>>>             >>>     2. loop to handle BOs from node->bos through
>>>             interval tree
>>>             >>> amn->object nodes
>>>             >>>         gfx: wait for pending BOs fence operation
>>>             done, mark user
>>>             >>> pages dirty
>>>             >>>         kfd: evict user queues of the process, wait
>>>             for queue
>>>             >>> unmap/map operation done
>>>             >>>     3. update CPU page tables
>>>             >>>     4. up_read(&amn->lock)
>>>             >>>
>>>             >>> New, switch step 3 and 4
>>>             >>>     1. down_read_non_owner(&amn->lock)
>>>             >>>     2. loop to handle BOs from node->bos through
>>>             interval tree
>>>             >>> amn->object nodes
>>>             >>>         gfx: wait for pending BOs fence operation
>>>             done, mark user
>>>             >>> pages dirty
>>>             >>>         kfd: evict user queues of the process, wait
>>>             for queue
>>>             >>> unmap/map operation done
>>>             >>>     3. up_read(&amn->lock)
>>>             >>>     4. update CPU page tables
>>>             >> The lock is there to make sure that we serialize page
>>>             table updates
>>>             >> with command submission.
>>>             > As I understand it, the idea is to prevent command
>>>             submission (adding
>>>             > new fences to BOs) while a page table invalidation is
>>>             in progress.
>>>
>>>             Yes, exactly.
>>>
>>>             > But do we really need another lock for this? Wouldn't the
>>>             > re-validation of userptr BOs (currently calling
>>>             get_user_pages) force
>>>             > synchronization with the ongoing page table
>>>             invalidation through the
>>>             > mmap_sem or other MM locks?
>>>
>>>             No and yes. We don't hold any other locks while doing
>>>             command submission, but I expect that HMM has its own
>>>             mechanism to prevent that.
>>>
>>>             Since we don't modify
>>>             amdgpu_mn_lock()/amdgpu_mn_unlock() we are certainly not
>>>             using this mechanism correctly.
>>>
>>>             Regards,
>>>             Christian.
>>>             _______________________________________________
>>>             amd-gfx mailing list
>>>             amd-gfx at lists.freedesktop.org
>>>             <mailto:amd-gfx at lists.freedesktop.org>
>>>             https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>
>
