[PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

Thu Sep 27 20:17:54 UTC 2018

I was trying to understand the way how HMM handle this concurrent issue 
and how we handle it in amdgpu_ttm_tt_userptr_needs_pages() and 
amdgpu_ttm_tt_affect_userptr(). HMM uses range->valid flag, we use 
gtt->mmu_invalidations and gtt->last_set_pages. Both use the same lock 
plus flag idea actually.

Thanks for the information, now I understand fence 
ttm_eu_fence_buffer_objects() put to BOs will block CPU page table 
update. This is another side of this concurrent issue I didn't know.

I had same worry that it has issue without invalidate_range_end() 
callback as the calling sequence Felix lists. Now I think it's fine 
after taking a look again today because of mm->mmap_sem usage, this is 
my understanding:

A path:

down_write(&mm->mmap_sem);
mmu_notifier_invalidate_range_start()
     take_lock()
     release_lock()
CPU page table update
mmu_notifier_invalidate_range_end()
up_write(&mm->mmap_sem);

B path:

again:
down_read(&mm->mmap_sem);
hmm_vma_get_pfns()
up_read(&mm->mmap_sem);
....
....
take_lock()
if (!hmm_vma_range_done()) {
    release_lock()
    goto again
}
submit command job...
release_lock()

If you agree, I will submit patch v5 with some minor changes, and submit 
another patch to replace get_user_page() with HMM.

Regards,
Philip

On 2018-09-27 11:36 AM, Christian König wrote:
> Yeah, I've read that as well.
>
> My best guess is that we just need to add a call to 
> hmm_vma_range_done() after taking the lock and also replace 
> get_user_pages() with hmm_vma_get_pfns().
>
> But I'm still not 100% sure how all of that is supposed to work together.
>
> Regards,
> Christian.
>
> Am 27.09.2018 um 16:50 schrieb Kuehling, Felix:
>>
>> I think the answer is here: 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216
>>
>> Regards,
>>
>>   Felix
>>
>> *From:*Koenig, Christian
>> *Sent:* Thursday, September 27, 2018 10:30 AM
>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>> *Cc:* j.glisse at gmail.com; Yang, Philip <Philip.Yang at amd.com>; 
>> amd-gfx at lists.freedesktop.org
>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to replace 
>> mmu notifier v4
>>
>> At least with get_user_pages() that is perfectly possible.
>>
>> For HMM it could be that this is prevented somehow.
>>
>> Christian.
>>
>> Am 27.09.2018 16:27 schrieb "Kuehling, Felix" <Felix.Kuehling at amd.com 
>> <mailto:Felix.Kuehling at amd.com>>:
>>
>> > In this case you can end up accessing pages which are invalidated 
>> while get_user_pages is in process.
>>
>> What’s the sequence of events you have in mind? Something like this?
>>
>>   * Page table is updated and triggers MMU notifier
>>   * amdgpu MMU notifier runs and waits for pending CS to finish while
>>     holding the read lock
>>   * New CS starts just after invalidate_range_start MMU notifier
>>     finishes but before the page table update is done
>>   * get_user_pages returns outdated physical addresses
>>
>> I hope that’s not actually possible and that get_user_pages or 
>> hmm_vma_fault would block until the page table update is done. That 
>> is, invalidate_range_start marks the start of a page table update, 
>> and while that update is in progress, get_user_pages or hmm_vma_fault 
>> block. Jerome, can you comment on that?
>>
>> Thanks,
>>   Felix
>>
>> *From:*Koenig, Christian
>> *Sent:* Thursday, September 27, 2018 9:59 AM
>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com 
>> <mailto:Felix.Kuehling at amd.com>>
>> *Cc:* Yang, Philip <Philip.Yang at amd.com 
>> <mailto:Philip.Yang at amd.com>>; amd-gfx at lists.freedesktop.org 
>> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse 
>> <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to replace 
>> mmu notifier v4
>>
>> Yeah I understand that, but again that won't work.
>>
>> In this case you can end up accessing pages which are invalidated 
>> while get_user_pages is in process.
>>
>> Christian.
>>
>> Am 27.09.2018 15:41 schrieb "Kuehling, Felix" <Felix.Kuehling at amd.com 
>> <mailto:Felix.Kuehling at amd.com>>:
>>
>> > I’m not planning to change that. I don’t think there is any need to 
>> change it.
>>
>> >
>> > Yeah, but when HMM doesn't provide both the start and the end hock 
>> of the invalidation this way won't work any more.
>> >
>> > So we need to find a solution for this,
>> > Christian.
>>
>> My whole argument is that you don’t need to hold the read lock until 
>> the invalidate_range_end. Just read_lock and read_unlock in the 
>> invalidate_range_start function.
>>
>> Regards,
>>
>> Felix
>>
>> *From:*Koenig, Christian
>> *Sent:* Thursday, September 27, 2018 9:22 AM
>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com 
>> <mailto:Felix.Kuehling at amd.com>>
>> *Cc:* Yang, Philip <Philip.Yang at amd.com 
>> <mailto:Philip.Yang at amd.com>>; amd-gfx at lists.freedesktop.org 
>> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse 
>> <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
>> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace 
>> mmu notifier v4
>>
>> Am 27.09.2018 um 15:18 schrieb Kuehling, Felix:
>>
>>     > The problem is here:
>>     >
>>
>>     > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>
>>     > amdgpu_mn_unlock(p->mn);
>>
>>     >
>>     > We need to hold the lock until the fence is added to the
>>     reservation object.
>>     >
>>     > Otherwise somebody could have changed the page tables just in
>>     the moment between the check of
>>     amdgpu_ttm_tt_userptr_needs_pages() and adding the fence to the
>>     reservation object.
>>
>>     I’m not planning to change that. I don’t think there is any need
>>     to change it.
>>
>>
>> Yeah, but when HMM doesn't provide both the start and the end hock of 
>> the invalidation this way won't work any more.
>>
>> So we need to find a solution for this,
>> Christian.
>>
>>     Regards,
>>
>>     Felix
>>
>>     *From:*Koenig, Christian
>>     *Sent:* Thursday, September 27, 2018 7:24 AM
>>     *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>     <mailto:Felix.Kuehling at amd.com>
>>     *Cc:* Yang, Philip <Philip.Yang at amd.com>
>>     <mailto:Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org
>>     <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>     <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
>>     *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>>     replace mmu notifier v4
>>
>>     Am 27.09.2018 um 13:08 schrieb Kuehling, Felix:
>>
>>         > We double check that there wasn't any page table
>>         modification while we prepared the submission and restart the
>>         whole process when there actually was some update.
>>         >
>>         > The reason why we need to do this is here:
>>         >
>>
>>         > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated,
>>         p->fence);
>>         >        amdgpu_mn_unlock(p->mn);
>>
>>         >
>>         > Only after the new fence is added to the buffer object we
>>         can release the lock so that any invalidation will now block
>>         on our command submission to finish before it modifies the
>>         page table.
>>
>>         I don’t see why this requires holding the read-lock until
>>         invalidate_range_end. amdgpu_ttm_tt_affect_userptr gets
>>         called while the mn read-lock is held in
>>         invalidate_range_start notifier.
>>
>>
>>     That's not related to amdgpu_ttm_tt_affect_userptr(), this
>>     function could actually be called outside the lock.
>>
>>     The problem is here:
>>
>>         ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>
>>         amdgpu_mn_unlock(p->mn);
>>
>>
>>     We need to hold the lock until the fence is added to the
>>     reservation object.
>>
>>     Otherwise somebody could have changed the page tables just in the
>>     moment between the check of amdgpu_ttm_tt_userptr_needs_pages()
>>     and adding the fence to the reservation object.
>>
>>     Regards,
>>     Christian.
>>
>>
>>         Regards,
>>
>>           Felix
>>
>>         *From:*Koenig, Christian
>>         *Sent:* Thursday, September 27, 2018 5:27 AM
>>         *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>         <mailto:Felix.Kuehling at amd.com>
>>         *Cc:* Yang, Philip <Philip.Yang at amd.com>
>>         <mailto:Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org
>>         <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>         <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
>>         *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>>         replace mmu notifier v4
>>
>>         That is correct, but take a look what we do when after
>>         calling the amdgpu_mn_read_lock():
>>
>>
>>                     /* No memory allocation is allowed while holding
>>             the mn lock */
>>                     amdgpu_mn_lock(p->mn);
>>             amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>>                             struct amdgpu_bo *bo =
>>             ttm_to_amdgpu_bo(e->tv.bo);
>>
>>                             if
>>             (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
>>                                     r = -ERESTARTSYS;
>>                                     goto error_abort;
>>                             }
>>                     }
>>
>>
>>         We double check that there wasn't any page table modification
>>         while we prepared the submission and restart the whole
>>         process when there actually was some update.
>>
>>         The reason why we need to do this is here:
>>
>>             ttm_eu_fence_buffer_objects(&p->ticket, &p->validated,
>>             p->fence);
>>                     amdgpu_mn_unlock(p->mn);
>>
>>
>>         Only after the new fence is added to the buffer object we can
>>         release the lock so that any invalidation will now block on
>>         our command submission to finish before it modifies the page
>>         table.
>>
>>         The only other option would be to add the fence first and
>>         then check if there was any update to the page tables.
>>
>>         The issue with that approach is that adding a fence can't be
>>         made undone, so if we find that there actually was an update
>>         to the page tables we would need to somehow turn the CS into
>>         a dummy (e.g. overwrite all IBs with NOPs or something like
>>         that) and still submit it.
>>
>>         Not sure if that is actually possible.
>>
>>         Regards,
>>         Christian.
>>
>>         Am 27.09.2018 um 10:47 schrieb Kuehling, Felix:
>>
>>             So back to my previous question:
>>
>>             >> But do we really need another lock for this? Wouldn't the
>>
>>             >> re-validation of userptr BOs (currently calling
>>             get_user_pages) force
>>
>>             >> synchronization with the ongoing page table
>>             invalidation through the
>>
>>             >> mmap_sem or other MM locks?
>>
>>             >
>>
>>             > No and yes. We don't hold any other locks while doing
>>             command submission, but I expect that HMM has its own
>>             mechanism to prevent that.
>>
>>             >
>>
>>             > Since we don't modify
>>             amdgpu_mn_lock()/amdgpu_mn_unlock() we are certainly not
>>             using this mechanism correctly.
>>
>>             The existing amdgpu_mn_lock/unlock should block the MMU
>>             notifier while a command submission is in progress. It
>>             should also block command submission while an MMU
>>             notifier is in progress.
>>
>>             What we lose with HMM is the ability to hold a read-lock
>>             for the entire duration of the invalidate_range_start
>>             until invalidate_range_end. As I understand it, that lock
>>             is meant to prevent new command submissions while the
>>             page tables are being updated by the kernel. But my point
>>             is, that get_user_pages or hmm_vma_fault should do the
>>             same kind of thing. Before the command submission can go
>>             ahead, it needs to update the userptr addresses. If the
>>             page tables are still being updated, it will block there
>>             even without holding the amdgpu_mn_read_lock.
>>
>>             Regards,
>>
>>               Felix
>>
>>             *From:* Koenig, Christian
>>             *Sent:* Thursday, September 27, 2018 3:00 AM
>>             *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>             <mailto:Felix.Kuehling at amd.com>
>>             *Cc:* Yang, Philip <Philip.Yang at amd.com>
>>             <mailto:Philip.Yang at amd.com>;
>>             amd-gfx at lists.freedesktop.org
>>             <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>             <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
>>             *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror
>>             callback to replace mmu notifier v4
>>
>>             No, that won't work. We would still run into lock
>>             inversion problems.
>>
>>             What we could do with the scheduler is to turn
>>             submissions into dummies if we find that the page tables
>>             are now outdated.
>>
>>             But that would be really hacky and I'm not sure if that
>>             would really work in all cases.
>>
>>             Christian.
>>
>>             Am 27.09.2018 08:53 schrieb "Kuehling, Felix"
>>             <Felix.Kuehling at amd.com <mailto:Felix.Kuehling at amd.com>>:
>>
>>             I had a chat with Jerome yesterday. He pointed out that
>>             the new blockable parameter can be used to infer whether
>>             the MMU notifier is being called  in a reclaim operation.
>>             So if blockable==true, it should even be safe to take the
>>             BO reservation lock without problems. I think with that
>>             we should be able to remove the read-write locking
>>             completely and go back to locking (or try-locking for
>>             blockable==false) the reservation locks in the MMU notifier?
>>
>>             Regards,
>>               Felix
>>
>>             -----Original Message-----
>>             From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org
>>             <mailto:amd-gfx-bounces at lists.freedesktop.org>> On Behalf
>>             Of Christian König
>>             Sent: Saturday, September 15, 2018 3:47 AM
>>             To: Kuehling, Felix <Felix.Kuehling at amd.com
>>             <mailto:Felix.Kuehling at amd.com>>; Yang, Philip
>>             <Philip.Yang at amd.com <mailto:Philip.Yang at amd.com>>;
>>             amd-gfx at lists.freedesktop.org
>>             <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
>>             <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
>>             Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback
>>             to replace mmu notifier v4
>>
>>             Am 14.09.2018 um 22:21 schrieb Felix Kuehling:
>>             > On 2018-09-14 01:52 PM, Christian König wrote:
>>             >> Am 14.09.2018 um 19:47 schrieb Philip Yang:
>>             >>> On 2018-09-14 03:51 AM, Christian König wrote:
>>             >>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
>>             >>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>>             >>>>> [SNIP]
>>             >>>>>>    + amdgpu_mn_read_unlock(amn);
>>             >>>>>> +
>>             >>>>> amdgpu_mn_read_lock/unlock support recursive
>>             locking for multiple
>>             >>>>> overlapping or nested invalidation ranges. But if
>>             you'r locking
>>             >>>>> and unlocking in the same function. Is that still a
>>             concern?
>>             >>> I don't understand the possible recursive case, but
>>             >>> amdgpu_mn_read_lock() still support recursive locking.
>>             >>>> Well the real problem is that unlocking them here
>>             won't work.
>>             >>>>
>>             >>>> We need to hold the lock until we are sure that the
>>             operation which
>>             >>>> updates the page tables is completed.
>>             >>>>
>>             >>> The reason for this change is because hmm mirror has
>>             >>> invalidate_start callback, no invalidate_end callback
>>             >>>
>>             >>> Check mmu_notifier.c and hmm.c again, below is entire
>>             logic to
>>             >>> update CPU page tables and callback:
>>             >>>
>>             >>> mn lock amn->lock is used to protect interval tree
>>             access because
>>             >>> user may submit/register new userptr anytime.
>>             >>> This is same for old and new way.
>>             >>>
>>             >>> step 2 guarantee the GPU operation is done before
>>             updating CPU page
>>             >>> table.
>>             >>>
>>             >>> So I think the change is safe. We don't need hold mn
>>             lock until the
>>             >>> CPU page tables update is completed.
>>             >> No, that isn't even remotely correct. The lock doesn't
>>             protects the
>>             >> interval tree.
>>             >>
>>             >>> Old:
>>             >>>     1. down_read_non_owner(&amn->lock)
>>             >>>     2. loop to handle BOs from node->bos through
>>             interval tree
>>             >>> amn->object nodes
>>             >>>         gfx: wait for pending BOs fence operation
>>             done, mark user
>>             >>> pages dirty
>>             >>>         kfd: evict user queues of the process, wait
>>             for queue
>>             >>> unmap/map operation done
>>             >>>     3. update CPU page tables
>>             >>>     4. up_read(&amn->lock)
>>             >>>
>>             >>> New, switch step 3 and 4
>>             >>>     1. down_read_non_owner(&amn->lock)
>>             >>>     2. loop to handle BOs from node->bos through
>>             interval tree
>>             >>> amn->object nodes
>>             >>>         gfx: wait for pending BOs fence operation
>>             done, mark user
>>             >>> pages dirty
>>             >>>         kfd: evict user queues of the process, wait
>>             for queue
>>             >>> unmap/map operation done
>>             >>>     3. up_read(&amn->lock)
>>             >>>     4. update CPU page tables
>>             >> The lock is there to make sure that we serialize page
>>             table updates
>>             >> with command submission.
>>             > As I understand it, the idea is to prevent command
>>             submission (adding
>>             > new fences to BOs) while a page table invalidation is
>>             in progress.
>>
>>             Yes, exactly.
>>
>>             > But do we really need another lock for this? Wouldn't the
>>             > re-validation of userptr BOs (currently calling
>>             get_user_pages) force
>>             > synchronization with the ongoing page table
>>             invalidation through the
>>             > mmap_sem or other MM locks?
>>
>>             No and yes. We don't hold any other locks while doing
>>             command submission, but I expect that HMM has its own
>>             mechanism to prevent that.
>>
>>             Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock()
>>             we are certainly not using this mechanism correctly.
>>
>>             Regards,
>>             Christian.
>>             _______________________________________________
>>             amd-gfx mailing list
>>             amd-gfx at lists.freedesktop.org
>>             <mailto:amd-gfx at lists.freedesktop.org>
>>             https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180927/eec56798/attachment-0001.html>