[PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Philip Yang
philip.yang at amd.com
Fri Sep 28 15:07:04 UTC 2018
For B path, we take mm->mmap_sem, then call hmm_vma_get_pfns() or
get_user_pages(). This is obvious.
For A path, the mmu notifier pair
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end()
is called in many places, and the calling paths are quite complicated
inside mm, so it's not obvious. I checked many of them, for example:
do_munmap()
  down_write(&mm->mmap_sem)
  arch_unmap()
    mpx_notify_unmap()...
      zap_bt_entries_mapping()
        zap_page_range()
  up_write(&mm->mmap_sem)
void zap_page_range(struct vm_area_struct *vma, unsigned long start,
                unsigned long size)
{
        struct mm_struct *mm = vma->vm_mm;
        struct mmu_gather tlb;
        unsigned long end = start + size;

        lru_add_drain();
        tlb_gather_mmu(&tlb, mm, start, end);
        update_hiwater_rss(mm);
        mmu_notifier_invalidate_range_start(mm, start, end);
        for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
                unmap_single_vma(&tlb, vma, start, end, NULL);
        mmu_notifier_invalidate_range_end(mm, start, end);
        tlb_finish_mmu(&tlb, start, end);
}
So AFAIK it's okay without the invalidate_range_end() callback.
Regards,
Philip
On 2018-09-28 01:25 AM, Koenig, Christian wrote:
> No, that is incorrect as well :)
>
> The mmap_sem isn't necessarily taken during page table updates.
>
> What you could do is replace get_user_pages() directly with HMM. If
> I'm not completely mistaken that should work as expected.
>
> Christian.
>
> On 27.09.2018 22:18, "Yang, Philip" <Philip.Yang at amd.com> wrote:
> I was trying to understand how HMM handles this concurrency issue and
> how we handle it in amdgpu_ttm_tt_userptr_needs_pages() and
> amdgpu_ttm_tt_affect_userptr(). HMM uses the range->valid flag, we use
> gtt->mmu_invalidations and gtt->last_set_pages. Both actually use the
> same lock-plus-flag idea.
>
> Thanks for the information; now I understand that the fence
> ttm_eu_fence_buffer_objects() puts on the BOs will block the CPU page
> table update. This is the other side of this concurrency issue that I
> didn't know about.
>
> I had the same worry that there is an issue without the
> invalidate_range_end() callback, given the calling sequence Felix
> listed. After taking another look today, I now think it's fine because
> of the mm->mmap_sem usage. This is my understanding:
>
> A path:
>
>     down_write(&mm->mmap_sem);
>     mmu_notifier_invalidate_range_start()
>         take_lock()
>         release_lock()
>     CPU page table update
>     mmu_notifier_invalidate_range_end()
>     up_write(&mm->mmap_sem);
>
> B path:
>
> again:
>     down_read(&mm->mmap_sem);
>     hmm_vma_get_pfns()
>     up_read(&mm->mmap_sem);
>     ....
>     take_lock()
>     if (!hmm_vma_range_done()) {
>         release_lock()
>         goto again
>     }
>     submit command job...
>     release_lock()
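>
> For illustration, a minimal sketch of this retry loop against the
> current HMM API (as of 4.19, hmm_vma_get_pfns() and
> hmm_vma_range_done() both take a struct hmm_range). The
> driver_populate_range() wrapper and driver_lock()/driver_unlock() are
> placeholders for this sketch, not real amdgpu functions:
>
>     /* Illustrative sketch only, not the actual driver code. */
>     static int driver_populate_range(struct mm_struct *mm,
>                                      struct hmm_range *range)
>     {
>         int ret;
>
>     again:
>         /* snapshot the CPU page table entries of the range */
>         down_read(&mm->mmap_sem);
>         ret = hmm_vma_get_pfns(range);
>         up_read(&mm->mmap_sem);
>         if (ret)
>             return ret;
>
>         /* ... prepare the command submission ... */
>
>         driver_lock();
>         if (!hmm_vma_range_done(range)) {
>             /* an invalidation ran in between; start over */
>             driver_unlock();
>             goto again;
>         }
>         /* submit the command job while still holding the lock */
>         driver_unlock();
>         return 0;
>     }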
>
> If you agree, I will submit patch v5 with some minor changes, and then
> submit another patch to replace get_user_pages() with HMM.
>
> Regards,
> Philip
>
> On 2018-09-27 11:36 AM, Christian König wrote:
>> Yeah, I've read that as well.
>>
>> My best guess is that we just need to add a call to
>> hmm_vma_range_done() after taking the lock and also replace
>> get_user_pages() with hmm_vma_get_pfns().
>>
>> But I'm still not 100% sure how all of that is supposed to work together.
>>
>> Regards,
>> Christian.
>>
>> On 27.09.2018 16:50, Kuehling, Felix wrote:
>>>
>>> I think the answer is here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:* Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 10:30 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* j.glisse at gmail.com; Yang, Philip <Philip.Yang at amd.com>;
>>> amd-gfx at lists.freedesktop.org
>>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to
>>> replace mmu notifier v4
>>>
>>> At least with get_user_pages() that is perfectly possible.
>>>
>>> For HMM it could be that this is prevented somehow.
>>>
>>> Christian.
>>>
>>> On 27.09.2018 16:27, "Kuehling, Felix" <Felix.Kuehling at amd.com> wrote:
>>>
>>> > In this case you can end up accessing pages which are
>>> > invalidated while get_user_pages is in process.
>>>
>>> What’s the sequence of events you have in mind? Something like this?
>>>
>>> * Page table is updated and triggers MMU notifier
>>> * amdgpu MMU notifier runs and waits for pending CS to finish
>>>   while holding the read lock
>>> * New CS starts just after invalidate_range_start MMU notifier
>>>   finishes but before the page table update is done
>>> * get_user_pages returns outdated physical addresses
>>>
>>> I hope that’s not actually possible and that get_user_pages or
>>> hmm_vma_fault would block until the page table update is done. That
>>> is, invalidate_range_start marks the start of a page table update,
>>> and while that update is in progress, get_user_pages or
>>> hmm_vma_fault block. Jerome, can you comment on that?
>>>
>>> Thanks,
>>> Felix
>>>
>>> *From:* Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 9:59 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org;
>>> Jerome Glisse <j.glisse at gmail.com>
>>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to
>>> replace mmu notifier v4
>>>
>>> Yeah I understand that, but again that won't work.
>>>
>>> In this case you can end up accessing pages which are invalidated
>>> while get_user_pages is in process.
>>>
>>> Christian.
>>>
>>> On 27.09.2018 15:41, "Kuehling, Felix" <Felix.Kuehling at amd.com> wrote:
>>>
>>> > I’m not planning to change that. I don’t think there is any need to change it.
>>>
>>> >
>>> > Yeah, but when HMM doesn't provide both the start and the end hook
>>> > of the invalidation, this way won't work any more.
>>> >
>>> > So we need to find a solution for this,
>>> > Christian.
>>>
>>> My whole argument is that you don’t need to hold the read lock until
>>> the invalidate_range_end. Just read_lock and read_unlock in the
>>> invalidate_range_start function.
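>>>
>>> For illustration, a minimal sketch of that idea, modeled on the
>>> amdgpu_mn code (the signature and helper arguments here are
>>> assumptions for the sketch, not the actual driver code):
>>>
>>>     /* Illustrative sketch only. */
>>>     static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
>>>             struct mm_struct *mm, unsigned long start, unsigned long end)
>>>     {
>>>         struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
>>>
>>>         /* take and drop the read side inside the start notifier */
>>>         amdgpu_mn_read_lock(amn);
>>>         /* wait for pending CS on the affected userptr BOs and
>>>          * mark their pages dirty */
>>>         amdgpu_mn_read_unlock(amn);
>>>         /* nothing left to release in invalidate_range_end */
>>>     }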
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:* Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 9:22 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org;
>>> Jerome Glisse <j.glisse at gmail.com>
>>> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>>> replace mmu notifier v4
>>>
>>> On 27.09.2018 15:18, Kuehling, Felix wrote:
>>>
>>> > The problem is here:
>>> >
>>>
>>> > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>>
>>> > amdgpu_mn_unlock(p->mn);
>>>
>>> >
>>> > We need to hold the lock until the fence is added to the
>>> > reservation object.
>>> >
>>> > Otherwise somebody could change the page tables in the window
>>> > between the check of amdgpu_ttm_tt_userptr_needs_pages() and
>>> > adding the fence to the reservation object.
>>>
>>> I’m not planning to change that. I don’t think there is any need
>>> to change it.
>>>
>>>
>>> Yeah, but when HMM doesn't provide both the start and the end hook
>>> of the invalidation, this way won't work any more.
>>>
>>> So we need to find a solution for this,
>>> Christian.
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:* Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 7:24 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org;
>>> Jerome Glisse <j.glisse at gmail.com>
>>> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
>>> replace mmu notifier v4
>>>
>>> On 27.09.2018 13:08, Kuehling, Felix wrote:
>>>
>>> > We double check that there wasn't any page table modification
>>> > while we prepared the submission and restart the whole process
>>> > when there actually was some update.
>>> >
>>> > The reason why we need to do this is here:
>>> >
>>> > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>> > amdgpu_mn_unlock(p->mn);
>>>
>>> >
>>> > Only after the new fence is added to the buffer object can we
>>> > release the lock, so that any invalidation will block on our
>>> > command submission finishing before it modifies the page
>>> > table.
>>>
>>> I don’t see why this requires holding the read-lock until
>>> invalidate_range_end. amdgpu_ttm_tt_affect_userptr gets
>>> called while the mn read-lock is held in the
>>> invalidate_range_start notifier.
>>>
>>>
>>> That's not related to amdgpu_ttm_tt_affect_userptr(), this
>>> function could actually be called outside the lock.
>>>
>>> The problem is here:
>>>
>>> ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>>
>>> amdgpu_mn_unlock(p->mn);
>>>
>>>
>>> We need to hold the lock until the fence is added to the
>>> reservation object.
>>>
>>> Otherwise somebody could change the page tables in the window
>>> between the check of amdgpu_ttm_tt_userptr_needs_pages() and
>>> adding the fence to the reservation object.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:* Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 5:27 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org;
>>> Jerome Glisse <j.glisse at gmail.com>
>>> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback
>>> to replace mmu notifier v4
>>>
>>> That is correct, but take a look at what we do after calling
>>> amdgpu_mn_read_lock():
>>>
>>>
>>>     /* No memory allocation is allowed while holding the mn lock */
>>>     amdgpu_mn_lock(p->mn);
>>>     amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
>>>             struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>>>
>>>             if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
>>>                     r = -ERESTARTSYS;
>>>                     goto error_abort;
>>>             }
>>>     }
>>>
>>>
>>> We double check that there wasn't any page table
>>> modification while we prepared the submission and restart
>>> the whole process when there actually was some update.
>>>
>>> The reason why we need to do this is here:
>>>
>>> ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>>> amdgpu_mn_unlock(p->mn);
>>>
>>>
>>> Only after the new fence is added to the buffer object can we
>>> release the lock, so that any invalidation will block on our
>>> command submission finishing before it modifies the page
>>> table.
>>>
>>> The only other option would be to add the fence first and
>>> then check if there was any update to the page tables.
>>>
>>> The issue with that approach is that adding a fence can't be
>>> undone, so if we find that there actually was an update to
>>> the page tables we would need to somehow turn the CS into
>>> a dummy (e.g. overwrite all IBs with NOPs or something like
>>> that) and still submit it.
>>>
>>> Not sure if that is actually possible.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 27.09.2018 10:47, Kuehling, Felix wrote:
>>>
>>> So back to my previous question:
>>>
>>> >> But do we really need another lock for this? Wouldn't the
>>> >> re-validation of userptr BOs (currently calling get_user_pages)
>>> >> force synchronization with the ongoing page table invalidation
>>> >> through the mmap_sem or other MM locks?
>>> >
>>> > No and yes. We don't hold any other locks while doing command
>>> > submission, but I expect that HMM has its own mechanism to
>>> > prevent that.
>>> >
>>> > Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we are
>>> > certainly not using this mechanism correctly.
>>>
>>> The existing amdgpu_mn_lock/unlock should block the MMU
>>> notifier while a command submission is in progress. It
>>> should also block command submission while an MMU
>>> notifier is in progress.
>>>
>>> What we lose with HMM is the ability to hold a read-lock
>>> from invalidate_range_start until invalidate_range_end. As I
>>> understand it, that lock is meant to prevent new command
>>> submissions while the page tables are being updated by the
>>> kernel. But my point is that get_user_pages or hmm_vma_fault
>>> should do the same kind of thing. Before the command submission
>>> can go ahead, it needs to update the userptr addresses. If the
>>> page tables are still being updated, it will block there even
>>> without holding the amdgpu_mn_read_lock.
>>>
>>> Regards,
>>>
>>> Felix
>>>
>>> *From:* Koenig, Christian
>>> *Sent:* Thursday, September 27, 2018 3:00 AM
>>> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
>>> *Cc:* Yang, Philip <Philip.Yang at amd.com>;
>>> amd-gfx at lists.freedesktop.org; Jerome Glisse
>>> <j.glisse at gmail.com>
>>> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror
>>> callback to replace mmu notifier v4
>>>
>>> No, that won't work. We would still run into lock
>>> inversion problems.
>>>
>>> What we could do with the scheduler is to turn
>>> submissions into dummies if we find that the page tables
>>> are now outdated.
>>>
>>> But that would be really hacky and I'm not sure if that
>>> would really work in all cases.
>>>
>>> Christian.
>>>
>>> On 27.09.2018 08:53, "Kuehling, Felix" <Felix.Kuehling at amd.com> wrote:
>>>
>>> I had a chat with Jerome yesterday. He pointed out that
>>> the new blockable parameter can be used to infer whether
>>> the MMU notifier is being called in a reclaim
>>> operation. So if blockable==true, it should even be safe
>>> to take the BO reservation lock without problems. I
>>> think with that we should be able to remove the
>>> read-write locking completely and go back to locking (or
>>> try-locking for blockable==false) the reservation locks
>>> in the MMU notifier?
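>>>
>>> For illustration, a minimal sketch of how the blockable flag could
>>> drive that, using the reservation_object lock/trylock helpers
>>> (amdgpu_mn_lock_bo() is a made-up name for this sketch, not a real
>>> driver function):
>>>
>>>     /* Illustrative sketch only. */
>>>     static int amdgpu_mn_lock_bo(struct amdgpu_bo *bo, bool blockable)
>>>     {
>>>         if (blockable)
>>>             /* not called from reclaim, sleeping is allowed */
>>>             return reservation_object_lock(bo->tbo.resv, NULL);
>>>
>>>         /* reclaim path: must not sleep, tell the core to retry */
>>>         if (!reservation_object_trylock(bo->tbo.resv))
>>>             return -EAGAIN;
>>>         return 0;
>>>     }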
>>>
>>> Regards,
>>> Felix
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
>>> On Behalf Of Christian König
>>> Sent: Saturday, September 15, 2018 3:47 AM
>>> To: Kuehling, Felix <Felix.Kuehling at amd.com>; Yang, Philip
>>> <Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org;
>>> Jerome Glisse <j.glisse at gmail.com>
>>> Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback
>>> to replace mmu notifier v4
>>>
>>> On 14.09.2018 22:21, Felix Kuehling wrote:
>>> > On 2018-09-14 01:52 PM, Christian König wrote:
>>> >> Am 14.09.2018 um 19:47 schrieb Philip Yang:
>>> >>> On 2018-09-14 03:51 AM, Christian König wrote:
>>> >>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
>>> >>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>>> >>>>> [SNIP]
>>> >>>>>> + amdgpu_mn_read_unlock(amn);
>>> >>>>>> +
>>> >>>>> amdgpu_mn_read_lock/unlock support recursive locking for
>>> >>>>> multiple overlapping or nested invalidation ranges. But if
>>> >>>>> you're locking and unlocking in the same function, is that
>>> >>>>> still a concern?
>>> I don't understand the possible recursive case, but
>>> amdgpu_mn_read_lock() still supports recursive locking.
>>> >>>> Well the real problem is that unlocking them here
>>> won't work.
>>> >>>>
>>> >>>> We need to hold the lock until we are sure that the
>>> operation which
>>> >>>> updates the page tables is completed.
>>> >>>>
>>> >>> The reason for this change is that the hmm mirror has an
>>> >>> invalidate_start callback but no invalidate_end callback.
>>> >>>
>>> >>> Check mmu_notifier.c and hmm.c again; below is the entire
>>> >>> logic to update CPU page tables and run the callback:
>>> >>>
>>> >>> The mn lock amn->lock is used to protect interval tree access
>>> >>> because the user may submit/register a new userptr at any
>>> >>> time. This is the same for the old and new way.
>>> >>>
>>> >>> Step 2 guarantees the GPU operation is done before updating
>>> >>> the CPU page table.
>>> >>>
>>> >>> So I think the change is safe. We don't need to hold the mn
>>> >>> lock until the CPU page table update is completed.
>>> >> No, that isn't even remotely correct. The lock doesn't
>>> >> protect the interval tree.
>>> >>
>>> >>> Old:
>>> >>> 1. down_read_non_owner(&amn->lock)
>>> >>> 2. loop to handle BOs from node->bos through interval tree
>>> >>>    amn->object nodes
>>> >>>      gfx: wait for pending BO fence operations to finish,
>>> >>>           mark user pages dirty
>>> >>>      kfd: evict user queues of the process, wait for queue
>>> >>>           unmap/map operations to finish
>>> >>> 3. update CPU page tables
>>> >>> 4. up_read(&amn->lock)
>>> >>>
>>> >>> New (switch steps 3 and 4):
>>> >>> 1. down_read_non_owner(&amn->lock)
>>> >>> 2. loop to handle BOs from node->bos through interval tree
>>> >>>    amn->object nodes
>>> >>>      gfx: wait for pending BO fence operations to finish,
>>> >>>           mark user pages dirty
>>> >>>      kfd: evict user queues of the process, wait for queue
>>> >>>           unmap/map operations to finish
>>> >>> 3. up_read(&amn->lock)
>>> >>> 4. update CPU page tables
>>> >> The lock is there to make sure that we serialize page table
>>> >> updates with command submission.
>>> > As I understand it, the idea is to prevent command submission
>>> > (adding new fences to BOs) while a page table invalidation is
>>> > in progress.
>>>
>>> Yes, exactly.
>>>
>>> > But do we really need another lock for this? Wouldn't the
>>> > re-validation of userptr BOs (currently calling get_user_pages)
>>> > force synchronization with the ongoing page table invalidation
>>> > through the mmap_sem or other MM locks?
>>>
>>> No and yes. We don't hold any other locks while doing
>>> command submission, but I expect that HMM has its own
>>> mechanism to prevent that.
>>>
>>> Since we don't modify
>>> amdgpu_mn_lock()/amdgpu_mn_unlock() we are certainly not
>>> using this mechanism correctly.
>>>
>>> Regards,
>>> Christian.
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>
>