[PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Christian König
christian.koenig at amd.com
Thu Sep 27 13:21:33 UTC 2018
Am 27.09.2018 um 15:18 schrieb Kuehling, Felix:
>
> > The problem is here:
> >
>
> > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>
> > amdgpu_mn_unlock(p->mn);
>
> >
> > We need to hold the lock until the fence is added to the reservation
> object.
> >
> > Otherwise somebody could have changed the page tables just in the
> moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and
> adding the fence to the reservation object.
>
> I’m not planning to change that. I don’t think there is any need to
> change it.
>
Yeah, but when HMM doesn't provide both the start and the end hock of
the invalidation this way won't work any more.
So we need to find a solution for this,
Christian.
> Regards,
>
> Felix
>
> *From:*Koenig, Christian
> *Sent:* Thursday, September 27, 2018 7:24 AM
> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
> *Cc:* Yang, Philip <Philip.Yang at amd.com>;
> amd-gfx at lists.freedesktop.org; Jerome Glisse <j.glisse at gmail.com>
> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace
> mmu notifier v4
>
> Am 27.09.2018 um 13:08 schrieb Kuehling, Felix:
>
> > We double check that there wasn't any page table modification
> while we prepared the submission and restart the whole process
> when there actually was some update.
> >
> > The reason why we need to do this is here:
> >
>
> > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
> > amdgpu_mn_unlock(p->mn);
>
> >
> > Only after the new fence is added to the buffer object we can
> release the lock so that any invalidation will now block on our
> command submission to finish before it modifies the page table.
>
>
> I don’t see why this requires holding the read-lock until
> invalidate_range_end. amdgpu_ttm_tt_affect_userptr gets called
> while the mn read-lock is held in invalidate_range_start notifier.
>
>
> That's not related to amdgpu_ttm_tt_affect_userptr(), this function
> could actually be called outside the lock.
>
> The problem is here:
>
> ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>
> amdgpu_mn_unlock(p->mn);
>
>
> We need to hold the lock until the fence is added to the reservation
> object.
>
> Otherwise somebody could have changed the page tables just in the
> moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and
> adding the fence to the reservation object.
>
> Regards,
> Christian.
>
>
>
> Regards,
>
> Felix
>
> *From:*Koenig, Christian
> *Sent:* Thursday, September 27, 2018 5:27 AM
> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
> <mailto:Felix.Kuehling at amd.com>
> *Cc:* Yang, Philip <Philip.Yang at amd.com>
> <mailto:Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
> <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
> *Subject:* Re: [PATCH] drm/amdgpu: use HMM mirror callback to
> replace mmu notifier v4
>
> That is correct, but take a look what we do when after calling the
> amdgpu_mn_read_lock():
>
>
>
> /* No memory allocation is allowed while holding the
> mn lock */
> amdgpu_mn_lock(p->mn);
> amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>
> if
> (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
> r = -ERESTARTSYS;
> goto error_abort;
> }
> }
>
>
> We double check that there wasn't any page table modification
> while we prepared the submission and restart the whole process
> when there actually was some update.
>
> The reason why we need to do this is here:
>
>
> ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
> amdgpu_mn_unlock(p->mn);
>
>
> Only after the new fence is added to the buffer object we can
> release the lock so that any invalidation will now block on our
> command submission to finish before it modifies the page table.
>
> The only other option would be to add the fence first and then
> check if there was any update to the page tables.
>
> The issue with that approach is that adding a fence can't be made
> undone, so if we find that there actually was an update to the
> page tables we would need to somehow turn the CS into a dummy
> (e.g. overwrite all IBs with NOPs or something like that) and
> still submit it.
>
> Not sure if that is actually possible.
>
> Regards,
> Christian.
>
> Am 27.09.2018 um 10:47 schrieb Kuehling, Felix:
>
> So back to my previous question:
>
> >> But do we really need another lock for this? Wouldn't the
>
> >> re-validation of userptr BOs (currently calling
> get_user_pages) force
>
> >> synchronization with the ongoing page table invalidation
> through the
>
> >> mmap_sem or other MM locks?
>
> >
>
> > No and yes. We don't hold any other locks while doing
> command submission, but I expect that HMM has its own
> mechanism to prevent that.
>
> >
>
> > Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we
> are certainly not using this mechanism correctly.
>
> The existing amdgpu_mn_lock/unlock should block the MMU
> notifier while a command submission is in progress. It should
> also block command submission while an MMU notifier is in
> progress.
>
> What we lose with HMM is the ability to hold a read-lock for
> the entire duration of the invalidate_range_start until
> invalidate_range_end. As I understand it, that lock is meant
> to prevent new command submissions while the page tables are
> being updated by the kernel. But my point is, that
> get_user_pages or hmm_vma_fault should do the same kind of
> thing. Before the command submission can go ahead, it needs to
> update the userptr addresses. If the page tables are still
> being updated, it will block there even without holding the
> amdgpu_mn_read_lock.
>
> Regards,
>
> Felix
>
> *From:* Koenig, Christian
> *Sent:* Thursday, September 27, 2018 3:00 AM
> *To:* Kuehling, Felix <Felix.Kuehling at amd.com>
> <mailto:Felix.Kuehling at amd.com>
> *Cc:* Yang, Philip <Philip.Yang at amd.com>
> <mailto:Philip.Yang at amd.com>; amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
> <j.glisse at gmail.com> <mailto:j.glisse at gmail.com>
> *Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to
> replace mmu notifier v4
>
> No, that won't work. We would still run into lock inversion
> problems.
>
> What we could do with the scheduler is to turn submissions
> into dummies if we find that the page tables are now outdated.
>
> But that would be really hacky and I'm not sure if that would
> really work in all cases.
>
> Christian.
>
> Am 27.09.2018 08:53 schrieb "Kuehling, Felix"
> <Felix.Kuehling at amd.com <mailto:Felix.Kuehling at amd.com>>:
>
> I had a chat with Jerome yesterday. He pointed out that the
> new blockable parameter can be used to infer whether the MMU
> notifier is being called in a reclaim operation. So if
> blockable==true, it should even be safe to take the BO
> reservation lock without problems. I think with that we should
> be able to remove the read-write locking completely and go
> back to locking (or try-locking for blockable==false) the
> reservation locks in the MMU notifier?
>
> Regards,
> Felix
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org
> <mailto:amd-gfx-bounces at lists.freedesktop.org>> On Behalf Of
> Christian König
> Sent: Saturday, September 15, 2018 3:47 AM
> To: Kuehling, Felix <Felix.Kuehling at amd.com
> <mailto:Felix.Kuehling at amd.com>>; Yang, Philip
> <Philip.Yang at amd.com <mailto:Philip.Yang at amd.com>>;
> amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Jerome Glisse
> <j.glisse at gmail.com <mailto:j.glisse at gmail.com>>
> Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback to
> replace mmu notifier v4
>
> Am 14.09.2018 um 22:21 schrieb Felix Kuehling:
> > On 2018-09-14 01:52 PM, Christian König wrote:
> >> Am 14.09.2018 um 19:47 schrieb Philip Yang:
> >>> On 2018-09-14 03:51 AM, Christian König wrote:
> >>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
> >>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
> >>>>> [SNIP]
> >>>>>> + amdgpu_mn_read_unlock(amn);
> >>>>>> +
> >>>>> amdgpu_mn_read_lock/unlock support recursive locking for
> multiple
> >>>>> overlapping or nested invalidation ranges. But if you'r
> locking
> >>>>> and unlocking in the same function. Is that still a concern?
> >>> I don't understand the possible recursive case, but
> >>> amdgpu_mn_read_lock() still support recursive locking.
> >>>> Well the real problem is that unlocking them here won't work.
> >>>>
> >>>> We need to hold the lock until we are sure that the
> operation which
> >>>> updates the page tables is completed.
> >>>>
> >>> The reason for this change is because hmm mirror has
> >>> invalidate_start callback, no invalidate_end callback
> >>>
> >>> Check mmu_notifier.c and hmm.c again, below is entire
> logic to
> >>> update CPU page tables and callback:
> >>>
> >>> mn lock amn->lock is used to protect interval tree access
> because
> >>> user may submit/register new userptr anytime.
> >>> This is same for old and new way.
> >>>
> >>> step 2 guarantee the GPU operation is done before updating
> CPU page
> >>> table.
> >>>
> >>> So I think the change is safe. We don't need hold mn lock
> until the
> >>> CPU page tables update is completed.
> >> No, that isn't even remotely correct. The lock doesn't
> protects the
> >> interval tree.
> >>
> >>> Old:
> >>> 1. down_read_non_owner(&amn->lock)
> >>> 2. loop to handle BOs from node->bos through interval tree
> >>> amn->object nodes
> >>> gfx: wait for pending BOs fence operation done,
> mark user
> >>> pages dirty
> >>> kfd: evict user queues of the process, wait for queue
> >>> unmap/map operation done
> >>> 3. update CPU page tables
> >>> 4. up_read(&amn->lock)
> >>>
> >>> New, switch step 3 and 4
> >>> 1. down_read_non_owner(&amn->lock)
> >>> 2. loop to handle BOs from node->bos through interval tree
> >>> amn->object nodes
> >>> gfx: wait for pending BOs fence operation done,
> mark user
> >>> pages dirty
> >>> kfd: evict user queues of the process, wait for queue
> >>> unmap/map operation done
> >>> 3. up_read(&amn->lock)
> >>> 4. update CPU page tables
> >> The lock is there to make sure that we serialize page table
> updates
> >> with command submission.
> > As I understand it, the idea is to prevent command
> submission (adding
> > new fences to BOs) while a page table invalidation is in
> progress.
>
> Yes, exactly.
>
> > But do we really need another lock for this? Wouldn't the
> > re-validation of userptr BOs (currently calling
> get_user_pages) force
> > synchronization with the ongoing page table invalidation
> through the
> > mmap_sem or other MM locks?
>
> No and yes. We don't hold any other locks while doing command
> submission, but I expect that HMM has its own mechanism to
> prevent that.
>
> Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we
> are certainly not using this mechanism correctly.
>
> Regards,
> Christian.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180927/b4ccde80/attachment-0001.html>
More information about the amd-gfx
mailing list