[PATCH v2 02/15] mm/mmu_notifier: add an interval tree notifier
John Hubbard
jhubbard at nvidia.com
Thu Nov 7 20:53:56 UTC 2019
On 11/7/19 12:06 PM, Jason Gunthorpe wrote:
...
>>
>> Also, it is best moved down to be next to the new MNR structs, so that all the
>> MNR stuff is in one group.
>
> I agree with Jerome, this enum is part of the 'struct
> mmu_notifier_range' (i.e. the description of the invalidation), and it
> doesn't really matter that only these new notifiers can be called with
> this type; it is still part of the mmu_notifier_range.
>
OK.
> The comment already says it only applies to the mmu_range_notifier
> scheme.
>
>>> #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
>>> @@ -222,6 +228,26 @@ struct mmu_notifier {
>>> unsigned int users;
>>> };
>>>
>>
>> That should also be moved down, next to the new structs.
>
> Which this?
I was referring to MMU_NOTIFIER_RANGE_BLOCKABLE, above. I was trying
to put all the new range notifier stuff in one place, but maybe not,
if these are really not as separate as I thought.
>
>>> +/**
>>> + * struct mmu_range_notifier_ops
>>> + * @invalidate: Upon return the caller must stop using any SPTEs within this
>>> + * range, this function can sleep. Return false if blocking was
>>> + * required but range is non-blocking
>>> + */
>>
>> How about this (I'm not sure I fully understand the return value, though):
>>
>> /**
>> * struct mmu_range_notifier_ops
>> * @invalidate: Upon return the caller must stop using any SPTEs within this
>> * range.
>> *
>> * This function is permitted to sleep.
>> *
>> * @Return: false if blocking was required, but @range is
>> * non-blocking.
>> *
>> */
>
> Is this kdoc format for function pointers?
heh, I'm sort of winging it; I'm not sure how function pointers are supposed
to be documented in kdoc. Actually the only key take-away here is to write
"This function can sleep"
as a separate sentence.
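FWIW, kernel-doc does support documenting struct members (function
pointers included) via @member lines in the struct's comment, so I'd
expect something along these lines to render; the wording here is mine
and the signature is only approximated from the quoted patch:

/**
 * struct mmu_range_notifier_ops - callbacks for an interval notifier
 * @invalidate: Upon return the caller must stop using any SPTEs within this
 *              range. This function can sleep. Returns false only if
 *              blocking was required but @range is non-blocking.
 */
struct mmu_range_notifier_ops {
	bool (*invalidate)(struct mmu_range_notifier *mrn,
			   const struct mmu_notifier_range *range,
			   unsigned long cur_seq);
};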
...
>> c) Rename new one. Ideas:
>>
>> struct mmu_interval_notifier
>> struct mmu_range_intersection
>> ...other ideas?
>
> This odd duality has already caused some confusion, but names here are
> hard. mmu_interval_notifier is the best alternative I've heard.
>
> Changing this name is a lot of work - are we happy
> 'mmu_interval_notifier' is the right choice?
Yes, it's my favorite too. I'd vote for going with that.
...
>>
>>
>> OK, this either needs more documentation and assertions, or a different
>> approach, because I see addition, subtraction, AND, OR and booleans
>> all being applied to this field, and it's darn near hopeless to figure
>> out whether or not it really is even or odd at the right times.
>
> This is a standard design for a seqlock scheme and follows the
> existing design of the Linux seqlock.
>
> The lower bit indicates the locked state and the upper bits indicate
> the generation of the lock.
>
> The operations on the lock itself are then:
> seq |= 1 # Take the lock
> seq++ # Release an acquired lock
> seq & 1 # True if locked
>
> Which is how this is written
Very nice, would you be open to putting that into (any) one of the comment
headers? That's an unusually clear and concise description:
/*
 * This is a standard design for a seqlock scheme and follows the
 * existing design of the Linux seqlock.
 *
 * The lower bit indicates the locked state and the upper bits indicate
 * the generation of the lock.
*
* The operations on the lock itself are then:
* seq |= 1 # Take the lock
* seq++ # Release an acquired lock
* seq & 1 # True if locked
*/
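And as a sanity check on my own understanding, the same thing as a
stand-alone toy (a bare integer, no atomics, no locking; purely
illustrative, not the kernel code):

	unsigned long seq = 0;	/* even: idle, nothing invalidating */

	seq |= 1;		/* take the lock: low bit set, seq now odd */
	/* (seq & 1) is true here, i.e. "locked" */
	seq++;			/* release: low bit clears, generation advances */
	/* seq == 2: even again, and larger than before */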
>
>> Different approach: why not just add a mmn_mm->is_invalidating
>> member variable? It's not like you're short of space in that struct.
>
> Splitting it makes a lot of stuff more complex and unnatural.
>
OK, agreed.
> The ops above could be put in inline wrappers, but they occur only
> in functions already called mn_itree_inv_start_range(),
> mn_itree_inv_end() and mn_itree_is_invalidating().
>
> There is the one 'take the lock' outlier in
> __mmu_range_notifier_insert(), though.
>
>>> +static void mn_itree_inv_end(struct mmu_notifier_mm *mmn_mm)
>>> +{
>>> + struct mmu_range_notifier *mrn;
>>> + struct hlist_node *next;
>>> + bool need_wake = false;
>>> +
>>> + spin_lock(&mmn_mm->lock);
>>> + if (--mmn_mm->active_invalidate_ranges ||
>>> + !mn_itree_is_invalidating(mmn_mm)) {
>>> + spin_unlock(&mmn_mm->lock);
>>> + return;
>>> + }
>>> +
>>> + mmn_mm->invalidate_seq++;
>>
>> Is this the right place for an assertion that this is now an even value?
>
> Yes, but I'm reluctant to add such a runtime check on this fastish path.
> How about a comment?
Sure.
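(For a debug config, I'd imagine the check right after the increment
would just be something like this sketch:

	mmn_mm->invalidate_seq++;
	WARN_ON_ONCE(mmn_mm->invalidate_seq & 1);	/* must be even now */

...but agreed, a comment is the right tradeoff on this path.)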
>
>>> + need_wake = true;
>>> +
>>> + /*
>>> + * The inv_end incorporates a deferred mechanism like
>>> + * rtnl_lock(). Adds and removes are queued until the final inv_end
>>
>> Let me point out that rtnl_lock() itself is a one-liner that calls mutex_lock().
>> But I suppose if one studies that file closely there is more. :)
>
> Let's change that to rtnl_unlock() then.
Thanks :)
...
>>> + * mrn->invalidate_seq is always set to an odd value. This ensures
>>
>> This claim just looks wrong the first N times one reads the code, given that
>> there is mmu_range_set_seq() to set it to an arbitrary value! Maybe
>> you mean
>
> mmu_range_set_seq() is NOT to be used to set to an arbitrary value; it
> must only be used to set to the value provided in the invalidate()
> callback, and that value is always odd. Let's make this super clear:
>
> /*
> * mrn->invalidate_seq must always be set to an odd value via
> * mmu_range_set_seq() using the provided cur_seq from
> * mn_itree_inv_start_range(). This ensures that if seq does wrap we
> * will always clear the below sleep in some reasonable time as
> * mmn_mm->invalidate_seq is even in the idle state.
> */
>
OK, that helps a lot.
...
>>> + mrn->invalidate_seq = mmn_mm->invalidate_seq - 1;
>>
>> Ohhh, checkmate. I lose. Why is *subtracting* the right thing to do
>> for seq numbers here? I'm acutely unhappy trying to figure this out.
>> I suspect it's another unfortunate side effect of trying to use the
>> lower bit of the seq number (even/odd) for something else.
>
> No, this is actually done for the seq number itself. We need to
> generate a seq number that is != the current invalidate_seq as this
> new mrn is not invalidating.
>
> The best seq to use is one that the invalidate_seq will not reach for
> a long time, i.e. 'invalidate_seq + MAX', which is expressed as -1.
>
> The even/odd thing just takes care of itself naturally here as
> invalidate_seq is guaranteed even and -1 creates both an odd mrn value
> and a good seq number.
>
> The algorithm would actually work correctly if this was
> 'mrn->invalidate_seq = 1', but occasionally things would block when
> they don't need to block.
>
> Lets add a comment:
>
> /*
> * The starting seq for a mrn not under invalidation should be
> * odd, not equal to the current invalidate_seq and
> * invalidate_seq should not 'wrap' to the new seq any time
> * soon.
> */
Very helpful. How about this additional tweak:
/*
* The starting seq for a mrn not under invalidation should be
* odd, not equal to the current invalidate_seq and
* invalidate_seq should not 'wrap' to the new seq any time
* soon. Subtracting 1 from the current (even) value achieves that.
*/
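Working through the arithmetic with a made-up value, to convince myself
(invalidate_seq is guaranteed even here, since nothing is invalidating):

	/* say mmn_mm->invalidate_seq == 10 (even: idle) */
	mrn->invalidate_seq = mmn_mm->invalidate_seq - 1;	/* 9: odd, != 10 */
	/*
	 * invalidate_seq only moves forward, so it must wrap all the way
	 * around before it can reach 9 and falsely match this mrn.
	 */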
>
>>> +int mmu_range_notifier_insert(struct mmu_range_notifier *mrn,
>>> + unsigned long start, unsigned long length,
>>> + struct mm_struct *mm)
>>> +{
>>> + struct mmu_notifier_mm *mmn_mm;
>>> + int ret;
>>
>> Hmmm, I think a later patch improperly changes the above to "int ret = 0;".
>> I'll check on that. It's correct here, though.
>
> Looks OK in my tree?
Nope, that's how I found it. The top of your mmu_notifier branch has this:
int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
{
	struct mmu_notifier_mm *mmn_mm = range->mm->mmu_notifier_mm;
	int ret = 0;

	if (mmn_mm->has_interval) {
		ret = mn_itree_invalidate(mmn_mm, range);
		if (ret)
			return ret;
	}
	if (!hlist_empty(&mmn_mm->list))
		return mn_hlist_invalidate_range_start(mmn_mm, range);
	return 0;
}
>
>>> + might_lock(&mm->mmap_sem);
>>> +
>>> + mmn_mm = smp_load_acquire(&mm->mmu_notifier_mm);
>>
>> What does the above pair with? Should have a comment that specifies that.
>
> smp_load_acquire() always pairs with smp_store_release() to the same
> memory, and there is only one store. Is a comment really needed?
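Fair enough. For anyone else reading along, it's the usual
acquire/release pairing, roughly (a sketch; the registration details
are elided):

	/* the single writer, when the mmu_notifier_mm is first published: */
	smp_store_release(&mm->mmu_notifier_mm, mmn_mm);

	/* the reader, here in mmu_range_notifier_insert(): */
	mmn_mm = smp_load_acquire(&mm->mmu_notifier_mm);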
>
> Below are the comment updates I made, thanks!
>
> Jason
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 51b92ba013ddce..065c95002e9602 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -302,15 +302,15 @@ void mmu_range_notifier_remove(struct mmu_range_notifier *mrn);
> /**
> * mmu_range_set_seq - Save the invalidation sequence
> * @mrn - The mrn passed to invalidate
> - * @cur_seq - The cur_seq passed to invalidate
> + * @cur_seq - The cur_seq passed to the invalidate() callback
> *
> * This must be called unconditionally from the invalidate callback of a
> * struct mmu_range_notifier_ops under the same lock that is used to call
> * mmu_range_read_retry(). It updates the sequence number for later use by
> - * mmu_range_read_retry().
> + * mmu_range_read_retry(). The provided cur_seq will always be odd.
> *
> - * If the user does not call mmu_range_read_begin() or mmu_range_read_retry()
> - * then this call is not required.
> + * If the caller does not call mmu_range_read_begin() or
> + * mmu_range_read_retry() then this call is not required.
> */
> static inline void mmu_range_set_seq(struct mmu_range_notifier *mrn,
> unsigned long cur_seq)
> @@ -348,8 +348,9 @@ static inline bool mmu_range_read_retry(struct mmu_range_notifier *mrn,
> * collided with this lock and a future mmu_range_read_retry() will return
> * true.
> *
> - * False is not reliable and only suggests a collision has not happened. It
> - * can be called many times and does not have to hold the user provided lock.
> + * False is not reliable and only suggests a collision may not have
> + * occurred. It can be called many times and does not have to hold the user
> + * provided lock.
> *
> * This call can be used as part of loops and other expensive operations to
> * expedite a retry.
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 2b7485919ecfeb..afe1e2d94183f8 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -51,7 +51,8 @@ struct mmu_notifier_mm {
> * This is a collision-retry read-side/write-side 'lock', a lot like a
> * seqcount, however this allows multiple write-sides to hold it at
> * once. Conceptually the write side is protecting the values of the PTEs in
> - * this mm, such that PTES cannot be read into SPTEs while any writer exists.
> + * this mm, such that PTES cannot be read into SPTEs (shadow PTEs) while any
> + * writer exists.
> *
> * Note that the core mm creates nested invalidate_range_start()/end() regions
> * within the same thread, and runs invalidate_range_start()/end() in parallel
> @@ -64,12 +65,13 @@ struct mmu_notifier_mm {
> *
> * The write side has two states, fully excluded:
> * - mm->active_invalidate_ranges != 0
> - * - mnn->invalidate_seq & 1 == True
> + * - mnn->invalidate_seq & 1 == True (odd)
> * - some range on the mm_struct is being invalidated
> * - the itree is not allowed to change
> *
> * And partially excluded:
> * - mm->active_invalidate_ranges != 0
> + * - mnn->invalidate_seq & 1 == False (even)
> * - some range on the mm_struct is being invalidated
> * - the itree is allowed to change
> *
> @@ -131,12 +133,13 @@ static void mn_itree_inv_end(struct mmu_notifier_mm *mmn_mm)
> return;
> }
>
> + /* Make invalidate_seq even */
> mmn_mm->invalidate_seq++;
> need_wake = true;
>
> /*
> * The inv_end incorporates a deferred mechanism like
> - * rtnl_lock(). Adds and removes are queued until the final inv_end
> + * rtnl_unlock(). Adds and removes are queued until the final inv_end
> * happens then they are progressed. This arrangement for tree updates
> * is used to avoid using a blocking lock during
> * invalidate_range_start.
> @@ -230,10 +233,11 @@ unsigned long mmu_range_read_begin(struct mmu_range_notifier *mrn)
> spin_unlock(&mmn_mm->lock);
>
> /*
> - * mrn->invalidate_seq is always set to an odd value. This ensures
> - * that if seq does wrap we will always clear the below sleep in some
> - * reasonable time as mmn_mm->invalidate_seq is even in the idle
> - * state.
> + * mrn->invalidate_seq must always be set to an odd value via
> + * mmu_range_set_seq() using the provided cur_seq from
> + * mn_itree_inv_start_range(). This ensures that if seq does wrap we
> + * will always clear the below sleep in some reasonable time as
> + * mmn_mm->invalidate_seq is even in the idle state.
> */
> lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
> lock_map_release(&__mmu_notifier_invalidate_range_start_map);
> @@ -892,6 +896,12 @@ static int __mmu_range_notifier_insert(struct mmu_range_notifier *mrn,
> mrn->invalidate_seq = mmn_mm->invalidate_seq;
> } else {
> WARN_ON(mn_itree_is_invalidating(mmn_mm));
> + /*
> + * The starting seq for a mrn not under invalidation should be
> + * odd, not equal to the current invalidate_seq and
> + * invalidate_seq should not 'wrap' to the new seq any time
> + * soon.
> + */
> mrn->invalidate_seq = mmn_mm->invalidate_seq - 1;
> interval_tree_insert(&mrn->interval_tree, &mmn_mm->itree);
> }
>
Looks good. We're just polishing up minor points now, so you can add:
Reviewed-by: John Hubbard <jhubbard at nvidia.com>
thanks,
--
John Hubbard
NVIDIA