[PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly

Jason Gunthorpe jgg at ziepe.ca
Thu Mar 12 16:37:34 UTC 2020


On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
> > Actually, while you are looking at this, do you think we should be
> > adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> > multiple references of pmd, pud, etc without locking seems sketchy to
> > me.
> 
> I agree it seems worrying. I'm not entirely sure whether the holding of
> mmap_sem is sufficient,

I looked at this question, and at least for PMD, mmap_sem is not
sufficient. I didn't easily figure it out for the other ones.

I'm guessing if PMD is not safe then none of them are.

> this isn't something that I changed so I've just
> been hoping that it's sufficient since it seems to have been working
> (whether that's by chance because the compiler didn't generate multiple
> reads I've no idea). For walking the kernel's page tables the lack of
> READ_ONCE is also not great, but at least for PTDUMP we don't care too much
> about accuracy and it should be crash proof because there's no RCU grace
> period. And again the code I was replacing didn't have any special
> protection.
>
> I can't see any harm in updating the code to include READ_ONCE and I'm happy
> to review a patch.

The reason I ask is that hmm's walkers often have this pattern
where they get the pointer, de-ref it (again), and then immediately
have to recheck the 'again' conditions of the walker itself because
the re-read may have given a different value.

Having the walker deref the pointer and pass the value into the ops
for use rather than repeatedly de-refing an unlocked value seems like
a much safer design to me.
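
As a rough userspace sketch of that idea (not the real pagewalk.c
code: entry_t, walk_ops_sketch, walk_range_sketch and
sum_entry_sketch are made-up names, and READ_ONCE_SKETCH is only a
simplified volatile-cast stand-in for the kernel's READ_ONCE), the
walker would snapshot each entry once and hand the value to the op:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): force exactly one load. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

typedef uint64_t entry_t;	/* hypothetical page table entry type */

struct walk_ops_sketch {
	/* The op receives the snapshot value, never the table pointer. */
	int (*entry)(entry_t val, unsigned long addr, void *priv);
};

static int walk_range_sketch(entry_t *table, size_t n, unsigned long start,
			     unsigned long step,
			     const struct walk_ops_sketch *ops, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		/* Single read; every later decision uses this copy. */
		entry_t val = READ_ONCE_SKETCH(table[i]);
		int ret;

		if (!val)	/* treat 0 as a "none" entry and skip it */
			continue;

		ret = ops->entry(val, start + i * step, priv);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example op: accumulates the values it was handed. */
static int sum_entry_sketch(entry_t val, unsigned long addr, void *priv)
{
	(void)addr;
	*(uint64_t *)priv += val;
	return 0;
}
```

Since the op only ever sees val, it has nothing to re-read and so
nothing to recheck against a concurrent update of the table.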

If this also implicitly relies on an RCU grace period then it is also
missing RCU locking...

I also didn't quite understand why walk_pte_range() skipped locking
the pte in the no_vma case - I don't get why vma would be related to
locking here.

I also saw that hmm open coded the pte walk, presumably for
performance, so I was thinking of adding some kind of pte_range()
callback to avoid the expensive indirect function call per pte, but
hmm also can't have the pmd locked...

Jason

