[PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
Jason Gunthorpe
jgg at ziepe.ca
Fri Mar 13 22:51:57 UTC 2020
On Fri, Mar 13, 2020 at 02:04:46PM -0700, Matthew Wilcox wrote:
> On Fri, Mar 13, 2020 at 04:55:50PM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 12, 2020 at 05:02:18PM +0000, Steven Price wrote:
> > > On 12/03/2020 16:37, Jason Gunthorpe wrote:
> > > > On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
> > > > > > Actually, while you are looking at this, do you think we should be
> > > > > > adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> > > > > > multiple references of pmd, pud, etc. without locking seem sketchy to
> > > > > > me.
> > > > >
> > > > > I agree it seems worrying. I'm not entirely sure whether the holding of
> > > > > mmap_sem is sufficient,
> > > >
> > > > I looked at this question, and at least for PMD, mmap_sem is not
> > > > sufficient. I didn't easily figure it out for the other ones.
> > > >
> > > > I'm guessing if PMD is not safe then none of them are.
> > > >
> > > > > this isn't something that I changed so I've just
> > > > > been hoping that it's sufficient since it seems to have been working
> > > > > (whether that's by chance, because the compiler didn't generate multiple
> > > > > reads, I've no idea). For walking the kernel's page tables the lack of
> > > > > READ_ONCE is also not great, but at least for PTDUMP we don't care too much
> > > > > about accuracy and it should be crash-proof because there's no RCU grace
> > > > > period. And again the code I was replacing didn't have any special
> > > > > protection.
> > > > >
> > > > > I can't see any harm in updating the code to include READ_ONCE and I'm happy
> > > > > to review a patch.
> > > >
> > > > The reason I ask is because hmm's walkers often have this pattern
> > > > where they get the pointer and then de-ref it (again), then
> > > > immediately have to recheck the 'again' conditions of the walker
> > > > itself because the re-read may have given a different value.
> > > >
> > > > Having the walker deref the pointer and pass the value into the ops
> > > > for use rather than repeatedly de-refing an unlocked value seems like
> > > > a much safer design to me.
> > >
> > > Yeah that sounds like a good idea.
> >
> > I'm looking at this now. The PUD is also changing under the read
> > mmap_sem - and I was able to think up some race conditiony bugs
> > related to this. Have some patches now.
> >
> > However, I haven't been able to understand why walk_page_range()
> > doesn't check pud_present() or pmd_present() before calling
> > pmd_offset() or pte_offset_map().
> >
> > As far as I can see a non-present entry has a swap entry encoded in
> > it, and thus it seems like it is a bad idea to pass a non-present
> > entry to the two map functions. I think those should only be called
> > when the entry points to the next level in the page table (so there
> > is something to map?)
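A minimal user-space sketch of the check being asked about: test an entry for "present" before treating its bits as the address of the next-level table. Everything here is a hypothetical model, not kernel code: pmd_t is a 64-bit stand-in, bit 0 means present, a present entry holds a 4 KiB-aligned table address, a non-present non-zero entry holds swap-style data, and classify_pmd() is an illustrative helper. READ_ONCE is again approximated with a volatile cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space approximation of the kernel's READ_ONCE() (volatile load). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical PMD stand-in: bit 0 is "present". Present entries carry the
 * 4 KiB-aligned address of the PTE table; non-present, non-zero entries carry
 * a swap-style (type, offset) encoding that is NOT a table address. */
typedef uint64_t pmd_t;
#define PMD_PRESENT   0x1ULL
#define PMD_ADDR_MASK (~0xfffULL)

static int pmd_none(pmd_t pmd)    { return pmd == 0; }
static int pmd_present(pmd_t pmd) { return (pmd & PMD_PRESENT) != 0; }

enum pmd_step { PMD_HOLE, PMD_NONPRESENT, PMD_DESCEND };

/* Decide what to do with one PMD slot. Only a present entry is turned into a
 * table pointer; masking a non-present entry would produce a bogus address
 * built from swap data. */
static enum pmd_step classify_pmd(pmd_t *pmdp, uint64_t **ptes_out)
{
    pmd_t pmd = READ_ONCE(*pmdp);   /* single snapshot of the entry */

    *ptes_out = NULL;
    if (pmd_none(pmd))
        return PMD_HOLE;            /* nothing mapped here */
    if (!pmd_present(pmd))
        return PMD_NONPRESENT;      /* swap/migration-style entry: don't descend */
    *ptes_out = (uint64_t *)(uintptr_t)(pmd & PMD_ADDR_MASK);
    return PMD_DESCEND;
}
```

In this model a swap-style value such as (7 << 12) | (3 << 1) would mask to 0x7000, which is not a page table, which is why the present test has to come before any "map the next level" step.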
> >
> > I see you added !present tests for the !vma case, but why only there?
> >
> > Is this a bug? Do you know how it works?
> >
> > Is it something that was missed when people added non-present PUDs and
> > PMDs?
>
> ... I'm sorry, I did what now?
No, no, I was just widening the CC to see if someone knows.
> As far as I can tell, you're talking
> about mm/pagewalk.c, and the only commit I have in that file is
> a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517 ("mm, x86: add support for
> PUD-sized transparent hugepages"), which, as I think I made pretty clear
> in the commit message, is basically copy-and-paste from the PMD
> code.
Right, and that commit added split_huge_pud(), which may be related to
pud_present(), or maybe not; I don't know.
> I have no clue why most of the decisions in the MM were made.
Fun!
Jason