[PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
Jason Gunthorpe
jgg at ziepe.ca
Fri Mar 20 00:03:45 UTC 2020
On Thu, Mar 19, 2020 at 03:56:50PM -0700, Ralph Campbell wrote:
> Adding linux-kselftest at vger.kernel.org for the test config question.
>
> On 3/19/20 11:17 AM, Jason Gunthorpe wrote:
> > On Tue, Mar 17, 2020 at 04:14:31PM -0700, Ralph Campbell wrote:
> > >
> > > On 3/17/20 5:59 AM, Christoph Hellwig wrote:
> > > > On Tue, Mar 17, 2020 at 09:47:55AM -0300, Jason Gunthorpe wrote:
> > > > > I've been using v7 of Ralph's tester and it is working well - it has
> > > > > DEVICE_PRIVATE support so I think it can test this flow too. Ralph are
> > > > > you able?
> > > > >
> > > > > This hunk seems trivial enough to me, can we include it now?
> > > >
> > > > I can send a separate patch for it once the tester covers it. I don't
> > > > want to add it to the original patch as it is a significant behavior
> > > > change compared to the existing code.
> > > >
> > >
> > > Attached is an updated version of my HMM tests based on linux-5.6.0-rc6.
> > > I ran this OK with Jason's 8+1 HMM patches, Christoph's 1-5 misc HMM clean ups,
> > > and Christoph's 1-4 device private page changes applied.
> >
> > I'd like to get this to a mergeable state; it looks pretty good now,
> > but I have no idea about selftests - and I'm struggling to even
> > compile the tools dir
> >
> > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > index 69def4a9df00..4d22ce7879a7 100644
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -2162,6 +2162,18 @@ config TEST_MEMINIT
> > >  	  If unsure, say N.
> > >  
> > > +config TEST_HMM
> > > +	tristate "Test HMM (Heterogeneous Memory Management)"
> > > +	depends on DEVICE_PRIVATE
> > > +	  select HMM_MIRROR
> > > +	  select MMU_NOTIFIER
> >
> > extra spaces
>
> Will fix in v8.
>
> > In general I wonder if it even makes sense that DEVICE_PRIVATE is user
> > selectable?
>
> Should tests enable the feature or the feature enable the test?
> IMHO, if the feature is being compiled into the kernel, that should
> enable the menu item for the test. If the feature isn't selected,
> no need to test it :-)
I meant: should DEVICE_PRIVATE be a user-selectable option at all, or
should it be turned on automatically when a driver like nouveau is
selected?
Is there some downside to enabling DEVICE_PRIVATE?
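If not, the driver could just select it - something like this sketch
against 5.6's drivers/gpu/drm/nouveau/Kconfig (just a sketch;
DEVICE_PRIVATE's own ZONE_DEVICE dependencies would need to be
satisfied some other way, since select ignores dependencies):

config DRM_NOUVEAU_SVM
	bool "(EXPERIMENTAL) Enable SVM (Shared Virtual Memory) support"
	depends on DRM_NOUVEAU
	depends on MMU
	# was "depends on DEVICE_PRIVATE"; selecting it instead means
	# the user never sees DEVICE_PRIVATE as a separate question
	select DEVICE_PRIVATE
	select HMM_MIRROR
	select MMU_NOTIFIER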
> > The notifier holds a mmgrab, no need for another one
>
> OK. I'll replace dmirror->mm with dmirror->notifier.mm.
Right, that is good too.
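For reference, roughly what the fault side looks like then (a sketch
against the 5.6 APIs; dmirror_fault_range() is a made-up wrapper name,
and the mmu_interval_read_begin() retry loop is omitted):

static int dmirror_fault_range(struct dmirror *dmirror,
			       struct hmm_range *range)
{
	/* the interval notifier already holds an mmgrab() on this mm */
	struct mm_struct *mm = dmirror->notifier.mm;
	long ret;

	/* but hmm_range_fault() needs the mm to still have real users */
	if (!mmget_not_zero(mm))
		return -EFAULT;

	down_read(&mm->mmap_sem);
	ret = hmm_range_fault(range, 0);
	up_read(&mm->mmap_sem);
	mmput(mm);

	return ret < 0 ? ret : 0;
}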
> > > + filp->private_data = dmirror;
> >
> > Not sure what this comment means
>
> I'll change the comment to:
> /*
>  * The first open of the device character file registers the address
>  * space of the process doing the open() system call with the device.
>  * Subsequent file opens by other processes will have access to the
>  * first process' address space.
>  */
How does this happen? The function looks like it always does the same thing
> > > +static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
> > > +				const struct mmu_notifier_range *range,
> > > +				unsigned long cur_seq)
> > > +{
> > > +	struct dmirror *dmirror = container_of(mni, struct dmirror, notifier);
> > > +	struct mm_struct *mm = dmirror->mm;
> > > +
> > > +	/*
> > > +	 * If the process doesn't exist, we don't need to invalidate the
> > > +	 * device page table since the address space will be torn down.
> > > +	 */
> > > +	if (!mmget_not_zero(mm))
> > > +		return true;
> >
> > Why? Don't the notifiers provide for this already.
> >
> > mmget_not_zero() is required before calling hmm_range_fault() though
Oh... this is the invalidate_all path during mm teardown.
IMHO you should test the invalidation reason in the range to exclude
this, something like the sketch below.
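(A sketch; the dmirror_do_update() helper and mutex follow your test
driver, and MMU_NOTIFY_RELEASE is the event mn_itree_release() sets:)

static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
					const struct mmu_notifier_range *range,
					unsigned long cur_seq)
{
	struct dmirror *dmirror = container_of(mni, struct dmirror, notifier);

	/*
	 * The whole address space is being torn down; nothing will use
	 * the device page table again, so skip invalidating it.
	 */
	if (range->event == MMU_NOTIFY_RELEASE)
		return true;

	mutex_lock(&dmirror->mutex);
	mmu_interval_set_seq(mni, cur_seq);
	dmirror_do_update(dmirror, range->start, range->end);
	mutex_unlock(&dmirror->mutex);
	return true;
}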
But xa_erase looks totally safe, so there should be no reason to do
that.
> This is a workaround for a problem I don't quite understand.
> If you change tools/testing/selftests/vm/hmm-tests.c line 868 to
> ASSERT_EQ(ret, -1);
> then the test will abort, core dump, and cause two problems:
> 1) the migrated page will be faulted back to system memory in order to
> write it to the core dump. This triggers
> lockdep_assert_held(&walk.mm->mmap_sem) in walk_page_range().
Has the migration stuff become entangled with the xarray?
> [ 137.980718] Code: 80 2f 1a 83 c6 05 e9 8d 7b 01 01 e8 3e b1 b1 fe e9 05 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 56 41 55 41 54 55 <48> 89 fd 53 4c 8d 6d 10 e8 3c fc ff ff 49 89 c4 4c 89 e0 83 e0 03
> [ 137.999461] RSP: 0018:ffffc900015e77c8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [ 138.007028] RAX: ffff8886e508c408 RBX: 0000000000000000 RCX: ffffffff82626c89
> [ 138.014159] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc900015e78a0
> [ 138.021293] RBP: ffffc900015e78a0 R08: ffffffff811461c4 R09: fffff520002bcf17
> [ 138.028426] R10: fffff520002bcf16 R11: 0000000000000003 R12: 0000000002606d10
> [ 138.035557] R13: ffff8886e508c448 R14: 0000000000000031 R15: ffffffffa06546a0
> [ 138.042701] ? do_raw_spin_lock+0x104/0x1d0
> [ 138.046888] ? xas_store+0x19/0xa60
> [ 138.050390] xas_store+0x5b3/0xa60
> [ 138.053806] ? register_lock_class+0x860/0x860
> [ 138.058267] __xa_erase+0x96/0x110
> [ 138.061673] ? xas_store+0xa60/0xa60
> [ 138.065267] xa_erase+0x19/0x30
oh, it is doing this:
static void mn_itree_release(struct mmu_notifier_subscriptions *subscriptions,
			     struct mm_struct *mm)
{
	struct mmu_notifier_range range = {
		.flags = MMU_NOTIFIER_RANGE_BLOCKABLE,
		.event = MMU_NOTIFY_RELEASE,
		.mm = mm,
		.start = 0,
		.end = ULONG_MAX,
	};
i.e. it is sitting doing a huge number of xa_erases, I suppose. Probably
in normal exit the notifier is removed before the mm is destroyed.
The xa_erase needs to be a bit smarter and jump over gaps in the tree -
perhaps some xa_for_each()/xa_erase() pattern, like the sketch below?
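(A sketch, assuming the mirror's page table is the xarray dmirror->pt
indexed by pfn, as in the test driver:)

static void dmirror_do_update(struct dmirror *dmirror,
			      unsigned long start, unsigned long end)
{
	unsigned long pfn;
	void *entry;

	/*
	 * Walk only the populated entries, so the 0..ULONG_MAX
	 * invalidation from mn_itree_release() doesn't turn into one
	 * xa_erase() call per possible index.
	 */
	xa_for_each_start(&dmirror->pt, pfn, entry, start >> PAGE_SHIFT) {
		if (pfn > ((end - 1) >> PAGE_SHIFT))
			break;
		xa_erase(&dmirror->pt, pfn);
	}
}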
> > Also I get this:
> >
> > lib/test_hmm.c: In function ‘dmirror_devmem_fault_alloc_and_copy’:
> > lib/test_hmm.c:1041:25: warning: unused variable ‘vma’ [-Wunused-variable]
> > 1041 | struct vm_area_struct *vma = args->vma;
> >
> > But this is a kernel bug, due to alloc_page_vma being a #define not a
> > static inline and me having CONFIG_NUMA off in this .config
>
> Fixed.
in gfp.h?
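Presumably turning the !CONFIG_NUMA stub into a static inline would do
it - an untested sketch:

/* !CONFIG_NUMA case in include/linux/gfp.h: a static inline instead of
 * a #define means the vma/addr arguments are always evaluated, so
 * callers don't trip -Wunused-variable when NUMA is off. */
static inline struct page *alloc_page_vma(gfp_t gfp_mask,
					  struct vm_area_struct *vma,
					  unsigned long addr)
{
	return alloc_pages(gfp_mask, 0);
}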
Jason