[RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
Kasireddy, Vivek
vivek.kasireddy at intel.com
Sat Jul 29 00:46:59 UTC 2023
Hi Jason,
> > > > > If you still need the memory mapped then you re-call
> hmm_range_fault
> > > > > and re-obtain it. hmm_range_fault will resolve all the races and you
> > > > > get new pages.
> > >
> > > > IIUC, for my udmabuf use-case, it looks like calling hmm_range_fault
> > > > immediately after an invalidate (range notification) would preemptively
> > > fault in
> > > > new pages before a write. The problem with that is if a read occurs on
> > > those
> > > > new pages, then the data is incorrect as a write may not have
> > > > happened yet.
> > >
> > > It cannot be, if you use hmm_range_fault correctly you cannot get
> > > corruption no matter what is done to the mmap'd memfd. If there is
> > > otherwise it is a hmm_range_fault bug plain and simple.
> > >
> > > > Ideally, what I am looking for is for getting new pages at the time of or
> after
> > > > a write; until then, it is ok to use the old pages given my use-case.
> > >
> > > It is wrong, if you are synchronizing the vma then you must use the
> > > latest copy. If your use case can tolerate it then keep a 'not
> > > present' indication for the missing pages until you actually need
> > > them, but dmabuf doesn't really provide an API for that.
> > >
> > > > I think the difference comes down to whether we (udmabuf driver)
> want to
> > > > grab the new pages after getting notified about a PTE update because
> > > > of a fault
> > >
> > > Why? You still haven't explained why you want this.
> > Ok, let me explain using one of the udmabuf selftests (added in patch #3)
> > to describe the problem (sorry, I'd have to use the terms memfd, hole, etc)
> > I am trying to solve:
> > size = MEMFD_SIZE * page_size;
> > memfd = create_memfd_with_seals(size, false);
> > addr1 = mmap_fd(memfd, size);
> > write_to_memfd(addr1, size, 'a');
> > buf = create_udmabuf_list(devfd, memfd, size);
> > addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
> > punch_hole(memfd, MEMFD_SIZE / 2);
> > -> At this point, if I were to read addr1, it'd still have "a" in relevant areas
> > because a new write hasn't happened yet. And, since this results in an
> > invalidation (notification) of the associated VMA range, I could register
> > a callback in udmabuf driver and get notified but I am not sure how or
> > why that would be useful.
>
> When you get an invalidation you trigger dmabuf move, which revokes
> the importes use of the dmabuf because the underlying memory has
> changed. This is exactly the same as a GPU driver migrating memory
> to/fro CPU memory.
>
> >
> > write_to_memfd(addr1, size, 'b');
> > -> Here, the hole gets refilled as a result of the above writes which trigger
> > faults and the PTEs are updated to point to new pages. When this
> happens,
> > the udmabuf driver needs to be made aware of the new pages that
> were
> > faulted in because of the new writes.
>
> You only need this because you are not processing the invalidate.
>
> > a way to get notified when the hole is written to, the solution I came
> up
> > with is to either add a new notifier or add calls to change_pte() when
> the
> > PTEs do get updated. However, considering your suggestion to use
> > hmm_range_fault(), it is not clear to me how it would help while the
> hole
> > is being written to as the writes occur outside of the
> > udmabuf driver.
>
> You have the design backwards.
>
> When a dmabuf importer asks for the dmabuf to be present you call
> hmm_range_fault() and you get back whatever memory is appropriate. The
> importer can then use it.
>
> If the underlying memory changes then you get the invalidation and you
> trigger move. The importer stops using the memory and the underlying
> pages change.
>
> Later the importer decides it needs the memory again so it again asks
> for the dmabuf to be present, which does hmm_range_fault and gets
> whatever is appropriate at the time.
Unless I am missing something, I think just doing the above still won't solve
the problem. Consider this sequence:
write_to_memfd(addr1, size, 'a');
buf = create_udmabuf_list(devfd, memfd, size);
addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
read(addr2);
write_to_memfd(addr1, size, 'b');
punch_hole(memfd, MEMFD_SIZE / 2);
-> Since we can process the invalidate at this point, as per your suggestion,
we can trigger dmabuf move to let the importers know that the dmabuf's
backing memory has changed (or moved).
read(addr2);
-> Because there is a hole, we can handle the read by either providing the
old pages or zero pages (if using hmm_range_fault()) to the importers.
Maybe it is against convention, but I think it makes sense to provide old
pages (that were mapped before the hole punch) because the importers
have not read the data in these pages ('b' above) yet. And, another reason
to provide old pages is because the data in these pages is shown in a window
on the Host's screen so it doesn't make sense to show zero page data.
-> write_to_memfd(addr1, size, 'c');
As the hole gets refilled (with new pages) after the above write, AFAIU, we
have to tell the importers again that since the backing memory has changed,
(new pages) they need to recreate their mappings. But herein lies the problem:
from inside the udmabuf driver, we cannot know when this write occurs, so we
would not be able to notify the importers of the dmabuf move. Since Qemu knows
about this write, I was initially thinking of adding a new udmabuf IOCTL (something
like UDMABUF_PREPARE) to have Qemu let udmabuf know after writes occur.
This would provide an opportunity after an invalidate to notify the importers of
the (dmabuf) move. I think this would solve the problem with changes only to the
udmabuf driver (including adding the new IOCTL + handling the invalidate) but I
was hoping to solve it in a generic way by adding a new notifier or using change_pte()
to get notified about PTE updates (when faults occur).
Thanks,
Vivek
> Jason
More information about the dri-devel
mailing list