[RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)

Alistair Popple apopple at nvidia.com
Fri Aug 4 00:14:59 UTC 2023


David Hildenbrand <david at redhat.com> writes:

> On 03.08.23 14:14, Jason Gunthorpe wrote:
>> On Thu, Aug 03, 2023 at 07:35:51AM +0000, Kasireddy, Vivek wrote:
>>> Hi Jason,
>>>
>>>>>> Right, the "the zero pages are changed into writable pages" in your
>>>>>> above comment just might not apply, because there won't be any page
>>>>>> replacement (hopefully :) ).
>>>>
>>>>> If the page replacement does not happen when there are new writes to the
>>>>> area where the hole previously existed, then would we still get an
>>>>> invalidate when this happens? Is there any other way to get notified when
>>>>> the zeroed page is written to if the invalidate does not get triggered?
>>>>
>>>> What David is saying is that memfd does not use the zero page
>>>> optimization for hole punches. Any access to the memory, including
>>>> read-only access through hmm_range_fault() will allocate unique
>>>> pages. Since there is no zero page and no zero-page replacement there
>>>> is no issue with invalidations.
>> 
>>> It looks like even with hmm_range_fault(), the invalidate does not get
>>> triggered when the hole is refilled with new pages because of writes.
>>> This is probably because hmm_range_fault() does not fault in any pages
>>> that get invalidated later when writes occur.
>> hmm_range_fault() returns the current content of the VMAs, or it
>> faults. If it returns pages then it came from one of these two places.
>> If your VMA is incoherent with what you are doing then you have bigger
>> problems, or maybe you found a bug.

Note it will only fault in pages if HMM_PFN_REQ_FAULT is specified. You
are setting that, but you aren't setting HMM_PFN_REQ_WRITE, which is what
would trigger a fault to bring in the new pages after the hole is written
to. Does setting that fix the issue you are seeing?
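
For reference, something along these lines is what I had in mind. This is
only a sketch -- the function name, the caller-supplied pfns array and the
surrounding locking are placeholders for however your driver drives the
walk; the point is just the default_flags value:

static int fault_in_range_for_write(struct mmu_interval_notifier *notifier,
				    unsigned long start, unsigned long end,
				    unsigned long *hmm_pfns)
{
	struct hmm_range range = {
		.notifier = notifier,
		.start = start,
		.end = end,
		.hmm_pfns = hmm_pfns,
		/*
		 * HMM_PFN_REQ_FAULT alone only requires the pages to be
		 * readable. Adding HMM_PFN_REQ_WRITE forces a write fault,
		 * which is what allocates fresh pages over the punched hole.
		 */
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

	do {
		range.notifier_seq = mmu_interval_read_begin(notifier);
		mmap_read_lock(notifier->mm);
		ret = hmm_range_fault(&range);
		mmap_read_unlock(notifier->mm);
	} while (ret == -EBUSY);

	return ret;
}

You still need the usual mmu_interval_read_retry() check under your driver
lock before consuming the pfns; I've left that out here for brevity.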

>>> The above log messages are seen immediately after the hole is punched. As
>>> you can see, hmm_range_fault() returns the pfns of old pages and not zero
>>> pages. And, I see the below messages (with patch #2 in this series applied)
>>> as the hole is refilled after writes:
>> I don't know what you are doing, but it is something wrong or you've
>> found a bug in the memfds.
>
>
> Maybe THP is involved? I recently had to dig that out for an internal
> discussion:
>
> "Currently when truncating shmem file, if the range is partial of THP
> (start or end is in the middle of THP), the pages actually will just get
> cleared rather than being freed unless the range cover the whole THP.
> Even though all the subpages are truncated (randomly or sequentially),
> the THP may still be kept in page cache.  This might be fine for some
> usecases which prefer preserving THP."
>
> My recollection is that this behavior was never changed.
>
> https://lore.kernel.org/all/1575420174-19171-1-git-send-email-yang.shi@linux.alibaba.com/
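
If THP is involved, a partial punch like the one sketched below (the offset
and length deliberately not 2 MiB aligned) would hit exactly that case: the
covered subpages get cleared, but the THP itself can stay in the page
cache, so there is no page replacement for a later invalidate to report.
This is only an illustration -- whether shmem actually uses THP here
depends on the shmem_enabled setting / huge= mount option:

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t size = 4UL << 20;	/* 4 MiB memfd backing */
	int fd = memfd_create("thp-hole-test", MFD_CLOEXEC);

	if (fd < 0 || ftruncate(fd, size))
		return 1;

	char *map = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	memset(map, 0xaa, size);	/* populate pages (possibly as THP) */

	/*
	 * Punch a hole covering only part of a 2 MiB huge page
	 * (64 KiB at a 1 MiB offset). Per the quoted commit message the
	 * affected subpages are cleared, but a shmem THP spanning the
	 * range may remain in the page cache rather than being freed.
	 */
	fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		  1UL << 20, 64UL << 10);

	return 0;
}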
