[PATCH 1/7] mm: Add write-protect and clean utilities for address space ranges

Thomas Hellström (VMware) thomas_os at shipmail.org
Wed Sep 18 17:44:02 UTC 2019


On 9/18/19 4:41 PM, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 02:59:08PM +0200, Thomas Hellström (VMware) wrote:
>> From: Thomas Hellstrom <thellstrom at vmware.com>
>>
>> Add two utilities to a) write-protect and b) clean all ptes pointing into
>> a range of an address space.
>> The utilities are intended to aid in tracking dirty pages (either
>> driver-allocated system memory or pci device memory).
>> The write-protect utility should be used in conjunction with
>> page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
>> accesses. Typically one would want to use this on sparse accesses into
>> large memory regions. The clean utility should be used to utilize
>> hardware dirtying functionality and avoid the overhead of page-faults,
>> typically on large accesses into small memory regions.
>>
>> The added file "as_dirty_helpers.c" is initially listed as maintained by
>> VMware under our DRM driver. If somebody would like it elsewhere,
>> that's of course no problem.
> After quick glance, it looks a lot as rmap code duplication. Why not
> extend rmap_walk() interface instead to cover range of pages?

There appears to exist quite a few pagetable walks in the mm code. "Take 
1" of this patch series modified the "apply_to_page_range" interface and 
used that. But the interface modification was actually what eventually 
caused Linus to reject the code. While it is entirely possible to do a 
proper modification following Linus' and Christoph's guidelines, that 
code doesn't allow for huge pages and populates all page table levels. 
We will soon probably want to support huge pages and do not want to 
populate. The number of altered code-paths itself IMO motivates yet 
another pagetable walk implementation.

The walk code currently resembling the present patch the most is the 
unmap_mapping_range() implementation.

The rmap_walk() is not very well suited since it operates on a struct 
page and the code of this patch has no notion of struct pages.

So my thoughts on this is that the interface should in time move towards 
the code in mm/pagewalk.c. If we eventually have more users of an 
address-space pagewalk or want to re-implement unmap_mapping_range() 
using a generic pagewalk, we should move the walk to pagewalk.c and 
reuse its structures, but implement separate code for the walk since we 
can't split huge pages and we can't take the mmap_sem. Meanwhile we 
should keep the code separate in as_dirty_helpers.c

>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Matthew Wilcox <willy at infradead.org>
>> Cc: Will Deacon <will.deacon at arm.com>
>> Cc: Peter Zijlstra <peterz at infradead.org>
>> Cc: Rik van Riel <riel at surriel.com>
>> Cc: Minchan Kim <minchan at kernel.org>
>> Cc: Michal Hocko <mhocko at suse.com>
>> Cc: Huang Ying <ying.huang at intel.com>
>> Cc: Souptick Joarder <jrdr.linux at gmail.com>
>> Cc: "Jérôme Glisse" <jglisse at redhat.com>
>> Cc: linux-mm at kvack.org
>> Cc: linux-kernel at vger.kernel.org
>>
>> Signed-off-by: Thomas Hellstrom <thellstrom at vmware.com>
>> Reviewed-by: Ralph Campbell <rcampbell at nvidia.com> #v1
>> ---
>>   MAINTAINERS           |   1 +
>>   include/linux/mm.h    |  13 +-
>>   mm/Kconfig            |   3 +
>>   mm/Makefile           |   1 +
>>   mm/as_dirty_helpers.c | 392 ++++++++++++++++++++++++++++++++++++++++++
>>   5 files changed, 409 insertions(+), 1 deletion(-)
>>   create mode 100644 mm/as_dirty_helpers.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index c2d975da561f..b596c7cf4a85 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -5287,6 +5287,7 @@ T:	git git://people.freedesktop.org/~thomash/linux
>>   S:	Supported
>>   F:	drivers/gpu/drm/vmwgfx/
>>   F:	include/uapi/drm/vmwgfx_drm.h
>> +F:	mm/as_dirty_helpers.c
> Emm.. No. Core MM functinality cannot belong to random driver.

OK. I'll put it under core MM.

/Thomas





More information about the dri-devel mailing list