[PATCH 07/13] mm: remove the page_shift member from struct hmm_range
Christoph Hellwig
hch at lst.de
Tue Jul 30 13:14:30 UTC 2019
On Tue, Jul 30, 2019 at 12:55:17PM +0000, Jason Gunthorpe wrote:
> I suspect this was added for the ODP conversion that does use both
> page sizes. I think the ODP code for this is kind of broken, but I
> haven't delved into that..
>
> The challenge is that the driver needs to know what page size to
> configure the hardware before it does any range stuff.
>
> The other challenge is that the HW is configured to do only one page
> size, and if the underlying CPU page size changes it goes south.
>
> What I would prefer is if the driver could somehow dynamically adjust
> the page size after each dma map, but I don't know if ODP HW can
> do that.
>
> Since this is all driving toward making ODP use this maybe we should
> keep this API?
>
> I'm not sure I can lose the crappy huge page support in ODP.
The problem is that I see no way to use the current API. To know
the huge page size you need to have the vma, and the current API
doesn't require a vma to be passed in.
That's why I suggested an API where we pass a flag into hmm_range_fault
saying that huge pages are ok; it could then pass the shift out and
limit itself to a single vma (which it normally doesn't do, and that is
an additional complication). But all this still seems really awkward as
an API. AFAIK ODP is only used by mlx5, and mlx5, unlike other
IB HCAs, can use scatterlist-style MRs with variable length per entry,
so even if we pass multiple pages per entry from hmm it could coalesce
them. The best API for mlx5 would of course be to pass a biovec-style
variable-length structure that hmm_fault could fill out, but that would
be a major restructure.
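
To make the coalescing idea concrete, here is a rough user-space sketch.
Everything in it is hypothetical: struct pfn_extent and coalesce_pfns()
are names made up for illustration and are not part of hmm or mlx5. It
only shows how a flat array of page frame numbers could be folded into
variable-length (start, length) entries of the kind a scatterlist-style
MR or a biovec-like structure would carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical variable-length entry, loosely modelled on a bio_vec:
 * a starting page frame number plus a count of contiguous pages.
 */
struct pfn_extent {
	uint64_t start_pfn;
	uint64_t nr_pages;
};

/*
 * Fold physically contiguous PFNs into extents.
 *
 * Returns the number of extents written to out (at most max_out).
 * If out fills up before the input is exhausted, the walk stops and
 * *consumed reports how many input PFNs were covered so far.
 */
static size_t coalesce_pfns(const uint64_t *pfns, size_t nr,
			    struct pfn_extent *out, size_t max_out,
			    size_t *consumed)
{
	size_t n_out = 0, i;

	assert(pfns && out && consumed);

	for (i = 0; i < nr; i++) {
		/* Extend the previous extent if this PFN directly follows it. */
		if (n_out &&
		    out[n_out - 1].start_pfn + out[n_out - 1].nr_pages == pfns[i]) {
			out[n_out - 1].nr_pages++;
			continue;
		}
		/* A new extent is needed; stop if there is no room left. */
		if (n_out == max_out)
			break;
		out[n_out].start_pfn = pfns[i];
		out[n_out].nr_pages = 1;
		n_out++;
	}
	*consumed = i;
	return n_out;
}
```

With this sketch, a huge page would simply show up as one extent with a
large nr_pages, so the caller would not need a separate page_shift at
all. Whether the ODP hardware path can consume entries like this is the
open question.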