Separating xe_vma- and page-table state

Wed Mar 13 10:56:12 UTC 2024

On Wed, 2024-03-13 at 01:27 +0000, Matthew Brost wrote:
> On Tue, Mar 12, 2024 at 05:02:20PM -0600, Zeng, Oak wrote:
> > Hi Thomas,
> 
> 

....

> Thomas:
> 
> I like the idea of VMAs in the PT code function being marked as const
> and having the xe_pt_state as non const. It makes ownership very
> clear.
> 
> Not sure how that will fit into [1] as that series passes around
> a "struct xe_vm_ops" which is a list of "struct xe_vma_op". It does
> this to make "struct xe_vm_ops" a single atomic operation. The VMAs
> are extracted from either the GPUVM base operation or "struct
> xe_vma_op". Maybe these can be const? I'll look into that but this
> might not work out in practice.
> 
> Agree also unsure how 1:N xe_vma <-> xe_pt_state relationship fits in
> hmmptrs. Could you explain your thinking here?

There is a need for hmmptrs to be sparse. When we fault, we create and
populate a chunk of PTEs. This chunk could potentially be large and
cover the whole CPU vma, or it could be limited to, say, 2MiB and
aligned to allow for large page-table entries. In Oak's POC these
chunks are called "svm ranges".
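
To make the chunking concrete, here is a rough sketch of picking the
range to populate on a fault: the aligned chunk containing the faulting
address, clamped to the CPU vma. This is only an illustration, not code
from the POC or the driver; struct range, fault_chunk() and SZ_2M as
defined here are made-up names.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)

/* Hypothetical half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Pick the range to populate for a fault at @addr inside the CPU vma
 * @vma: the @chunk-aligned block containing @addr, clamped to the vma.
 * @chunk must be a power of two and @addr must lie inside @vma.
 */
static struct range fault_chunk(struct range vma, uint64_t addr,
				uint64_t chunk)
{
	struct range r;

	assert(chunk && (chunk & (chunk - 1)) == 0);
	assert(addr >= vma.start && addr < vma.end);

	r.start = addr & ~(chunk - 1);	/* round down to chunk boundary */
	r.end = r.start + chunk;

	/* Never populate outside the CPU vma. */
	if (r.start < vma.start)
		r.start = vma.start;
	if (r.end > vma.end)
		r.end = vma.end;

	return r;
}
```

A chunk that gets clamped at either end of the vma is no longer a full
aligned 2MiB block, so in this sketch only the interior chunks would be
candidates for a large page-table entry.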

So the question arises, how do we map that to the current vma
management and page-table code? There are basically two ways:

1) Split VMAs so they are either fully populated or unpopulated; each
svm_range becomes an xe_vma.
2) Create an xe_pt_range / xe_pt_state (or whatever we call it) with a
1:1 mapping with the svm_range and a 1:N mapping with xe_vmas.

Initially my thinking was that 1) would be the simplest approach with
the code we have today. I raised that briefly with Sima and he answered
"And why would we want to do that?", and the answer at hand was ofc
that the page-table code works with vmas. Or rather, that we mix vma
state (the hmmptr range / attributes) and page-table state (the regions
of the hmmptr that are actually populated), so it would be a
consequence of the limitations of our current implementation.

With the suggestion to separate vma state and pt state, the xe_svm
ranges map to pt state and are managed per hmmptr vma. The vmas would
then be split mainly as a result of the UMD mapping something else (a
bo) on top, or the UMD giving new memory attributes for a range
(madvise type of operations).
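
As a minimal sketch of what that separation could look like (all names
here are hypothetical, and a plain linked list stands in for whatever
structure the driver would really use): the page-table side only ever
sees a const vma, and the populated regions live in separate pt_state
objects, several of which can belong to one vma.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* vma state: range and attributes. Read-only to the page-table code. */
struct vma {
	uint64_t start, end;		/* [start, end) */
	uint32_t attrs;
};

/* pt state: one populated region inside a vma. N of these per vma. */
struct pt_state {
	const struct vma *vma;		/* owning vma, never modified here */
	uint64_t start, end;		/* [start, end), within the vma */
	struct pt_state *next;
};

/*
 * Record [start, end) as populated for @vma. Returns the new pt_state,
 * or NULL if the range is empty, falls outside the vma, or allocation
 * fails. Overlap with existing pt_states is not checked in this sketch.
 */
static struct pt_state *pt_state_add(struct pt_state **head,
				     const struct vma *vma,
				     uint64_t start, uint64_t end)
{
	struct pt_state *pts;

	if (start >= end || start < vma->start || end > vma->end)
		return NULL;

	pts = malloc(sizeof(*pts));
	if (!pts)
		return NULL;

	pts->vma = vma;
	pts->start = start;
	pts->end = end;
	pts->next = *head;
	*head = pts;

	return pts;
}

/* Is @addr covered by a populated region belonging to @vma? */
static bool pt_state_covers(const struct pt_state *head,
			    const struct vma *vma, uint64_t addr)
{
	for (; head; head = head->next)
		if (head->vma == vma && addr >= head->start &&
		    addr < head->end)
			return true;

	return false;
}
```

With this shape, a fault only adds or removes pt_state objects and
leaves the vma untouched, so vma splits are reserved for actual changes
to the mapping or its attributes.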

/Thomas