Separating xe_vma- and page-table state

Matthew Brost matthew.brost at intel.com
Wed Mar 13 01:27:02 UTC 2024


On Tue, Mar 12, 2024 at 05:02:20PM -0600, Zeng, Oak wrote:
> Hi Thomas,

Going to reply to both Thomas and Oak to not split this thread.

> 
> > -----Original Message-----
> > From: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Sent: Tuesday, March 12, 2024 3:43 AM
> > To: intel-xe at lists.freedesktop.org
> > Cc: Brost, Matthew <matthew.brost at intel.com>; Zeng, Oak
> > <oak.zeng at intel.com>
> > Subject: Separating xe_vma- and page-table state
> > 
> > Hi,
> > 
> > It's IMO become apparent both in the system allocator discussion and in
> > the patch that enables the presence of invalid vmas 
> > that we need to be
> > better at separating xe_vma and page-table state, so that xe_vma state
> > would contain things that are mostly immutable and that the user
> > requested: PAT index, memory attributes, requested tile presence etc,
> > whereas the page-table state would contain mutable state like actual
> > tile presence, invalidation state and MMU notifier.

Thomas:

100% agree with the idea. There are 2 distinct parts of xe_vma - what I
call GPUVM state and PT state. In addition, there is what I call IOCTL
state (sync objs, etc...) that is updated in various places in xe_vm.c.

Functions that update these respective states should be clearly
separated. The current implementation is a complete mess in this regard;
you mention xe_vm_unbind_vma below, and in its current state there is no
way that function could easily be reused.

My series [1] vastly improves this separation, though it could still be
split a bit better. In my next rebase I can look at the series through
the lens of clearly maintaining that separation, and likely update [2]
to do an unbind rather than an invalidation.

[1] https://patchwork.freedesktop.org/series/125608/
[2] https://patchwork.freedesktop.org/series/130935/

> 
> It is valid reasoning to me... if we want to do what the community wants us to do with the system allocator,
> and if we want to meet our umd's requirement of "free w/o vm_unbind", yes, we need this "invalid" vma concept.
> 
> The strange thing is, it seems Matt can still achieve the goal without introducing the invalid vma concept... it doesn't look like he has
> this concept in his patch....
>

Oak:

I'm a little confused by what you mean by the "invalid" vma concept.
Does that mean no GPU backing or page tables? That is essentially what I
call a system allocator VMA in [3], right?

Also, based on what Thomas says below about xe_vm_unbind_vma and [2], we
should be able to have a VMA that points to a backing store (either
userptr or BO) but doesn't have GPU mappings. We do have this for
faulting VMs before the initial bind of a VMA, but we cannot currently
change this dynamically (we can invalidate page tables but cannot remove
them without destroying the VMA). We should fix that.

[3] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/3fcc83c9b075364a9d83415ca73a9f9625543d7c
 
> 
> > 
> > So far we have had no reason to separate the two, but with hmmptr we
> > would likely end up with multiple page-table regions per xe-vma, 
> 
> Can we still maintain 1 xe-vma : 1 page table region relationship for simplicity?
> 
> This requires vma splitting. i.e., if you have a hmmptr vma cover range [0~100],
> And fault happens at range [40~60], then we will end up with 3 vmas:
> [0~40], a dummy hmmptr vma, not mapped to gpu
> [40~60], hmmptr vma mapped to gpu
> [60~100], dummy, not mapped to gpu.
> 
> Does this work for you? Or do you see a benefit of not splitting vma?
>

Thomas / Oak:

Agree that, at least for the initial implementation, a 1:1 relationship
will be easier. I'm also unsure what the benefit of not splitting VMAs
is.

That being said, I don't think this is something we need to decide now.
 
> 
> 
> and
> > with the patch discussed earlier we could've easily reused
> > xe_vm_unbind_vma() that only touches the mutable page-table state and
> > does the correct locking.
> > 
> > The page table code would then typically take a const xe_vma * and
> > an xe_pt_state *, or whatever we choose to call it. All xe_vmas except
> > hmmptr ones would have an 1:1 xe_vma <-> xe_pt_state relationship.

Thomas:

I like the idea of the VMAs in the PT code functions being marked const
while the xe_pt_state is non-const. It makes ownership very clear.

I'm not sure how that will fit into [1], as that series passes around
a "struct xe_vm_ops", which is a list of "struct xe_vma_op". It does
this to make "struct xe_vm_ops" a single atomic operation. The VMAs are
extracted from either the GPUVM base operation or "struct xe_vma_op".
Maybe these can be const? I'll look into that, but it might not work out
in practice.

I also agree, though I'm unsure how a 1:N xe_vma <-> xe_pt_state
relationship fits hmmptrs. Could you explain your thinking here?

> > 
> 
> Matt has POC codes which works for me for system allocator w/o this vma/pt_state splitting: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commits/system_allocator_poc?ref_type=heads
> 
> Can you take a look also... 
> 

Thomas:

It might be helpful to look at that branch too; the last 3 patches [4]
[5] [6] implement a PoC system allocator design built on top of [1],
using existing userptrs for now, and are based on our discussions in the
system allocator design doc. Coding is sometimes the easiest way to
convey ideas, and this is roughly what I had in mind. It is not complete
by any means and there is still tons of work to do to make it a working
solution.

[4] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/3fcc83c9b075364a9d83415ca73a9f9625543d7c
[5] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/d2371e7c007406fb5e84e6605da238d12caa5c62
[6] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/91cf8b38428ae9180c92435d3519193d074e837a

> When I worked on the system allocator HLD, I pictured the invalid vma concept also. But somehow Matt was able to make it work without such a concept....
>

Oak:

Still unclear on this. Maybe we can chat offline to clear this up.

Matt

> Oak
> 
> > Thoughts, comments?
> > 
> > Thanks,
> > Thomas
