Separating xe_vma- and page-table state
Matthew Brost
matthew.brost at intel.com
Wed Mar 13 01:27:02 UTC 2024
On Tue, Mar 12, 2024 at 05:02:20PM -0600, Zeng, Oak wrote:
> Hi Thomas,
Going to reply to both Thomas and Oak to not split this thread.
>
> > -----Original Message-----
> > From: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Sent: Tuesday, March 12, 2024 3:43 AM
> > To: intel-xe at lists.freedesktop.org
> > Cc: Brost, Matthew <matthew.brost at intel.com>; Zeng, Oak
> > <oak.zeng at intel.com>
> > Subject: Separating xe_vma- and page-table state
> >
> > Hi,
> >
> > It's IMO become apparent both in the system allocator discussion and in
> > the patch that enables the presence of invalid vmas
> > that we need to be
> > better at separating xe_vma and page-table state, so that xe_vma state
> > would contain things that are mostly immutable and that the user
> > requested: PAT index, memory attributes, requested tile presence etc,
> > whereas the page-table state would contain mutable state like actual
> > tile presence, invalidation state and MMU notifier.
Thomas:
100% agree with the idea. There are 2 distinct parts of xe_vma - what I
call GPUVM state and PT state. In addition, there is what I call IOCTL
state (sync objs, etc...) that is updated in various places in xe_vm.c.
Functions to update these respective states should be clearly separated.
The current implementation is a complete mess in this regard; you
mention xe_vm_unbind_vma below, and in its current state there is no way
this function could easily be reused.
My series [1] vastly improves this separation. It still could be split a
bit better though. In my next rebase I can look at this series through
the lens of clearly maintaining separation + likely update [2] to do an
unbind rather than an invalidation.
[1] https://patchwork.freedesktop.org/series/125608/
[2] https://patchwork.freedesktop.org/series/130935/
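To make that split concrete, here is a rough sketch of the state
separation I have in mind. The struct and field names below are
illustrative only, not the actual Xe definitions:

/* Illustrative sketch only -- not the real Xe structures. */

/* Mostly-immutable, user-requested state (what I call GPUVM state). */
struct xe_vma {
	struct drm_gpuva gpuva;		/* VA range + backing (BO / userptr) */
	u16 pat_index;			/* PAT index requested at bind time */
	u8 tile_mask;			/* requested tile presence */
	/* other memory attributes from the bind IOCTL ... */
};

/* Mutable state owned and updated by the page-table code. */
struct xe_pt_state {
	struct xe_vma *vma;		/* 1:1 today, possibly 1:N for hmmptr */
	u8 tile_present;		/* tiles that actually have mappings */
	bool invalidated;		/* invalidation state */
	struct mmu_interval_notifier notifier;	/* userptr / hmmptr notifier */
};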
>
> It is valid reasoning to me... if we want to do what the community wants us to do with the system allocator,
> and if we want to meet our UMD's requirement of "free w/o vm_unbind", yes, we need this "invalid" vma concept.
>
> The strange thing is, it seems Matt can still achieve the goal without introducing the invalid vma concept... it doesn't look like he has
> this concept in his patch....
>
>
Oak:
I'm a little confused by what you mean by the "invalid" vma concept. Does
that mean no GPU backing or page tables? That is essentially what I call
a system allocator VMA in [3], right?
Also, based on what Thomas says below related to xe_vm_unbind_vma and [2], we
should be able to have a VMA that points to a backing store (either a
userptr or a BO) but doesn't have GPU mappings. We do have this for
faulting VMs before the initial bind of a VMA, but we cannot currently
dynamically change this (we can invalidate page tables but cannot remove
them without destroying the VMA). We should fix that.
[3] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/3fcc83c9b075364a9d83415ca73a9f9625543d7c
>
> >
> > So far we have had no reason to separate the two, but with hmmptr we
> > would likely end up with multiple page-table regions per xe-vma,
>
> Can we still maintain 1 xe-vma : 1 page table region relationship for simplicity?
>
> This requires vma splitting. i.e., if you have a hmmptr vma covering range [0~100],
> and a fault happens at range [40~60], then we will end up with 3 vmas:
> [0~40], a dummy hmmptr vma, not mapped to gpu
> [40~60], hmmptr vma mapped to gpu
> [60~100], dummy, not mapped to gpu.
>
> Does this work for you? Or do you see a benefit of not splitting vma?
>
Thomas / Oak:
Agree that, for at least the initial implementation, a 1 to 1 relationship will
be easier. Also, I'm unsure what the benefit of not splitting VMAs would be.
That being said, I don't think this is something we need to decide now.
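Purely for illustration, the split Oak describes would look something
like this (none of these types or helpers exist in the driver today):

/* Illustrative only: splitting a hmmptr VMA on a GPU fault. */
struct va_range { u64 start, end; };

/* A fault at [40, 60) inside a VMA covering [0, 100) yields three VMAs. */
static void split_on_fault(struct va_range vma, struct va_range fault,
			   struct va_range out[3])
{
	out[0] = (struct va_range){ vma.start, fault.start }; /* dummy, no GPU mapping */
	out[1] = fault;                                        /* mapped on the GPU */
	out[2] = (struct va_range){ fault.end, vma.end };      /* dummy, no GPU mapping */
}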
>
>
> and
> > with the patch discussed earlier we could've easily reused
> > xe_vm_unbind_vma() that only touches the mutable page-table state and
> > does the correct locking.
> >
> > The page table code would then typically take a const xe_vma *, and
> > an xe_pt_state *, or whatever we choose to call it. All xe_vmas except
> > hmmptr ones would have a 1:1 xe_vma <-> xe_pt_state relationship.
Thomas:
I like the idea of the VMAs in the PT code functions being marked as const
and the xe_pt_state being non-const. It makes ownership very clear.
Not sure how that will fit into [1], as that series passes around
a "struct xe_vm_ops", which is a list of "struct xe_vma_op". It does this
to make "struct xe_vm_ops" a single atomic operation. The VMAs are
extracted from either the GPUVM base operation or "struct xe_vma_op". Maybe
these can be const? I'll look into that, but this might not work out in
practice.
Agree. Also, I'm unsure how a 1:N xe_vma <-> xe_pt_state relationship fits in
with hmmptrs. Could you explain your thinking here?
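If we go that way, I picture the PT entry points looking roughly like
this (signatures invented for illustration, not the current xe_pt.c API):

/*
 * Sketch only: the PT layer consumes the VMA read-only and owns the
 * mutable xe_pt_state.
 */
int xe_pt_bind_vma(struct xe_tile *tile, const struct xe_vma *vma,
		   struct xe_pt_state *pt_state);
int xe_pt_unbind_vma(struct xe_tile *tile, const struct xe_vma *vma,
		     struct xe_pt_state *pt_state);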
> >
>
> Matt has POC code which works for me for the system allocator w/o this vma/pt_state splitting: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commits/system_allocator_poc?ref_type=heads
>
> Can you take a look also...
>
Thomas:
It might be helpful to look at that branch too; the last 3 patches [3]
[4] [5] implement a PoC system allocator design built on top of [1],
using existing userptrs for now, and are based on our discussions in the
system allocator design doc. Coding is sometimes the easiest way to
convey ideas and this is roughly what I had in mind. Not complete by any
means and there is still tons of work to do to make this a working solution.
[3] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/3fcc83c9b075364a9d83415ca73a9f9625543d7c
[4] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/d2371e7c007406fb5e84e6605da238d12caa5c62
[5] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-system-allocator/-/commit/91cf8b38428ae9180c92435d3519193d074e837a
> When I worked on the system allocator HLD, I pictured the invalid vma concept also. But somehow Matt was able to make it work without such a concept....
>
Oak:
Still unclear on this. Maybe we can chat offline to clear this up.
Matt
> Oak
>
> > Thoughts, comments?
> >
> > Thanks,
> > Thomas