Cross-device and cross-driver HMM support

Wed Apr 3 09:16:36 UTC 2024

On 03.04.24 at 00:57, Dave Airlie wrote:
> On Wed, 27 Mar 2024 at 19:52, Thomas Hellström
> <thomas.hellstrom at linux.intel.com> wrote:
>> Hi!
>>
>> With our SVM mirror work we'll soon start looking at HMM cross-device
>> support. The identified needs are
>>
>> 1) Instead of migrating foreign device memory to system when the
>> current device is faulting, leave it in place...
>> 1a) for access using internal interconnect,
>> 1b) for access using PCIE p2p (probably mostly as a reference)

I still agree with Sima that we won't see HMM-based P2P between 
devices anytime soon, if ever.

The basic problem is that a lot of fundamental inter-device 
infrastructure is still missing.

E.g. there is no common representation of DMA addresses together with 
their address spaces. In other words, a DMA address only makes sense if 
you know which device is doing the DMA.
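
To make that concrete, here is a rough sketch of what a device-qualified 
DMA address could look like. The names xdev_dma_addr, 
xdev_dma_addr_valid_for and xdev_dma_addr_equal are purely hypothetical 
and do not exist in the kernel; this only illustrates the idea, it is 
not a proposed API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: a DMA address tagged with the device whose address
 * space it belongs to. Nothing like this exists in the kernel today.
 */
struct xdev_dma_addr {
	int      dev_id; /* illustrative ID of the device doing the DMA */
	uint64_t addr;   /* bus address, only meaningful for dev_id */
};

/* The address may only be used by the device it was mapped for. */
static bool xdev_dma_addr_valid_for(const struct xdev_dma_addr *a, int dev_id)
{
	return a->dev_id == dev_id;
}

/* Equal numeric values in different address spaces are different addresses. */
static bool xdev_dma_addr_equal(const struct xdev_dma_addr *a,
				const struct xdev_dma_addr *b)
{
	return a->dev_id == b->dev_id && a->addr == b->addr;
}
```

The point is that the bare 64-bit value alone is ambiguous: the same 
number handed to two different devices can refer to two different 
locations.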

In addition, we don't have a representation for internal 
connections, e.g. the common kernel has no idea that device A and device 
B can talk directly to each other, but not to device C.
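
As a minimal sketch of the kind of information that is missing, assume 
a small fixed set of devices and a symmetric link table. Everything 
here (xdev_topology, xdev_topology_connect, xdev_topology_can_reach, 
XDEV_MAX) is made up for illustration and is not existing kernel 
infrastructure.

```c
#include <assert.h>
#include <stdbool.h>

#define XDEV_MAX 8 /* hypothetical upper bound on tracked devices */

/* Hypothetical: which devices share a direct internal interconnect. */
struct xdev_topology {
	bool link[XDEV_MAX][XDEV_MAX];
};

/* Record a bidirectional direct link between devices a and b. */
static void xdev_topology_connect(struct xdev_topology *t, int a, int b)
{
	assert(a >= 0 && a < XDEV_MAX && b >= 0 && b < XDEV_MAX);
	t->link[a][b] = true;
	t->link[b][a] = true;
}

/* A device can always reach itself; otherwise a direct link is required. */
static bool xdev_topology_can_reach(const struct xdev_topology *t, int a, int b)
{
	assert(a >= 0 && a < XDEV_MAX && b >= 0 && b < XDEV_MAX);
	return a == b || t->link[a][b];
}
```

With devices A, B and C numbered 0, 1 and 2, connecting 0 and 1 makes 
xdev_topology_can_reach() true for the pair (0, 1) and false for 
(0, 2) and (1, 2), which is exactly the distinction common code cannot 
make right now.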

>>
>> 2) Request a foreign device to migrate memory range a..b of a CPU
>> mm_struct to local shareable device memory on that foreign device.
>>
>> and we plan to add an infrastructure for this. Probably this can be
>> done initially without too much (or any) changes to the hmm code
>> itself.
>>
>> So the question is basically whether anybody is interested in a
>> drm-wide solution for this and in that case also whether anybody sees
>> the need for cross-driver support?

We have use cases for this as well, yes.

For now XGMI support is purely AMDGPU internal, but essentially we 
would like to have it as a common framework so that NICs and other 
devices could interconnect as well.

>>
>> Otherwise any objections against us starting out with an xe driver
>> helper implementation that could be lifted to drm-level when needed?
> I think you'd probably have a better chance of getting others to help
> review it, if we started out outside the driver as much as possible.

Yeah, completely agree. In particular, we need to start with 
infrastructure and not some in-driver hack. We already have the latter 
and it's clearly a dead end.

Regards,
Christian.

>
> I don't think gpuvm would have worked out as well if we'd just kept it
> inside nouveau from the start, it at least forces you to think about
> what should be driver specific here.
>
>> Finally any suggestions or pointers to existing solutions for this?
> I think nvidia's uvm might have some of this type of code, but no idea
> how you'd even consider starting to use it as a reference,
>
> Dave.