[PATCH v10 5/5] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl

Tue Mar 25 23:36:14 UTC 2025

On Tue, Mar 25, 2025 at 08:14:13PM +0530, Cavitt, Jonathan wrote:
> From: Jadav, Raag <raag.jadav at intel.com> 
> > On Tue, Mar 25, 2025 at 03:01:27AM +0530, Cavitt, Jonathan wrote:
> > > From: Jadav, Raag <raag.jadav at intel.com> 
> > > > On Mon, Mar 24, 2025 at 10:27:08PM +0530, Cavitt, Jonathan wrote:
> > > > > From: Jadav, Raag <raag.jadav at intel.com> 
> > > > > > On Thu, Mar 20, 2025 at 03:26:15PM +0000, Jonathan Cavitt wrote:
> > > > > > > Add support for userspace to request a list of observed faults
> > > > > > > from a specified VM.
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > > +static int xe_vm_get_property_size(struct xe_vm *vm, u32 property)
> > > > > > > +{
> > > > > > > +	int size = -EINVAL;
> > > > > > 
> > > > > > Mixing size and error codes is usually received with mixed feelings.
> > > > > > 
> > > > > > > +
> > > > > > > +	switch (property) {
> > > > > > > +	case DRM_XE_VM_GET_PROPERTY_FAULTS:
> > > > > > > +		spin_lock(&vm->faults.lock);
> > > > > > > +		size = vm->faults.len * sizeof(struct xe_vm_fault);
> > > > > > 
> > > > > > size_mul() and,
> > > > > > [1] perhaps fill it up into the pointer passed by the caller here?
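
To illustrate both points, here is a minimal userspace sketch, not the
actual patch. The status code and the size travel separately (the size
goes through an out-pointer), and the multiplication saturates the way
the kernel's size_mul() does. sat_size_mul(), PROP_FAULTS, struct
fault_entry and get_property_size() are hypothetical names made up for
this example, and __builtin_mul_overflow() assumes GCC or Clang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's size_mul():
 * returns SIZE_MAX instead of wrapping around on overflow. */
static size_t sat_size_mul(size_t a, size_t b)
{
	size_t out;

	if (__builtin_mul_overflow(a, b, &out))
		return SIZE_MAX;
	return out;
}

/* Hypothetical property id and fault record, for illustration only. */
#define PROP_FAULTS 1u

struct fault_entry {
	uint64_t address;
	uint32_t flags;
	uint32_t pad;
};

/* Returns 0 or a negative errno; the size is written through @size,
 * so the return value never has to double as a byte count. */
static int get_property_size(uint32_t property, size_t nr_faults, size_t *size)
{
	assert(size != NULL);

	switch (property) {
	case PROP_FAULTS:
		*size = sat_size_mul(nr_faults, sizeof(struct fault_entry));
		return 0;
	default:
		return -EINVAL;
	}
}
```

With this shape the caller checks the return value for errors and only
then looks at the size, so a negative value is never mistaken for one.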
> > > > > 
> > > > > "The pointer passed by the caller".  You mean the args pointer?
> > > > > 
> > > > > We'd still need to check that the args->size value is empty here before overwriting
> > > > > it, and we'd also still need to return the size to the ioctl so we can verify it's
> > > > > acceptable later in xe_vm_get_property_verify_size.
> > > > > 
> > > > > Unless you want to merge those two processes together into here?
> > > > 
> > > > The semantics are a bit fuzzy to me. Why do we have a single ioctl for
> > > > two different processes? Shouldn't they be handled separately?
> > > 
> > > No.  Sorry.  Let me clarify.
> > > "two different processes" = getting the size + verifying the size.
> > 
> > Yes, which seems like they should be handled with _FAULT_NUM and
> > _FAULT_DATA ioctls but I guess we're way past it now.
> 
> The current implementation mirrors xe_query.  Should we have separate
> queries for getting the size of the query data and getting the data itself
> in xe_query?

Let's not break a well established API.
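
For reference, the two-pass convention being mirrored looks roughly like
this, as far as I understand it: a call with size == 0 reports the
required size, and a second call with a matching size copies the data.
This is a userspace sketch under that assumption, not the driver code.
struct prop_args and get_property() are hypothetical names, and memcpy()
stands in for copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument block, loosely modelled on a query struct. */
struct prop_args {
	uint32_t size;	/* in: 0 to ask for the size, else buffer size; out: required size */
	void *data;	/* destination buffer, only used on the second pass */
};

/* Pass 1 (size == 0): report the required size.
 * Pass 2 (size == required): copy the data out.
 * Any other size is rejected. */
static int get_property(struct prop_args *args, const void *src, uint32_t src_size)
{
	assert(args != NULL);

	if (args->size == 0) {
		args->size = src_size;
		return 0;
	}

	if (args->size != src_size)
		return -EINVAL;

	if (!args->data)
		return -EFAULT;

	memcpy(args->data, src, src_size);
	return 0;
}
```

Userspace would call this once with size == 0, allocate args.size bytes,
then call it again with the buffer attached.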

> And just to preempt the question: this cannot be an xe_query because
> the size of the returned data depends on the target VM, which cannot
> be passed to the xe_query structure on the first pass when calculating
> the size.  And just reporting the maximum possible size was rejected
> separately. 

Sure, makes sense.

> > I'm also not much informed about the history here. Is there a real
> > use case behind exposing them? What is the user expected to do with
> > this information?
> 
> This is a request from Vulkan, and is necessary to satisfy the requirements
> for one of their interfaces.  Specifically,
> https://registry.khronos.org/vulkan/specs/latest/man/html/VK_EXT_device_fault.html

It says this is meant to be queried after a device lost error. What are
the criteria for that wrt xe?

A big enough fault will probably result in a coredump. So why not just
reuse it?

Raag