[PATCH] drm/xe: Do not allow scratch page on long running jobs

Lucas De Marchi lucas.demarchi at intel.com
Wed Nov 6 22:45:08 UTC 2024


On Thu, Sep 19, 2024 at 03:52:24PM +0000, Matthew Brost wrote:
>On Thu, Sep 19, 2024 at 12:47:04PM +0200, Nirmoy Das wrote:
>> This was agreed upon but not enforced, so enforce the limitation
>> of not allowing scratch pages on long-running jobs unless absolutely
>> required, which is currently the case for a DG2 SKU.
>>
>> Cc: Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>
>I think this type of change we will need a maintainer to bless and

Sorry, but nack as this will cause breakage on UMD.

>likely someone from the compute UMD.
>
>It does LGTM, with that:
>Acked-by: Matthew Brost <matthew.brost at intel.com>
>
>> Cc: Stuart Summers <stuart.summers at intel.com>
>> Cc: "Thomas Hellström" <thomas.hellstrom at linux.intel.com>
>> Suggested-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_vm.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index a3d7cb7cfd22..92dc55f3479b 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1711,6 +1711,11 @@ find_ufence_get(struct xe_sync_entry *syncs, u32 num_syncs)
>>  	return NULL;
>>  }
>>
>> +static bool xe_vm_device_requires_scratch_page(struct xe_device *xe)
>> +{
>> +	return XE_WA(xe_root_mmio_gt(xe), 14016763929);
>> +}
>> +
>>  #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE | \
>>  				    DRM_XE_VM_CREATE_FLAG_LR_MODE | \
>>  				    DRM_XE_VM_CREATE_FLAG_FAULT_MODE)
>> @@ -1730,7 +1735,7 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
>>  	if (XE_IOCTL_DBG(xe, args->extensions))
>>  		return -EINVAL;
>>
>> -	if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
>> +	if (xe_vm_device_requires_scratch_page(xe))
>>  		args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
>>
>>  	if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
>> @@ -1747,6 +1752,15 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
>>  			 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
>>  		return -EINVAL;
>>
>> +	/*
>> +	 * Scratch page is disabled for long running job unless absolutely
>> +	 * required as this can introduce hidden bugs.
>> +	 */
>> +	if (!xe_vm_device_requires_scratch_page(xe) &&
>> +	    XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
>> +			 (args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE)))

I don't think this should be imposed by kernel by simply breaking
userspace. Rather get our UMDs changed which should cover most of the
users. Possibly all if we consider long running jobs. Then much later if
we see there are no users we can consider something.

Lucas De Marchi


>> +		return -EINVAL;
>> +
>>  	if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
>>  			 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
>>  		return -EINVAL;
>> --
>> 2.46.0
>>


More information about the Intel-xe mailing list