[Intel-xe] [PATCH 1/1] drm/xe: timeout needs to be a signed value

Upadhyay, Tejas tejas.upadhyay at intel.com
Tue Sep 26 08:25:40 UTC 2023



> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
> fei.yang at intel.com
> Sent: Tuesday, September 26, 2023 9:43 AM
> To: intel-xe at lists.freedesktop.org
> Cc: Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas
> <lucas.demarchi at intel.com>
> Subject: [Intel-xe] [PATCH 1/1] drm/xe: timeout needs to be a signed value
> 
> From: Fei Yang <fei.yang at intel.com>
> 
> In xe_wait_user_fence_ioctl, the timeout is currently defined as unsigned
> long. That could potentially pass a negative value to the schedule_timeout()
> call because nsecs_to_jiffies() returns an unsigned long which gets used as
> signed long.
> 
> [ 187.732238] schedule_timeout: wrong timeout value fffffffffffffc18 [
> 187.733180] CPU: 0 PID: 792 Comm: test_thread_dim Tainted: G U 6.4.0-xe #1
> [ 187.735019] Call Trace:
> [ 187.735373] <TASK>
> [ 187.735687] dump_stack_lvl+0x92/0xb0
> [ 187.736193] schedule_timeout+0x348/0x430 [ 187.736739] ?
> __might_fault+0x67/0xd0 [ 187.737255] ? check_chain_key+0x224/0x2d0 [
> 187.737812] ? __pfx_schedule_timeout+0x10/0x10 [ 187.738429] ?
> __might_fault+0x6b/0xd0 [ 187.738946] ? __pfx_lock_release+0x10/0x10 [
> 187.739512] ? __pfx_lock_release+0x10/0x10 [ 187.740080]
> wait_woken+0x86/0x100 [ 187.740556]
> xe_wait_user_fence_ioctl+0x34b/0xe00 [xe] [ 187.741281] ?
> __pfx_xe_wait_user_fence_ioctl+0x10/0x10 [xe] [ 187.742075] ?
> lock_acquire+0x169/0x3d0 [ 187.742601] ? check_chain_key+0x224/0x2d0 [
> 187.743158] ? drm_dev_enter+0x9/0xe0 [drm] [ 187.743740] ?
> __pfx_woken_wake_function+0x10/0x10
> [ 187.744388] ? drm_dev_exit+0x11/0x50 [drm] [ 187.744969] ?
> __pfx_lock_release+0x10/0x10 [ 187.745536] ? __might_fault+0x67/0xd0 [
> 187.746052] ? check_chain_key+0x224/0x2d0 [ 187.746610]
> drm_ioctl_kernel+0x172/0x250 [drm] [ 187.747242] ?
> __pfx_xe_wait_user_fence_ioctl+0x10/0x10 [xe] [ 187.748037] ?
> __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ 187.748729] ?
> __pfx_xe_wait_user_fence_ioctl+0x10/0x10 [xe] [ 187.749524] ?
> __pfx_xe_wait_user_fence_ioctl+0x10/0x10 [xe] [ 187.750319]
> drm_ioctl+0x35e/0x620 [drm] [ 187.750871] ? __pfx_drm_ioctl+0x10/0x10
> [drm] [ 187.751495] ? restore_fpregs_from_fpstate+0x99/0x140
> [ 187.752172] ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
> [ 187.752901] ? mark_held_locks+0x24/0x90 [ 187.753438]
> __x64_sys_ioctl+0xb4/0xf0 [ 187.753954] do_syscall_64+0x3f/0x90 [
> 187.754450] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 187.755127] RIP: 0033:0x7f4e6651aaff
> [ 187.755623] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00
> 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89
> c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 187.757995] RSP:
> 002b:00007fff05f37a50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [
> 187.758995] RAX: ffffffffffffffda RBX: 000055eca47c8130 RCX:
> 00007f4e6651aaff [ 187.759935] RDX: 00007fff05f37b60 RSI:
> 00000000c050644b RDI: 0000000000000004 [ 187.760874] RBP:
> 0000000000000017 R08: 0000000000000017 R09: 7fffffffffffffff [ 187.761814]
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [
> 187.762753] R13: 0000000000000000 R14: 0000000000000000 R15:
> 00007f4e65d19ce0 [ 187.763694] </TASK>
> 
> Fixes: 9f801f10c0d6 ("drm/xe: Use nanoseconds instead of jiffies in uapi for
> user fence")
> Signed-off-by: Fei Yang <fei.yang at intel.com>
> Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_wait_user_fence.c | 55 +++++++++++++++++--------
>  1 file changed, 37 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c
> b/drivers/gpu/drm/xe/xe_wait_user_fence.c
> index 761eed3a022f..3ac4cd24d5b4 100644
> --- a/drivers/gpu/drm/xe/xe_wait_user_fence.c
> +++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c
> @@ -85,17 +85,45 @@ static int check_hw_engines(struct xe_device *xe,
>  			 DRM_XE_UFENCE_WAIT_VM_ERROR)
>  #define MAX_OP		DRM_XE_UFENCE_WAIT_LTE
> 
> -static unsigned long to_jiffies_timeout(struct drm_xe_wait_user_fence *args)
> +static long to_jiffies_timeout(struct xe_device *xe,
> +			       struct drm_xe_wait_user_fence *args)

What is the use of "xe" passed here in param?
 
>  {
> -	unsigned long timeout;
> +	unsigned long long t;
> +	long timeout;
> 
> -	if (args->flags & DRM_XE_UFENCE_WAIT_ABSTIME)
> -		return drm_timeout_abs_to_jiffies(args->timeout);
> +	/*
> +	 * For negative timeout we want to wait "forever" by setting
> +	 * MAX_SCHEDULE_TIMEOUT. But we have to assign this value also
> +	 * to args->timeout to avoid being zeroed on the signal delivery
> +	 * (see arithmetics after wait).
> +	 */
> +	if (args->timeout < 0) {
> +		args->timeout = MAX_SCHEDULE_TIMEOUT;
> +		return MAX_SCHEDULE_TIMEOUT;
> +	}
> 
> -	if (args->timeout == MAX_SCHEDULE_TIMEOUT || args->timeout ==
> 0)
> -		return args->timeout;
> +	if (args->timeout == 0)
> +		return 0;
> 
> -	timeout = nsecs_to_jiffies(args->timeout);
> +	/*
> +	 * Save the timeout to an u64 variable because nsecs_to_jiffies
> +	 * might return a value that overflows s32 variable.
> +	 */
> +	if (args->flags & DRM_XE_UFENCE_WAIT_ABSTIME)
> +		t = drm_timeout_abs_to_jiffies(args->timeout);
> +	else
> +		t = nsecs_to_jiffies(args->timeout);
> +
> +	/*
> +	 * Anything greater then MAX_SCHEDULE_TIMEOUT is meaningless,
> +	 * also we don't want to cap it at MAX_SCHEDULE_TIMEOUT because
> +	 * apparently user doesn't mean to wait forever, otherwise the
> +	 * args->timeout should have been set to a negative value.
> +	 */
> +	if (t > MAX_SCHEDULE_TIMEOUT)
> +		timeout = MAX_SCHEDULE_TIMEOUT - 1;

Does LONG_MAX hit overflow here?

Thanks,
Tejas

> +	else
> +		timeout = t;
> 
>  	return timeout ?: 1;
>  }
> @@ -114,7 +142,7 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev,
> void *data,
>  	int err;
>  	bool no_engines = args->flags & DRM_XE_UFENCE_WAIT_SOFT_OP
> ||
>  		args->flags & DRM_XE_UFENCE_WAIT_VM_ERROR;
> -	unsigned long timeout;
> +	long timeout;
>  	ktime_t start;
> 
>  	if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args-
> >pad) || @@ -169,16 +197,7 @@ int xe_wait_user_fence_ioctl(struct
> drm_device *dev, void *data,
>  		addr = vm->async_ops.error_capture.addr;
>  	}
> 
> -	/*
> -	 * For negative timeout we want to wait "forever" by setting
> -	 * MAX_SCHEDULE_TIMEOUT. But we have to assign this value also
> -	 * to args->timeout to avoid being zeroed on the signal delivery
> -	 * (see arithmetics after wait).
> -	 */
> -	if (args->timeout < 0)
> -		args->timeout = MAX_SCHEDULE_TIMEOUT;
> -
> -	timeout = to_jiffies_timeout(args);
> +	timeout = to_jiffies_timeout(xe, args);
> 
>  	start = ktime_get();
> 
> --
> 2.25.1



More information about the Intel-xe mailing list