[PATCH] drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY

Fri Jun 27 18:15:10 UTC 2025

On Fri, Jun 27, 2025 at 10:57:54AM -0700, Matthew Brost wrote:

Opps, typo.

> On Fri, Jun 27, 2025 at 03:37:34AM -0600, Upadhyay, Tejas wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Gote, Nitin R <nitin.r.gote at intel.com>
> > > Sent: 27 June 2025 13:44
> > > To: Brost, Matthew <matthew.brost at intel.com>
> > > Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; Upadhyay, Tejas
> > > <tejas.upadhyay at intel.com>; intel-xe at lists.freedesktop.org;
> > > joonas.lahtinen at linux.intel.com; Ghimiray, Himal Prasad
> > > <himal.prasad.ghimiray at intel.com>
> > > Subject: RE: [PATCH] drm/xe/uapi: Add
> > > DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY
> > > 
> > > Hi Matt,
> > > >
> > > > Hi Matt and Rodrigo,
> > > >
> > > > > On Wed, Jun 25, 2025 at 03:28:05PM +0530, Nitin Gote wrote:
> > > > > > Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY device
> > > > query
> > > > > flag
> > > > > > which indicates whether a device supports DIS_NULL_QUERY (Disable
> > > > > > null anyhit shader query mechanism). The intent is for UMDs to use
> > > > > > this query and opt-in DRM_XE_EXEC_QUEUE_DIS_NULL_QUERY flag to
> > > > > > disable null query mechanism for anyhit shader by setting
> > > > > > DIS_NULL_QUERY bit of RT_CTRL register for Xe2.
> > > > > >
> > > > > > v2:
> > > > > >    - Use xe_rtp_match_first_render_or_compute() api to check
> > > > > >      render_or_compute. (Tejas)
> > > > > >    - Validate args->flags (Tejas/Matthew)
> > > > > >    - Add proper kernel-doc for both
> > > > DRM_XE_EXEC_QUEUE_DIS_NULL_QUERY
> > > > > >      and DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT. (Matthew)
> > > > >
> > > > > Matt asked for full spelling of DISABLE, no?!
> > > >
> > > > Thank you for clarifying the uAPI naming convention.
> > > > I will update it.
> > > >
> > > > >
> > > > > Also, where are the UMD patches/merge-requests for this? And their ack?
> > > > >
> > > > From the Mesa team (Palli, Tapani) is going to send the corresponding
> > > > Mesa patch.
> > > > Meanwhile, I floated this patch.
> > > >
> > > > > Thanks,
> > > > > Rodrigo.
> > > > >
> > > > > >
> > > > > > Signed-off-by: Nitin Gote <nitin.r.gote at intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_exec_queue.c | 11 ++++++++++-
> > > > > >  drivers/gpu/drm/xe/xe_query.c      |  3 ++-
> > > > > >  include/uapi/drm/xe_drm.h          | 12 ++++++++++++
> > > > > >  3 files changed, 24 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > index fee22358cc09..ef8b49d2242a 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > > > > @@ -26,6 +26,9 @@
> > > > > >  #include "xe_trace.h"
> > > > > >  #include "xe_vm.h"
> > > > > >  #include "xe_pxp.h"
> > > > > > +#include "xe_gt_mcr.h"
> > > > > > +#include "regs/xe_gt_regs.h"
> > > > > > +#include "xe_rtp.h"
> > > > > >
> > > > > >  enum xe_exec_queue_sched_prop {
> > > > > >  	XE_EXEC_QUEUE_JOB_TIMEOUT = 0,
> > > > > > @@ -597,7 +600,8 @@ int xe_exec_queue_create_ioctl(struct
> > > > > > drm_device
> > > > > *dev, void *data,
> > > > > >  	u32 len;
> > > > > >  	int err;
> > > > > >
> > > > > > -	if (XE_IOCTL_DBG(xe, args->flags &
> > > > > ~DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT) ||
> > > > > > +	if (XE_IOCTL_DBG(xe, args->flags &
> > > > > > +	    ~(DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT |
> > > > > > +DRM_XE_EXEC_QUEUE_DIS_NULL_QUERY)) ||
> > > > > >  	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > @@ -693,6 +697,11 @@ int xe_exec_queue_create_ioctl(struct
> > > > > > drm_device
> > > > > *dev, void *data,
> > > > > >  		}
> > > > > >  	}
> > > > > >
> > > > > > +	if (((GRAPHICS_VER(xe) >= 20) && (GRAPHICS_VER(xe) < 30)) &&
> > > > > > +	    (args->flags & DRM_XE_EXEC_QUEUE_DIS_NULL_QUERY) &&
> > > > > > +	    xe_rtp_match_first_render_or_compute(q->gt, q->hwe))
> > > > > > +		xe_gt_mcr_multicast_write(q->gt, RT_CTRL,
> > > DIS_NULL_QUERY);
> > > > > > +
> > > >
> > > > Also, as per Matt's suggestion in rev1, I'm exploring the WA BB
> > > > program to set this.
> > > >
> > > 
> > > If we move this implementation from the exec queue creation time to
> > > the submission time and write a WW BB program, then this register will be set
> > > for every exec queue submission.
> > > Let's say if submission happens 3 times, then register gets written 3 times,
> > > which seems unnecessary.
> > > 
> 
> No, completetly unnecessary, we have been over this. See below.
> 

s/unnecessary/necessary

Matt

> > > So, having implementation in the exec queue create time, the register will be
> > > updated only once, and we can disable it in the exec queue destroy.
> > > 
> > > What do you think? Could you please suggest?
> > 
> > Hi Nitin/Matt,
> > 
> > Are we talking below cases,
> > 
> > (if kept in WA_BB) :
> > User1 --> submits on rcs0 -> register will be updated for rcs0 (User2 will not be affected)
> > User2--> submits at the same time on rcs1 -> register will be updated for only rcs1 (User1 will not be affected)
> > 
> > (if kept in exec_queue_create/destroy) :
> > User1 --> creates exec_queue (register is written(not sure on which instance of rcs?) --> submits on rcs0 (User2 will be impacted if used same instance as User1, which isn't possible)
> > User2--> creates exec_queue (register is written(not sure on which instance of rcs, but for sure not the instance User1 is using?) -> submits at the same time on rcs1  (User1 will be impacted if used same instance as User2, which isn't possible)
> > 
> > If above cases are true then both look same, isn't it? Unless I am missing something.
> > 
> 
> Tejas — I'm not really following your examples, but no, writing at exec
> queue creation time is not the same as writing a WA BB.
> 
> This is a very basic process isolation concept that I would expect all
> KMD developers to understand. If not, then please take the time to learn
> it — this isn't something I should have to explain repeatedly. A quick
> Google search or a query to ChatGPT will explain process isolation quite
> well. With just four queries, I got an explanation for this exact
> problem, including why this must be part of the context state (i.e., the
> WA BB), since work gets switched on the hardware.
> 
> Here were my chatgpt queries:
> 
> "what is process isolation"
> 
> "in a computer chip why can't you set a global register wrt to process
> isolation"
> 
> "more in the context the register changes behavior of computer chip
> rather than data sharing"
> 
> "say in the context of kernel submitting work on behalf of a UMD which
> changes global register before each submit, why is that a bad idea"
> 
> Anyway, let me give an example of why the current implementation will
> not work:
> 
> Example:
> User1 wants DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY set.
> User2 wants DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY cleared.
> 
> If a global register is modified at exec queue creation time (i.e., it
> affects the entire GPU’s settings), then both users cannot have
> different behaviors — and we’re now broken.
> 
> The only way for User1 and User2 to have different settings is to make
> this part of the context state (WA BB), which dynamically modifies the
> registers as work from each user is scheduled on the hardware.
> 
> Matt
> 
> > Tejas
> > > 
> > > Thanks,
> > > Nitin
> > > 
> > > > > >  	q->xef = xe_file_get(xef);
> > > > > >
> > > > > >  	/* user id alloc must always be last in ioctl to prevent UAF */
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_query.c
> > > > > > b/drivers/gpu/drm/xe/xe_query.c index e8e1743dcb1e..c5002b8c9ac0
> > > > > > 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_query.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_query.c
> > > > > > @@ -344,7 +344,8 @@ static int query_config(struct xe_device *xe,
> > > > > > struct
> > > > > drm_xe_device_query *query)
> > > > > >  		config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
> > > > > >
> > > > > 	DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR;
> > > > > >  	config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
> > > > > > -			DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY;
> > > > > > +
> > > 	(DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY
> > > > > |
> > > > > > +
> > > > > 	DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY);
> > > > > >  	config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
> > > > > >  		xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K :
> > > > > SZ_4K;
> > > > > >  	config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits;
> > > > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > > > index 8e8bbdec8c5c..7c30e707346c 100644
> > > > > > --- a/include/uapi/drm/xe_drm.h
> > > > > > +++ b/include/uapi/drm/xe_drm.h
> > > > > > @@ -397,6 +397,8 @@ struct drm_xe_query_mem_regions {
> > > > > >   *      has low latency hint support
> > > > > >   *    - %DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR - Flag
> > > is
> > > > > set if the
> > > > > >   *      device has CPU address mirroring support
> > > > > > + *    - %DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY - Flag is
> > > set
> > > > if
> > > > > the
> > > > > > + *      device has null query support for anyhit shader.
> > > > > >   *  - %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory
> > > > > alignment
> > > > > >   *    required by this device, typically SZ_4K or SZ_64K
> > > > > >   *  - %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual
> > > > > > address @@ -415,6 +417,7 @@ struct drm_xe_query_config {
> > > > > >  	#define DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM	(1 << 0)
> > > > > >  	#define DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY	(1 <<
> > > > > 1)
> > > > > >  	#define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR
> > > > > 	(1 << 2)
> > > > > > +	#define DRM_XE_QUERY_CONFIG_FLAG_HAS_DIS_NULL_QUERY
> > > 	(1 <<
> > > > > 3)
> > > > > >  #define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT		2
> > > > > >  #define DRM_XE_QUERY_CONFIG_VA_BITS			3
> > > > > >  #define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY	4
> > > > > > @@ -1270,7 +1273,16 @@ struct drm_xe_exec_queue_create {
> > > > > >  	/** @vm_id: VM to use for this exec queue */
> > > > > >  	__u32 vm_id;
> > > > > >
> > > > > > +	/** DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY - \
> > > > > > +	 *	Flag is set if the device has low latency hint support
> > > > > > +	 */
> > > > > >  #define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT	(1 << 0)
> > > > > > +
> > > > > > +	/** DRM_XE_EXEC_QUEUE_DIS_NULL_QUERY - \
> > > > > > +	 *	flag is use to disable null query check for Anyhit shader
> > > > > > +	 */
> > > > > > +#define DRM_XE_EXEC_QUEUE_DIS_NULL_QUERY	(1 << 1)
> > > > > > +
> > > > > >  	/** @flags: flags to use for this exec queue */
> > > > > >  	__u32 flags;
> > > > > >
> > > > > > --
> > > > > > 2.25.1
> > > > > >