[PATCH] drm/xe: Add page queue multiplier
Matthew Brost
matthew.brost at intel.com
Wed Apr 23 17:05:25 UTC 2025
On Thu, Apr 17, 2025 at 09:00:50PM -0500, Lucas De Marchi wrote:
> On Tue, Apr 08, 2025 at 08:59:15AM -0700, Matthew Brost wrote:
> > For an unknown reason the math to determine the PF queue size is not
> > correct - compute UMD applications are overflowing the PF queue,
> > which is fatal. A multiplier of 8 fixes the problem.
>
> Sorry, but that seems a terrible justification and I'm sure having it
> "fixed" here is just a recipe to leave it like that forever. Could we
> spend a bit more time understanding the root cause? This showed up
> in my queue for drm-xe-fixes, but I'm dropping it. At least until next
> week in the hope we have more clarity.
>
I agree this is less than ideal. If I had to guess, each EU thread can
fault in parallel (i.e., upon an EU thread fault, the EU context switches
to another thread, which can also fault, and so on).
> >
> > Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size")
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt_pagefault.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > index 9fa11e837dd1..10622ca471a2 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > @@ -435,9 +435,16 @@ static int xe_alloc_pf_queue(struct xe_gt *gt, struct pf_queue *pf_queue)
> > num_eus = bitmap_weight(gt->fuse_topo.eu_mask_per_dss,
> > XE_MAX_EU_FUSE_BITS) * num_dss;
>
> So... for Xe2 we have eu_type == XE_GT_EU_TYPE_SIMD16. Would that be
> something in play here?
>
I tried factoring that in, but the original authors of this code
suggested it was incorrect.

> You said 8 fixes the problem, but I don't see
> anything about "does 2 fix it? what about 4?"
>
4 is the number reported by the UMD that fixes this; I went with 8 to be
safe.
> Another thing is about "threads per-EU" as documented in
> e.g. bspec 67165.
>
I'm not seeing anything about threads per-EU in this link.
> >
> > - /* user can issue separate page faults per EU and per CS */
> > + /*
> > + * user can issue separate page faults per EU and per CS
> > + *
> > + * XXX: Multiplier required as compute UMDs are getting PF queue errors
> > + * without it. Follow up on why this multiplier is required.
> > + */
> > +#define PF_MULTIPLIER 8
> > pf_queue->num_dw =
> > - (num_eus + XE_NUM_HW_ENGINES) * PF_MSG_LEN_DW;
> > + (num_eus + XE_NUM_HW_ENGINES) * PF_MSG_LEN_DW * PF_MULTIPLIER;
> > +#undef PF_MULTIPLIER
>
> First of all, that XE_NUM_HW_ENGINES seems weird as we are talking about
> engines that are not necessarily enabled. The OR between
> fuse_topo.c_dss_mask and fuse_topo.g_dss_mask also looks odd when the error
> you mentioned is in the compute UMD and we document num_dw as
> "Dynamically calculated based on the number of **compute** resources".
>
I did not originally author this code. I pinged the original authors on
Teams and they suggested it was correct, even though the number of EUs
mismatched the RIL info... So I landed on a quick fix that unblocks the
UMDs.
Matt
>
> Lucas De Marchi
>
> >
> > pf_queue->gt = gt;
> > pf_queue->data = devm_kcalloc(xe->drm.dev, pf_queue->num_dw,
> > --
> > 2.34.1
> >