[PATCH 2/7] drm/xe: Count dwords before allocating

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Fri Jul 4 10:20:08 UTC 2025


On 03/07/2025 23:41, Lucas De Marchi wrote:
> The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
> allocating enough space for the 3DSTATE or hardcoding 4k depending on
> the engine. In the first case it doesn't account for the WAs and in the
> former it may not be sufficient. Secondly it's using the size instead of
> number of dwords, causing the buffer to be 4x bigger than needed.
> 
> While it's unlikely this is causing any real issue, let's calculate the
> needed space and allocate just enough.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
>   drivers/gpu/drm/xe/xe_gt.c | 33 +++++++++++++++++++++------------
>   1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index d397df056e4cd..a926f560f2e36 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -189,16 +189,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>   	long timeout;
>   	int count_rmw = 0;
>   	int count = 0;
> -
> -	if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
> -		/* Big enough to emit all of the context's 3DSTATE */
> -		bb = xe_bb_new(gt, xe_gt_lrc_size(gt, q->hwe->class), false);
> -	else
> -		/* Just pick a large BB size */
> -		bb = xe_bb_new(gt, SZ_4K, false);
> -
> -	if (IS_ERR(bb))
> -		return PTR_ERR(bb);
> +	size_t bb_len;
>   
>   	/* count RMW registers as those will be handled separately */
>   	xa_for_each(&sr->xa, idx, entry) {
> @@ -211,8 +202,26 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>   	if (count || count_rmw)
>   		xe_gt_dbg(gt, "LRC WA %s save-restore batch\n", sr->name);
>   
> +	bb_len = count * 2;

Shouldn't this be count * 2 + 1 ?

> +	if (count_rmw)
> +		bb_len += count_rmw * 20 + 7;

1)
This one is good. For both hardcoded value are a bit fragile maintenance 
wise but it can be improved later.

2)
Until now bb_len is in dwords, but below it is in bytes, right? In which 
case here there should be one, at this point exactly:

  bb_len *= sizeof(u32);

> +
> +	if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
> +		/*
> +		 * Big enough to emit all of the context's 3DSTATE via
> +		 * xe_lrc_emit_hwe_state_instructions()
> +		 */
> +		bb_len += xe_gt_lrc_size(gt, q->hwe->class) / sizeof(u32);
> +
> +	/* Make sure accounting offsets downward is also aligned */
> +	bb_len = ALIGN(bb_len, SZ_4K);

I did not get what you mean by "accounting offsets downward"?

Regards,

Tvrtko

> +
> +	bb = xe_bb_new(gt, bb_len, false);
> +	if (IS_ERR(bb))
> +		return PTR_ERR(bb);
> +
>   	if (count) {
> -		/* emit single LRI with all non RMW regs */
> +		/* Emit single LRI with all non RMW regs: 2 dw per reg */
>   
>   		bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(count);
>   
> @@ -236,7 +245,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>   	}
>   
>   	if (count_rmw) {
> -		/* emit MI_MATH for each RMW reg */
> +		/* Emit MI_MATH for each RMW reg: 20dw per reg + 7 trailing dw */
>   
>   		xa_for_each(&sr->xa, idx, entry) {
>   			if (entry->reg.masked || entry->clr_bits == ~0)
> 



More information about the Intel-xe mailing list