[PATCH 2/7] drm/xe: Count dwords before allocating
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Fri Jul 4 10:20:08 UTC 2025
On 03/07/2025 23:41, Lucas De Marchi wrote:
> The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
> allocating enough space for the 3DSTATE or hardcoding 4k depending on
> the engine. In the first case it doesn't account for the WAs and in the
> former it may not be sufficient. Secondly it's using the size instead of
> number of dwords, causing the buffer to be 4x bigger than needed.
>
> While it's unlikely this is causing any real issue, let's calculate the
> needed space and allocate just enough.
>
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt.c | 33 +++++++++++++++++++++------------
> 1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index d397df056e4cd..a926f560f2e36 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -189,16 +189,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
> long timeout;
> int count_rmw = 0;
> int count = 0;
> -
> - if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
> - /* Big enough to emit all of the context's 3DSTATE */
> - bb = xe_bb_new(gt, xe_gt_lrc_size(gt, q->hwe->class), false);
> - else
> - /* Just pick a large BB size */
> - bb = xe_bb_new(gt, SZ_4K, false);
> -
> - if (IS_ERR(bb))
> - return PTR_ERR(bb);
> + size_t bb_len;
>
> /* count RMW registers as those will be handled separately */
> xa_for_each(&sr->xa, idx, entry) {
> @@ -211,8 +202,26 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
> if (count || count_rmw)
> xe_gt_dbg(gt, "LRC WA %s save-restore batch\n", sr->name);
>
> + bb_len = count * 2;
Shouldn't this be count * 2 + 1 ?
> + if (count_rmw)
> + bb_len += count_rmw * 20 + 7;
1)
This one is good. For both hardcoded value are a bit fragile maintenance
wise but it can be improved later.
2)
Until now bb_len is in dwords, but below it is in bytes, right? In which
case here there should be one, at this point exactly:
bb_len *= sizeof(u32);
> +
> + if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
> + /*
> + * Big enough to emit all of the context's 3DSTATE via
> + * xe_lrc_emit_hwe_state_instructions()
> + */
> + bb_len += xe_gt_lrc_size(gt, q->hwe->class) / sizeof(u32);
> +
> + /* Make sure accounting offsets downward is also aligned */
> + bb_len = ALIGN(bb_len, SZ_4K);
I did not get what you mean by "accounting offsets downward"?
Regards,
Tvrtko
> +
> + bb = xe_bb_new(gt, bb_len, false);
> + if (IS_ERR(bb))
> + return PTR_ERR(bb);
> +
> if (count) {
> - /* emit single LRI with all non RMW regs */
> + /* Emit single LRI with all non RMW regs: 2 dw per reg */
>
> bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(count);
>
> @@ -236,7 +245,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
> }
>
> if (count_rmw) {
> - /* emit MI_MATH for each RMW reg */
> + /* Emit MI_MATH for each RMW reg: 20dw per reg + 7 trailing dw */
>
> xa_for_each(&sr->xa, idx, entry) {
> if (entry->reg.masked || entry->clr_bits == ~0)
>
More information about the Intel-xe
mailing list