[PATCH 2/7] drm/xe: Count dwords before allocating
Lucas De Marchi
lucas.demarchi at intel.com
Mon Jul 7 03:30:11 UTC 2025
On Fri, Jul 04, 2025 at 11:20:08AM +0100, Tvrtko Ursulin wrote:
>
>On 03/07/2025 23:41, Lucas De Marchi wrote:
>>The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
>>allocating enough space for the 3DSTATE or hardcoding 4k depending on
>>the engine. In the first case it doesn't account for the WAs and in the
>>former it may not be sufficient. Secondly it's using the size instead of
>>number of dwords, causing the buffer to be 4x bigger than needed.
>>
>>While it's unlikely this is causing any real issue, let's calculate the
>>needed space and allocate just enough.
>>
>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>---
>> drivers/gpu/drm/xe/xe_gt.c | 33 +++++++++++++++++++++------------
>> 1 file changed, 21 insertions(+), 12 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>index d397df056e4cd..a926f560f2e36 100644
>>--- a/drivers/gpu/drm/xe/xe_gt.c
>>+++ b/drivers/gpu/drm/xe/xe_gt.c
>>@@ -189,16 +189,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>> long timeout;
>> int count_rmw = 0;
>> int count = 0;
>>-
>>- if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
>>- /* Big enough to emit all of the context's 3DSTATE */
>>- bb = xe_bb_new(gt, xe_gt_lrc_size(gt, q->hwe->class), false);
>>- else
>>- /* Just pick a large BB size */
>>- bb = xe_bb_new(gt, SZ_4K, false);
>>-
>>- if (IS_ERR(bb))
>>- return PTR_ERR(bb);
>>+ size_t bb_len;
>> /* count RMW registers as those will be handled separately */
>> xa_for_each(&sr->xa, idx, entry) {
>>@@ -211,8 +202,26 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>> if (count || count_rmw)
>> xe_gt_dbg(gt, "LRC WA %s save-restore batch\n", sr->name);
>>+ bb_len = count * 2;
>
>Shouldn't this be count * 2 + 1 ?
right, but then we also need a `if (count)`, like we have for count_rmw.
>
>>+ if (count_rmw)
>>+ bb_len += count_rmw * 20 + 7;
>
>1)
>This one is good. For both hardcoded value are a bit fragile
>maintenance wise but it can be improved later.
>
>2)
>Until now bb_len is in dwords, but below it is in bytes, right? In
>which case here there should be one, at this point exactly:
>
> bb_len *= sizeof(u32);
not at this point exactly... xe_bb_new() arg is number of dwords.
What we are missing is to make sure the size is 4k aligned...
>
>>+
>>+ if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
>>+ /*
>>+ * Big enough to emit all of the context's 3DSTATE via
>>+ * xe_lrc_emit_hwe_state_instructions()
>>+ */
>>+ bb_len += xe_gt_lrc_size(gt, q->hwe->class) / sizeof(u32);
>>+
>>+ /* Make sure accounting offsets downward is also aligned */
>>+ bb_len = ALIGN(bb_len, SZ_4K);
>
>I did not get what you mean by "accounting offsets downward"?
we use e.g. `xe_bo_size(lrc->bo) - 4K` to get the offset to "the last
page" in the buffer. However if the overall size is not a multiple, it
will return a non-aligned value which is not allowed here: the
bb_per_ctx_ptr for example needs to be 4k-aligned.
thanks
Lucas De Marchi
>
>Regards,
>
>Tvrtko
>
>>+
>>+ bb = xe_bb_new(gt, bb_len, false);
>>+ if (IS_ERR(bb))
>>+ return PTR_ERR(bb);
>>+
>> if (count) {
>>- /* emit single LRI with all non RMW regs */
>>+ /* Emit single LRI with all non RMW regs: 2 dw per reg */
>> bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(count);
>>@@ -236,7 +245,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>> }
>> if (count_rmw) {
>>- /* emit MI_MATH for each RMW reg */
>>+ /* Emit MI_MATH for each RMW reg: 20dw per reg + 7 trailing dw */
>> xa_for_each(&sr->xa, idx, entry) {
>> if (entry->reg.masked || entry->clr_bits == ~0)
>>
>
More information about the Intel-xe
mailing list