[PATCH 2/7] drm/xe: Count dwords before allocating

Lucas De Marchi lucas.demarchi at intel.com
Mon Jul 7 23:09:56 UTC 2025


On Sun, Jul 06, 2025 at 10:30:11PM -0500, Lucas De Marchi wrote:
>On Fri, Jul 04, 2025 at 11:20:08AM +0100, Tvrtko Ursulin wrote:
>>
>>On 03/07/2025 23:41, Lucas De Marchi wrote:
>>>The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
>>>allocating enough space for the 3DSTATE or hardcoding 4k depending on
>>>the engine. In the first case it doesn't account for the WAs and in the
>>>former it may not be sufficient. Secondly it's using the size instead of
>>>number of dwords, causing the buffer to be 4x bigger than needed.
>>>
>>>While it's unlikely this is causing any real issue, let's calculate the
>>>needed space and allocate just enough.
>>>
>>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>>---
>>> drivers/gpu/drm/xe/xe_gt.c | 33 +++++++++++++++++++++------------
>>> 1 file changed, 21 insertions(+), 12 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>>index d397df056e4cd..a926f560f2e36 100644
>>>--- a/drivers/gpu/drm/xe/xe_gt.c
>>>+++ b/drivers/gpu/drm/xe/xe_gt.c
>>>@@ -189,16 +189,7 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>>> 	long timeout;
>>> 	int count_rmw = 0;
>>> 	int count = 0;
>>>-
>>>-	if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
>>>-		/* Big enough to emit all of the context's 3DSTATE */
>>>-		bb = xe_bb_new(gt, xe_gt_lrc_size(gt, q->hwe->class), false);
>>>-	else
>>>-		/* Just pick a large BB size */
>>>-		bb = xe_bb_new(gt, SZ_4K, false);
>>>-
>>>-	if (IS_ERR(bb))
>>>-		return PTR_ERR(bb);
>>>+	size_t bb_len;
>>> 	/* count RMW registers as those will be handled separately */
>>> 	xa_for_each(&sr->xa, idx, entry) {
>>>@@ -211,8 +202,26 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>>> 	if (count || count_rmw)
>>> 		xe_gt_dbg(gt, "LRC WA %s save-restore batch\n", sr->name);
>>>+	bb_len = count * 2;
>>
>>Shouldn't this be count * 2 + 1 ?
>
>right, but then we also need a `if (count)`, like we have for count_rmw.
>
>>
>>>+	if (count_rmw)
>>>+		bb_len += count_rmw * 20 + 7;
>>
>>1)
>>This one is good. For both hardcoded value are a bit fragile 
>>maintenance wise but it can be improved later.
>>
>>2)
>>Until now bb_len is in dwords, but below it is in bytes, right? In 
>>which case here there should be one, at this point exactly:
>>
>>bb_len *= sizeof(u32);
>
>not at this point exactly... xe_bb_new() arg is number of dwords.

actually not at this point and not below neither. the 4k align is
completely bogus there.

>What we are missing is to make sure the size is 4k aligned...
>
>>
>>>+
>>>+	if (q->hwe->class == XE_ENGINE_CLASS_RENDER)
>>>+		/*
>>>+		 * Big enough to emit all of the context's 3DSTATE via
>>>+		 * xe_lrc_emit_hwe_state_instructions()
>>>+		 */
>>>+		bb_len += xe_gt_lrc_size(gt, q->hwe->class) / sizeof(u32);
>>>+
>>>+	/* Make sure accounting offsets downward is also aligned */
>>>+	bb_len = ALIGN(bb_len, SZ_4K);
>>
>>I did not get what you mean by "accounting offsets downward"?
>
>we use e.g. `xe_bo_size(lrc->bo) - 4K` to get the offset to "the last
>page" in the buffer. However if the overall size is not a multiple, it
>will return a non-aligned value which is not allowed here: the
>bb_per_ctx_ptr for example needs to be 4k-aligned.

my bad... this is the bb from xe_bb_new(), not the bo. It will already
not be aligned due to xe_bb_new() and it also doesn't matter.

Lucas De Marchi


More information about the Intel-xe mailing list