[Intel-xe] [PATCH 1/3] drm/xe: Include hardware prefetch buffer in batchbuffer allocations

Lucas De Marchi lucas.demarchi at intel.com
Fri Mar 31 20:14:15 UTC 2023


On Wed, Mar 29, 2023 at 10:33:32AM -0700, Matt Roper wrote:
>The hardware prefetches several cachelines of data from batchbuffers
>before they are parsed.  This prefetching only stops when the parser
>encounters an MI_BATCH_BUFFER_END instruction (or a nested
>MI_BATCH_BUFFER_START), so we must ensure that there is enough padding
>at the end of the batchbuffer to prevent the prefetcher from running
>past the end of the allocation and potentially faulting.
>
>Bspec: 45717
>Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
>---
> drivers/gpu/drm/xe/xe_bb.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_bb.c b/drivers/gpu/drm/xe/xe_bb.c
>index 5b24018e2a80..f326f117ba3b 100644
>--- a/drivers/gpu/drm/xe/xe_bb.c
>+++ b/drivers/gpu/drm/xe/xe_bb.c
>@@ -8,11 +8,26 @@
> #include "regs/xe_gpu_commands.h"
> #include "xe_device.h"
> #include "xe_engine_types.h"
>+#include "xe_gt.h"
> #include "xe_hw_fence.h"
> #include "xe_sa.h"
> #include "xe_sched_job.h"
> #include "xe_vm_types.h"
>
>+static int bb_prefetch(struct xe_gt *gt)
>+{
>+	struct xe_device *xe = gt->xe;
>+
>+	if (GRAPHICS_VERx100(xe) >= 1250 && !xe_gt_is_media_type(gt))
>+		/*
>+		 * RCS and CCS require 1K, although other engines would be
>+		 * okay with 512.
>+		 */
>+		return SZ_1K;
>+	else
>+		return SZ_512;
>+}
>+
> struct xe_bb *xe_bb_new(struct xe_gt *gt, u32 dwords, bool usm)
> {
> 	struct xe_bb *bb = kmalloc(sizeof(*bb), GFP_KERNEL);
>@@ -21,8 +36,14 @@ struct xe_bb *xe_bb_new(struct xe_gt *gt, u32 dwords, bool usm)
> 	if (!bb)
> 		return ERR_PTR(-ENOMEM);
>
>-	bb->bo = xe_sa_bo_new(!usm ? &gt->kernel_bb_pool :
>-			      &gt->usm.bb_pool, 4 * dwords + 4);
>+	/*
>+	 * We need to allocate space for the requested number of dwords,
>+	 * one additional MI_BATCH_BUFFER_END dword, and additional buffer
>	 * space to accommodate the platform-specific hardware prefetch
>+	 * requirements.
>+	 */
>+	bb->bo = xe_sa_bo_new(!usm ? &gt->kernel_bb_pool : &gt->usm.bb_pool,
>+			      4 * (dwords + 1) + bb_prefetch(gt));

If the CS prefetch buffer is 512 or 1024 bytes, wouldn't it be
sufficient to just align the end of the batchbuffer to that size rather
than grow the allocation by that full amount?
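To make the trade-off concrete, here is a small arithmetic sketch in plain userspace C, not kernel code. The helpers prefetch_bytes(), size_padded(), size_aligned() and slack_after_end() are hypothetical. They only model the size expression passed to xe_sa_bo_new() and ignore any alignment the suballocator itself may apply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bb_prefetch() in the patch: 1K when the
 * graphics version is >= 12.50 and the GT is not a media GT, else 512.
 */
static uint32_t prefetch_bytes(int graphics_verx100, int is_media_gt)
{
	return (graphics_verx100 >= 1250 && !is_media_gt) ? 1024 : 512;
}

/*
 * Sizing used by the patch: payload dwords, one MI_BATCH_BUFFER_END
 * dword, plus a full prefetch window after it.
 */
static uint32_t size_padded(uint32_t dwords, uint32_t prefetch)
{
	return 4 * (dwords + 1) + prefetch;
}

/*
 * Sizing suggested in review: round the end of the batch (including the
 * MI_BATCH_BUFFER_END dword) up to a multiple of the prefetch size.
 * The mask trick requires prefetch to be a power of two.
 */
static uint32_t size_aligned(uint32_t dwords, uint32_t prefetch)
{
	uint32_t end = 4 * (dwords + 1);

	assert(prefetch != 0 && (prefetch & (prefetch - 1)) == 0);
	return (end + prefetch - 1) & ~(prefetch - 1);
}

/* Bytes left in an allocation of 'size' after the MI_BATCH_BUFFER_END dword. */
static uint32_t slack_after_end(uint32_t dwords, uint32_t size)
{
	return size - 4 * (dwords + 1);
}
```

For example, with a 1K prefetch and 255 payload dwords the batch ends exactly at 1024 bytes: the aligned size is 1024 and leaves no slack after MI_BATCH_BUFFER_END, while the padded size is 2048. Whether zero slack is acceptable depends on whether the prefetcher can read past an aligned boundary, which this sketch does not model.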

Lucas De Marchi