[Intel-xe] [PATCH] drm/xe: Implement HW workaround 14016763929

Zeng, Oak oak.zeng at intel.com
Fri Aug 4 20:08:13 UTC 2023



Thanks,
Oak

> -----Original Message-----
> From: Welty, Brian <brian.welty at intel.com>
> Sent: August 4, 2023 2:29 PM
> To: Zeng, Oak <oak.zeng at intel.com>; Souza, Jose <jose.souza at intel.com>;
> intel-xe at lists.freedesktop.org
> Cc: De Marchi, Lucas <lucas.demarchi at intel.com>; Roper, Matthew D
> <matthew.d.roper at intel.com>
> Subject: Re: [Intel-xe] [PATCH] drm/xe: Implement HW workaround 14016763929
> 
> 
> On 8/3/2023 12:53 PM, Zeng, Oak wrote:
> >
> >
> > Thanks,
> > Oak
> >
> >> -----Original Message-----
> >> From: Souza, Jose <jose.souza at intel.com>
> >> Sent: August 3, 2023 3:10 PM
> >> To: intel-xe at lists.freedesktop.org; Zeng, Oak <oak.zeng at intel.com>
> >> Cc: Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas
> >> <lucas.demarchi at intel.com>
> >> Subject: Re: [Intel-xe] [PATCH] drm/xe: Implement HW workaround
> 14016763929
> >>
> >> On Thu, 2023-08-03 at 15:01 -0400, Oak Zeng wrote:
> >>> To workaround a HW bug on DG2, driver is required to map the whole
> >>> ppgtt virtual address space before GPU workload submission. Thus
> >>> set the XE_VM_FLAG_SCRATCH_PAGE flag during vm create so the whole
> >>> address space is mapped to point to scratch page.
> >>>
> >>> Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> >>> ---
> >>>   drivers/gpu/drm/xe/Makefile        | 2 +-
> >>>   drivers/gpu/drm/xe/xe_vm.c         | 5 +++++
> >>>   drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
> >>>   3 files changed, 7 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> >>> index e79624ab2cb3..e6a05d35c931 100644
> >>> --- a/drivers/gpu/drm/xe/Makefile
> >>> +++ b/drivers/gpu/drm/xe/Makefile
> >>> @@ -37,7 +37,7 @@ quiet_cmd_wa_oob = GEN     $(notdir
> $(generated_oob))
> >>>   $(generated_oob) &: $(obj)/xe_gen_wa_oob
> >> $(srctree)/$(src)/xe_wa_oob.rules
> >>>   	$(call cmd,wa_oob)
> >>>
> >>> -$(obj)/xe_guc.o $(obj)/xe_wa.o $(obj)/xe_ring_ops.o: $(generated_oob)
> >>> +$(obj)/xe_guc.o $(obj)/xe_wa.o $(obj)/xe_ring_ops.o $(obj)/xe_vm.o:
> >> $(generated_oob)
> >>>
> >>>   # Please keep these build lists sorted!
> >>>
> >>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> >>> index cb28dbc2bdbb..b045bb341a5c 100644
> >>> --- a/drivers/gpu/drm/xe/xe_vm.c
> >>> +++ b/drivers/gpu/drm/xe/xe_vm.c
> >>> @@ -29,6 +29,8 @@
> >>>   #include "xe_res_cursor.h"
> >>>   #include "xe_sync.h"
> >>>   #include "xe_trace.h"
> >>> +#include "generated/xe_wa_oob.h"
> >>> +#include "xe_wa.h"
> >>>
> >>>   #define TEST_VM_ASYNC_OPS_ERROR
> >>>
> >>> @@ -1235,6 +1237,9 @@ struct xe_vm *xe_vm_create(struct xe_device *xe,
> >> u32 flags)
> >>>
> >>>   	INIT_LIST_HEAD(&vm->extobj.list);
> >>>
> >>> +	if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
> >>> +		flags |= XE_VM_FLAG_SCRATCH_PAGE;
> 
> Is above correct place to insert this?
> I didn't lookup the WA, but wouldn't only user VM's be the concern?
> So the above code should instead go in xe_vm_create_ioctl() so then is
> not getting applied for example to the special vm created in
> xe_migrate_init().

You are right Brian. I will resubmit.
> 
> 
> >>
> >> Isn't DRM_XE_VM_CREATE_FAULT_MODE incompatible with SCRATCH_PAGE?
> >> What if user-space asks for fault mode in DG2?
> 
> So I think we agreed this is bascially a moot point since FAULT_MODE is
> not supported on DG2.
> 
> To tidy this up and address Jose's point, I think you could reorder the
> error checks then?
> 
> So make below the first of the FAULT_MODE checks?  Or maybe doesn't
> matter...
> 
>    if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FAULT_MODE &&
>                     !xe->info.supports_usm))
>            return -EINVAL;

Make sense to me. Will do

Oak
> 
> 
> 
> >
> > Very good point. When I first get this request (to implement this workaround
> the way as showed in this patch), I thought this workaround is better to be
> implemented from UMD, since kmd already exposed those flags to user space.
> But there were some other experts has discussed, see the comment here:
> https://jira.devtools.intel.com/browse/VLK-
> 49991?focusedCommentId=21055607&page=com.atlassian.jira.plugin.system.iss
> uetabpanels:comment-tabpanel#comment-21055607
> >
> > Let me communicate back to see whether we can implement this workaround
> in umd so we don't have such incompatible configuration and also not silently
> change user's setting.
> >
> > Thanks,
> > Oak
> >>
> >>
> >>> +
> >>>   	if (!(flags & XE_VM_FLAG_MIGRATION))
> >>>   		xe_device_mem_access_get(xe);
> >>>
> >>> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules
> >> b/drivers/gpu/drm/xe/xe_wa_oob.rules
> >>> index 15c23813398a..6fc68c592ca8 100644
> >>> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
> >>> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
> >>> @@ -15,3 +15,4 @@
> >>>   18020744125	PLATFORM(PVC)
> >>>   1509372804	PLATFORM(PVC), GRAPHICS_STEP(A0, C0)
> >>>   1409600907	GRAPHICS_VERSION_RANGE(1200, 1250)
> >>> +14016763929	PLATFORM(DG2)
> >


More information about the Intel-xe mailing list