[Intel-xe] [PATCH] drm/xe: Implement HW workaround 14016763929
Welty, Brian
brian.welty at intel.com
Fri Aug 4 18:29:13 UTC 2023
On 8/3/2023 12:53 PM, Zeng, Oak wrote:
>
>
> Thanks,
> Oak
>
>> -----Original Message-----
>> From: Souza, Jose <jose.souza at intel.com>
>> Sent: August 3, 2023 3:10 PM
>> To: intel-xe at lists.freedesktop.org; Zeng, Oak <oak.zeng at intel.com>
>> Cc: Roper, Matthew D <matthew.d.roper at intel.com>; De Marchi, Lucas
>> <lucas.demarchi at intel.com>
>> Subject: Re: [Intel-xe] [PATCH] drm/xe: Implement HW workaround 14016763929
>>
>> On Thu, 2023-08-03 at 15:01 -0400, Oak Zeng wrote:
>>> To workaround a HW bug on DG2, driver is required to map the whole
>>> ppgtt virtual address space before GPU workload submission. Thus
>>> set the XE_VM_FLAG_SCRATCH_PAGE flag during vm create so the whole
>>> address space is mapped to point to scratch page.
>>>
>>> Signed-off-by: Oak Zeng <oak.zeng at intel.com>
>>> ---
>>> drivers/gpu/drm/xe/Makefile | 2 +-
>>> drivers/gpu/drm/xe/xe_vm.c | 5 +++++
>>> drivers/gpu/drm/xe/xe_wa_oob.rules | 1 +
>>> 3 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>>> index e79624ab2cb3..e6a05d35c931 100644
>>> --- a/drivers/gpu/drm/xe/Makefile
>>> +++ b/drivers/gpu/drm/xe/Makefile
>>> @@ -37,7 +37,7 @@ quiet_cmd_wa_oob = GEN $(notdir $(generated_oob))
>>> $(generated_oob) &: $(obj)/xe_gen_wa_oob
>> $(srctree)/$(src)/xe_wa_oob.rules
>>> $(call cmd,wa_oob)
>>>
>>> -$(obj)/xe_guc.o $(obj)/xe_wa.o $(obj)/xe_ring_ops.o: $(generated_oob)
>>> +$(obj)/xe_guc.o $(obj)/xe_wa.o $(obj)/xe_ring_ops.o $(obj)/xe_vm.o:
>> $(generated_oob)
>>>
>>> # Please keep these build lists sorted!
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>> index cb28dbc2bdbb..b045bb341a5c 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>> @@ -29,6 +29,8 @@
>>> #include "xe_res_cursor.h"
>>> #include "xe_sync.h"
>>> #include "xe_trace.h"
>>> +#include "generated/xe_wa_oob.h"
>>> +#include "xe_wa.h"
>>>
>>> #define TEST_VM_ASYNC_OPS_ERROR
>>>
>>> @@ -1235,6 +1237,9 @@ struct xe_vm *xe_vm_create(struct xe_device *xe,
>> u32 flags)
>>>
>>> INIT_LIST_HEAD(&vm->extobj.list);
>>>
>>> + if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
>>> + flags |= XE_VM_FLAG_SCRATCH_PAGE;
Is above correct place to insert this?
I didn't lookup the WA, but wouldn't only user VM's be the concern?
So the above code should instead go in xe_vm_create_ioctl() so then is
not getting applied for example to the special vm created in
xe_migrate_init().
>>
>> Isn't DRM_XE_VM_CREATE_FAULT_MODE incompatible with SCRATCH_PAGE?
>> What if user-space asks for fault mode in DG2?
So I think we agreed this is bascially a moot point since FAULT_MODE is
not supported on DG2.
To tidy this up and address Jose's point, I think you could reorder the
error checks then?
So make below the first of the FAULT_MODE checks? Or maybe doesn't
matter...
if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FAULT_MODE &&
!xe->info.supports_usm))
return -EINVAL;
>
> Very good point. When I first get this request (to implement this workaround the way as showed in this patch), I thought this workaround is better to be implemented from UMD, since kmd already exposed those flags to user space. But there were some other experts has discussed, see the comment here: https://jira.devtools.intel.com/browse/VLK-49991?focusedCommentId=21055607&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21055607
>
> Let me communicate back to see whether we can implement this workaround in umd so we don't have such incompatible configuration and also not silently change user's setting.
>
> Thanks,
> Oak
>>
>>
>>> +
>>> if (!(flags & XE_VM_FLAG_MIGRATION))
>>> xe_device_mem_access_get(xe);
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules
>> b/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> index 15c23813398a..6fc68c592ca8 100644
>>> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> @@ -15,3 +15,4 @@
>>> 18020744125 PLATFORM(PVC)
>>> 1509372804 PLATFORM(PVC), GRAPHICS_STEP(A0, C0)
>>> 1409600907 GRAPHICS_VERSION_RANGE(1200, 1250)
>>> +14016763929 PLATFORM(DG2)
>
More information about the Intel-xe
mailing list