<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Brian,<br>
</p>
<div class="moz-cite-prefix">On 4/29/2024 10:14 PM, Welty, Brian
wrote:<br>
</div>
<blockquote type="cite" cite="mid:41eb1fab-d9a5-45be-9c95-05a34fc7685e@intel.com">
<br>
<br>
On 4/25/2024 3:23 PM, Nirmoy Das wrote:
<br>
<blockquote type="cite">The default behavior of device atomics
depends on the
<br>
VM type and buffer allocation types. Device atomics are
<br>
expected to function with all types of allocations for
<br>
traditional applications/APIs. Additionally, in compute/SVM
<br>
API scenarios with fault mode or LR mode VMs, device atomics
<br>
must work with single-region allocations. </blockquote>
<br>
I think an additional patch may be needed..... for the Compute mode
<br>
handling.
<br>
I'm not sure the correct thing will happen for 'shared'
(multi-placement) allocations.
<br>
HW will raise an atomic access page fault.
<br>
</blockquote>
<p>Yes, I think the change should then be:<br>
</p>
<p>if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {<br>
        if (xe_vm_in_fault_mode(xe_vma_vm(vma)) ||<br>
            xe_vm_in_lr_mode(xe_vma_vm(vma))) {<br>
                if (bo && (xe_bo_has_single_placement(bo) || <b>is_devmem</b>))<br>
                        xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;<br>
        } else {<br>
                xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;<br>
        }<br>
}<br>
</p>
<p>So this also handles the state after migration. Shared allocation is
currently broken anyway, as CPU access will not trigger any CPU
migration; that needs a future uAPI.<br>
</p>
</p>
<blockquote type="cite" cite="mid:41eb1fab-d9a5-45be-9c95-05a34fc7685e@intel.com">Our page
fault handler only checks this was atomic fault and proceeds to
<br>
migrate the BO to VRAM. I think PTE_AE bit won't be set with your
<br>
changes, so seems are in infinite loop of atomic access faults on
same address.
<br>
<br>
I think we need to fail the page fault in this case, so HW raise
CAT error here? As atomic access not supposed to be allowed for
above case
<br>
until future uAPI is added?
<br>
<br>
And need IGT test for this.
<br>
</blockquote>
<p>An IGT for this patch? We currently have a test (xe_exec_atomic) for
atomic access which covers the traditional case, but yes, we need some
IGT coverage for the compute case.</p>
<p><br>
</p>
<p>Thanks,</p>
<p>Nirmoy<br>
</p>
<blockquote type="cite" cite="mid:41eb1fab-d9a5-45be-9c95-05a34fc7685e@intel.com">
<br>
-Brian
<br>
<br>
<br>
<blockquote type="cite">In all other cases
<br>
device atomics should be disabled by default.
<br>
<br>
Signed-off-by: Nirmoy Das <a class="moz-txt-link-rfc2396E" href="mailto:nirmoy.das@intel.com"><nirmoy.das@intel.com></a>
<br>
---
<br>
drivers/gpu/drm/xe/xe_pt.c | 24 ++++++++++++++++++++----
<br>
drivers/gpu/drm/xe/xe_vm.c | 3 ++-
<br>
2 files changed, 22 insertions(+), 5 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c<br>
index 5b7930f46cf3..a8e9e8592c43 100644<br>
--- a/drivers/gpu/drm/xe/xe_pt.c<br>
+++ b/drivers/gpu/drm/xe/xe_pt.c<br>
@@ -597,7 +597,6 @@ static int<br>
 xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,<br>
                 struct xe_vm_pgtable_update *entries, u32 *num_entries)<br>
 {<br>
-       struct xe_device *xe = tile_to_xe(tile);<br>
        struct xe_bo *bo = xe_vma_bo(vma);<br>
        bool is_devmem = !xe_vma_is_userptr(vma) && bo &&<br>
                (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));<br>
@@ -619,9 +618,26 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,<br>
        struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];<br>
        int ret;<br>
-       if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&<br>
-           (is_devmem || !IS_DGFX(xe)))<br>
-               xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;<br>
+       /**<br>
+        * Default atomic expectations for different allocation scenarios are as follows:<br>
+        *<br>
+        * 1. Traditional API: When the VM is not in fault mode or LR mode:<br>
+        *    - Device atomics are expected to function with all allocations.<br>
+        *<br>
+        * 2. Compute/SVM API: When the VM is either in fault mode or LR mode:<br>
+        *    - Device atomics are the default behavior when the bo is placed in a single region.<br>
+        *    - In all other cases device atomics will be disabled with AE=0 until an application<br>
+        *      request differently using a ioctl like madvise.<br>
+        */<br>
+       if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {<br>
+               if (xe_vm_in_fault_mode(xe_vma_vm(vma)) ||<br>
+                   xe_vm_in_lr_mode(xe_vma_vm(vma))) {<br>
+                       if (bo && xe_bo_has_single_placement(bo))<br>
+                               xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;<br>
+               } else {<br>
+                       xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;<br>
+               }<br>
+       }<br>
        if (is_devmem) {<br>
                xe_walk.default_pte |= XE_PPGTT_PTE_DM;<br>
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c<br>
index e41345c1627d..ac08b6fd537e 100644<br>
--- a/drivers/gpu/drm/xe/xe_vm.c<br>
+++ b/drivers/gpu/drm/xe/xe_vm.c<br>
@@ -805,7 +805,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,<br>
        for_each_tile(tile, vm->xe, id)<br>
                vma->tile_mask |= 0x1 << id;<br>
-       if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC)<br>
+       if (vm->xe->info.has_atomic_enable_pte_bit &&<br>
+           vm->xe->info.has_device_atomics_on_smem)<br>
                vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;<br>
        vma->pat_index = pat_index;<br>
</blockquote>
</blockquote>
</body>
</html>