[PATCH v6 5/5] drm/xe: Refactor default device atomic settings

Zeng, Oak oak.zeng at intel.com
Fri May 3 16:11:32 UTC 2024



> -----Original Message-----
> From: Das, Nirmoy <nirmoy.das at intel.com>
> Sent: Friday, May 3, 2024 12:01 PM
> To: Zeng, Oak <oak.zeng at intel.com>; intel-xe at lists.freedesktop.org
> Cc: Mrozek, Michal <michal.mrozek at intel.com>
> Subject: Re: [PATCH v6 5/5] drm/xe: Refactor default device atomic settings
> 
> Hi Oak,
> 
> On 5/3/2024 5:39 PM, Zeng, Oak wrote:
> > Hi, Nirmoy,
> >
> >> -----Original Message-----
> >> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
> >> Nirmoy Das
> >> Sent: Tuesday, April 30, 2024 12:25 PM
> >> To: intel-xe at lists.freedesktop.org
> >> Cc: Das, Nirmoy <nirmoy.das at intel.com>; Mrozek, Michal
> >> <michal.mrozek at intel.com>
> >> Subject: [PATCH v6 5/5] drm/xe: Refactor default device atomic settings
> >>
> >> The default behavior of device atomics depends on the
> >> VM type and buffer allocation types. Device atomics are
> >> expected to function with all types of allocations for
> >> traditional applications/APIs. Additionally, in compute/SVM
> >> API scenarios with fault mode or LR mode VMs, device atomics
> >> must work with single-region allocations. In all other cases
> >> device atomics should be disabled by default, and also on platforms
> >> where we know device atomics don't work on particular
> >> allocation types.
> >>
> >> v3: fault mode requires LR mode so only check for LR mode
> >>      to determine compute API(Jose).
> >>      Handle SMEM+LMEM BO's migration to LMEM where device
> >>      atomics is expected to work. (Brian).
> >> v2: Fix platform checks to correct atomics behaviour on PVC.
> >>
> >> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
> >> Acked-by: Michal Mrozek <michal.mrozek at intel.com>
> >> ---
> >>   drivers/gpu/drm/xe/xe_pt.c | 37 ++++++++++++++++++++++++++++++++++---
> >>   drivers/gpu/drm/xe/xe_vm.c |  2 +-
> >>   2 files changed, 35 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> >> index 8d3765d3351e..87975e45622a 100644
> >> --- a/drivers/gpu/drm/xe/xe_pt.c
> >> +++ b/drivers/gpu/drm/xe/xe_pt.c
> >> @@ -619,9 +619,40 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> >>   	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
> >>   	int ret;
> >>
> >> -	if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&
> >> -	    (is_devmem || !IS_DGFX(xe)))
> >> -		xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> > I think the logic below can be moved to a separate function? I am also
> > good if you leave it as is, but I think a separate function is better.
> I was thinking about that, but I think we should do it once it gets more
> complicated with madvise-like options.
> >
> >
> >> +	/**
> >> +	 * Default atomic expectations for different allocation scenarios
> >> +	 * are as follows:
> >> +	 *
> >> +	 * 1. Traditional API: When the VM is not in LR mode:
> >> +	 *    - Device atomics are expected to function with all allocations.
> >> +	 *
> >> +	 * 2. Compute/SVM API: When the VM is in LR mode:
> >> +	 *    - Device atomics are the default behavior when the bo is
> >> +	 *      placed in a single region.
> >> +	 *    - In all other cases device atomics will be disabled with
> >> +	 *      AE=0 until an application requests differently using an
> >> +	 *      ioctl like madvise.
> >> +	 */
> >> +	if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
> >> +		if (xe_vm_in_lr_mode(xe_vma_vm(vma))) {
> >> +			if (bo && xe_bo_has_single_placement(bo))
> >> +				xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> >> +			/**
> >> +			 * If a SMEM+LMEM allocation is backed by SMEM, a
> >> +			 * device atomic will cause a gpu page fault and the
> >> +			 * allocation then gets migrated to LMEM, so bind
> >> +			 * such allocations with device atomics enabled.
> >> +			 */
> >> +			else if (is_devmem && !xe_bo_has_single_placement(bo))
> >
> > Note bo could be NULL here...
> 
> is_devmem already checks bo, so I left that check out.
> 
> >
> > So userptr and system allocator don't have a bo
> >
> > Userptr can't run into this case because userptr can't be is_devmem
> >
> > System-allocator-allocated memory can have a devmem backing store... so
> > it seems the logic could be:
> >
> > (is_devmem && ((bo && !single_placement) || !bo))
> >
> > But we don't have a system allocator for now, so your logic should work
> > for the current code.
> 
> Yes, the default for userptr is AE=0, hence I left it as is; I'm planning
> to change that when we have a madvise-like ioctl.

Not only madvise.

What I have in mind for the system allocator is here: https://patchwork.freedesktop.org/patch/588534/?series=132229&rev=1


As you can see, there is a case where xe_vma_is_devmem() returns true while !bo.

Anyway, the code seems good as of now.

Oak


> 
> 
> > If you write it as above, then it is future-proofed. I am okay with your
> > current writing also - I just need to change it a little when the system
> > allocator comes into the picture....
> >
> > I do feel the logic here is getting complicated, and hard to understand
> > and maintain. Let's follow up on the other email thread to see whether
> > we can simply default all allocations to NO_ATOMIC, and depend on the
> > compiler and UMD to set atomics up.
> 
> I will keep track of that.
> 
> 
> >
> > As of now, Patch is:
> >
> > Reviewed-by: Oak Zeng <oak.zeng at intel.com>
> 
> 
> Thanks a lot.
> 
> Nirmoy
> 
> >
> >> +				xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> >> +		} else {
> >> +			xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> >> +		}
> >> +
> >> +		/**
> >> +		 * Unset AE if the platform (PVC) doesn't support device
> >> +		 * atomics on this allocation type.
> >> +		 */
> >> +		if (!xe->info.has_device_atomics_on_smem && !is_devmem)
> >> +			xe_walk.default_pte &= ~XE_USM_PPGTT_PTE_AE;
> >> +	}
> >>
> >>   	if (is_devmem) {
> >>   		xe_walk.default_pte |= XE_PPGTT_PTE_DM;
> >> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> >> index f1357e2a3b10..d17192c8b7de 100644
> >> --- a/drivers/gpu/drm/xe/xe_vm.c
> >> +++ b/drivers/gpu/drm/xe/xe_vm.c
> >> @@ -888,7 +888,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> >>   	for_each_tile(tile, vm->xe, id)
> >>   		vma->tile_mask |= 0x1 << id;
> >>
> >> -	if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC)
> >> +	if (vm->xe->info.has_atomic_enable_pte_bit)
> >>   		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
> >>
> >>   	vma->pat_index = pat_index;
> >> --
> >> 2.42.0
