[PATCH v6 5/5] drm/xe: Refactor default device atomic settings
Nirmoy Das
nirmoy.das at intel.com
Fri May 3 16:30:26 UTC 2024
On 5/3/2024 6:11 PM, Zeng, Oak wrote:
>
>> -----Original Message-----
>> From: Das, Nirmoy <nirmoy.das at intel.com>
>> Sent: Friday, May 3, 2024 12:01 PM
>> To: Zeng, Oak <oak.zeng at intel.com>; intel-xe at lists.freedesktop.org
>> Cc: Mrozek, Michal <michal.mrozek at intel.com>
>> Subject: Re: [PATCH v6 5/5] drm/xe: Refactor default device atomic settings
>>
>> Hi Oak,
>>
>> On 5/3/2024 5:39 PM, Zeng, Oak wrote:
>>> Hi, Nirmoy,
>>>
>>>> -----Original Message-----
>>>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
>>>> Nirmoy Das
>>>> Sent: Tuesday, April 30, 2024 12:25 PM
>>>> To: intel-xe at lists.freedesktop.org
>>>> Cc: Das, Nirmoy <nirmoy.das at intel.com>; Mrozek, Michal
>>>> <michal.mrozek at intel.com>
>>>> Subject: [PATCH v6 5/5] drm/xe: Refactor default device atomic settings
>>>>
>>>> The default behavior of device atomics depends on the
>>>> VM type and the buffer allocation type. Device atomics are
>>>> expected to function with all types of allocations for
>>>> traditional applications/APIs. Additionally, in compute/SVM
>>>> API scenarios with fault mode or LR mode VMs, device atomics
>>>> must work with single-region allocations. In all other cases
>>>> device atomics should be disabled by default; this also applies
>>>> on platforms where we know device atomics don't work on
>>>> particular allocation types.
>>>>
>>>> v3: fault mode requires LR mode, so only check for LR mode
>>>> to determine compute API (Jose).
>>>> Handle migration of SMEM+LMEM BOs to LMEM, where device
>>>> atomics are expected to work (Brian).
>>>> v2: Fix platform checks to correct atomics behaviour on PVC.
>>>>
>>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>>> Acked-by: Michal Mrozek <michal.mrozek at intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/xe_pt.c | 37 ++++++++++++++++++++++++++++++++++---
>>>> drivers/gpu/drm/xe/xe_vm.c | 2 +-
>>>> 2 files changed, 35 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>>>> index 8d3765d3351e..87975e45622a 100644
>>>> --- a/drivers/gpu/drm/xe/xe_pt.c
>>>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>>>> @@ -619,9 +619,40 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>>>> struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
>>>> int ret;
>>>>
>>>> - if ((vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) &&
>>>> - (is_devmem || !IS_DGFX(xe)))
>>>> - xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>>> I think the logic below can be moved to a separate function? I am also good
>>> if you leave it as below, but I think a separate function is better.
>> I was thinking about that, but I think we should do it once it gets more
>> complicated with madvise-like options.
>>>
>>>> + /**
>>>> + * Default atomic expectations for different allocation
>>>> + * scenarios are as follows:
>>>> + *
>>>> + * 1. Traditional API: When the VM is not in LR mode:
>>>> + *    - Device atomics are expected to function with all allocations.
>>>> + *
>>>> + * 2. Compute/SVM API: When the VM is in LR mode:
>>>> + *    - Device atomics are the default behavior when the bo is
>>>> + *      placed in a single region.
>>>> + *    - In all other cases device atomics will be disabled with
>>>> + *      AE=0 until an application requests differently using an
>>>> + *      ioctl like madvise.
>>>> + */
>>>> + if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
>>>> + if (xe_vm_in_lr_mode(xe_vma_vm(vma))) {
>>>> + if (bo && xe_bo_has_single_placement(bo))
>>>> + xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>>>> + /**
>>>> + * If a SMEM+LMEM allocation is backed by SMEM, a device
>>>> + * atomic will trigger a gpu page fault which then migrates
>>>> + * it to LMEM, so bind such allocations with device
>>>> + * atomics enabled.
>>>> + */
>>>> + else if (is_devmem && !xe_bo_has_single_placement(bo))
>>> Note bo could be NULL here...
>> is_devmem already checks bo, so I left it out.
>>
>>> So userptr and system allocator don't have a bo
>>>
>>> Userptr can't run into this case because userptr can't be is_devmem
>>>
>>> System allocator allocated memory can have a devmem backing store... so it
>>> seems the logic can be:
>>> (is_devmem && ((bo && !single_placement) || !bo))
>>>
>>> But we don't have a system allocator for now, so your logic should work
>>> for the current code.
>>
>> Yes, the default for userptr is AE=0, hence I left it as is and plan to
>> change that when we have a madvise-like ioctl.
> Not only madvise
>
> What is in my mind for system allocator is here: https://patchwork.freedesktop.org/patch/588534/?series=132229&rev=1
>
>
> As you can see, there is a case where xe_vma_is_devmem returns true while !bo.
ah yes, I see what you mean.
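
For reference, the standalone-helper refactor you suggested, with the
future-proofed condition, could look roughly like the self-contained sketch
below. The helper name and the plain bool parameters are hypothetical stand-ins
for the real xe_vma/xe_bo/xe_vm queries; this is just to pin down the decision
table, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch: should the AE (atomic enable) PTE bit be set by default?
 * Parameters stand in for vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT,
 * xe_vm_in_lr_mode(), bo presence, xe_bo_has_single_placement(),
 * is_devmem and xe->info.has_device_atomics_on_smem respectively.
 */
static bool want_default_atomics(bool atomic_pte_bit, bool lr_mode,
				 bool has_bo, bool single_placement,
				 bool is_devmem,
				 bool has_device_atomics_on_smem)
{
	bool ae = false;

	if (!atomic_pte_bit)
		return false;

	if (lr_mode) {
		/* Compute/SVM API: single-region bo gets atomics. */
		if (has_bo && single_placement)
			ae = true;
		/*
		 * Future-proofed devmem case: covers a multi-placement bo
		 * that is backed by (or will migrate to) LMEM, and a
		 * bo-less system-allocator mapping with devmem backing.
		 */
		else if (is_devmem && (!has_bo || !single_placement))
			ae = true;
	} else {
		/* Traditional API: atomics work on all allocations. */
		ae = true;
	}

	/* Platforms like PVC can't do device atomics on SMEM. */
	if (!has_device_atomics_on_smem && !is_devmem)
		ae = false;

	return ae;
}
```

With that shape, the system-allocator case is just another caller hitting the
`!has_bo` arm, and a future madvise-like ioctl would override the result rather
than complicate the default.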
>
> Anyway, the code seems good as of now.
>
> Oak
>
>
>>
>>> If you write it as above, then it is future-proofed. I am okay with your
>>> current writing also - I just need to change it a little when the system
>>> allocator comes into the picture....
>>> I do feel the logic here is getting complicated, and hard to understand
>>> and maintain. Let's follow up in the other email thread to see whether we
>>> can simply default all allocations to NO_ATOMIC, and depend on the
>>> compiler and UMD to set atomics up.
>>
>> I will keep track of that.
>>
>>
>>> As of now, Patch is:
>>>
>>> Reviewed-by: Oak Zeng <oak.zeng at intel.com>
>>
>> Thanks a lot.
>>
>> Nirmoy
>>
>>>> + xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>>>> + } else {
>>>> + xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>>>> + }
>>>> +
>>>> + /**
>>>> + * Unset AE if the platform (PVC) doesn't support it on this
>>>> + * allocation type
>>>> + */
>>>> + if (!xe->info.has_device_atomics_on_smem && !is_devmem)
>>>> + xe_walk.default_pte &= ~XE_USM_PPGTT_PTE_AE;
>>>> + }
>>>>
>>>> if (is_devmem) {
>>>> xe_walk.default_pte |= XE_PPGTT_PTE_DM;
>>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>>> index f1357e2a3b10..d17192c8b7de 100644
>>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>>> @@ -888,7 +888,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>>>> for_each_tile(tile, vm->xe, id)
>>>> vma->tile_mask |= 0x1 << id;
>>>>
>>>> - if (GRAPHICS_VER(vm->xe) >= 20 || vm->xe->info.platform == XE_PVC)
>>>> + if (vm->xe->info.has_atomic_enable_pte_bit)
>>>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>>>
>>>> vma->pat_index = pat_index;
>>>> --
>>>> 2.42.0